Warning Messages in Jupyter Notebook Example Code

You’re working with the downloadable source code from a book like  Algorithms for Dummies, 2nd EditionBeginning Programming with Python For Dummies, 3rd EditionMachine Learning for Dummies, 2nd EditionPython for Data Science for Dummies, or Machine Learning Security Principles and see a warning message like this:

C:\Users\John\anaconda3\lib\site-packages\sklearn\feature_selection\_sequential.py:206: FutureWarning: Leaving `n_features_to_select` to None is deprecated in 1.0 and will become 'auto' in 1.3. To keep the same behaviour as with None (i.e. select half of the features) and avoid this warning, you should manually set `n_features_to_select='auto'` and set tol=None when creating an instance.
  warnings.warn(

Well, that’s pretty confusing looking and if you’re just learning to work with Python may give you the idea that you’ve done something seriously wrong. There are a couple things to note here. First, this is a warning message. In fact, it’s a FutureWarning message, which means the change mentioned in the warning hasn’t actually taken effect yet.

Second, if you’re using the version of Jupyter Notebook and Python mentioned in the book, it’s unlikely that the effects described in the message will become a problem anytime soon, so you can usually ignore them. (This is one reason that I always ask which version of Jupyter Notebook and Python you’re using because a newer version can definitely cause error messages to appear.) Of course, if this warning ever does turn into an error, Luca and I definitely want to hear about it at [email protected].

Third, the message does state a potential fix for the problem. If the fix is simple enough, you can always try to make the required change to see if it works. However, this is a do it at your own risk sort of modification. The point is that the warning isn’t keeping you from using the downloadable source today, so ignoring it is probably the best action to take.

If you really don’t want to see these warnings, you can always add two lines of code the to first cell of the downloadable source. The warning isn’t actually going away, you just won’t see it:

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

So, what causes these warning messages in the first place? Is the book’s source code faulty? There is nothing wrong with the book’s source code. What you’re seeing is the result of a library upgrade. Python uses a huge number of libraries and a change in any one of them can create a warning message of the sort you’ve seen. Luca and I work hard to ensure that the source code you get with the book is functional (and warning free) on all of the supported platforms at the time of writing, but it would be impossible for us to constantly update the book’s code to keep up with these library changes.

IDE Screenshot Usage in Books

There are cases where it’s very tough to figure out the correct presentation of material in a book, which is made more difficult by some readers preferring one presentation and other readers another. It comes down to how people learn in many cases. Visual learners prefer screenshots, abstract learners prefer text. Of course, there are all sorts of learners between these two extremes. So, what seems like a simple question can become quite complex.

The question at hand is whether to present screenshots of an IDE in a book with the associated example code and its output. The problem is that vendors now assume that developers have very large displays and so have made use of all of that extra screen real estate. In addition, book publishers don’t want books where a single image consumes an entire page. The result is that it’s very hard to get a screenshot where the text is completely readable. It can be done, but the text will generally still be smaller than the print in the book. Older readers complain that they need a magnifying glass to see the text at all.

However, there are benefits to using screenshots. The most important benefit is that, even if the text isn’t completely readable, visual learners can see what their IDE should look like as they follow the progress of procedures in the book. This feedback lets the visual learner know that they are doing things correctly and are getting the correct result. Another benefit is that an example tends to stay in one piece. The graphical output of an example doesn’t end up several pages away from the source code that produces it. Sometimes, textual output is wider than the page will allow using the normal font size. So, the options are to print the output in the book at the normal font size, but in a truncated form, which means that it’s no longer complete. A screenshot can show the complete textual output, but at a smaller font. For beginner readers, the second form, while not optimal, is preferred because truncating the output produces questions in the reader’s mind.

So, how do you feel about IDE screenshots in books? Are they more helpful or more confusing? Part of the reason for posts like this is to get your opinion and discover more about you as a reader. Obviously, a book author wants to use the communication techniques that work best overall for everyone, book space often not allowing for the investigation of every presentation alternative. Let me know your thoughts at [email protected].

Jupyter Notebook vs JupyterLite

There seems to be some confusion for readers of   Algorithms for Dummies, 2nd EditionBeginning Programming with Python For Dummies, 3rd Edition, Machine Learning for Dummies, 2nd EditionPython for Data Science for Dummies, and Machine Learning Security Principles lately due to the similarity of names of two Integrated Development Environments (IDEs) available now. Even though I’m sure that JupyterLite is a very good product, even the website states, “Not all the usual features available in JupyterLab and the Classic Notebook will work with JupyterLite, but many already do!” This lack of support becomes a problem when you try to run the downloadable source using JupyterLite. In addition, Luca and I haven’t tested the downloadable source with this product, so we can’t even tell you what will and won’t work.

The two supported IDEs for our books are Google Colab (recommended for those of you who want to use a mobile device) and Jupyter Notebook (recommended for those of you who have a desktop system). It’s actually preferred that you get Jupyter Notebook as part of the Anaconda toolset because Anaconda makes it very easy for you to perform some advanced setup tasks found in some of our books. For example, you gain access to the Anaconda prompt and the associated Conda utility that definitely makes it easier for you to manage some of the machine learning packages found in our books. Using either Google Colab or Jupyter Notebook makes it very much easier for Luca and I to help you with your book-specific questions.

Please let me know if you have any questions or concerns about how to setup your programming environment for our books at [email protected]. Remember to use the version of the products listed in the book for optimal results in working with the downloadable source. In addition, always remember to use the downloadable source to enhance your learning experience.

Programming Languages Commonly Used for Data Science

The world is packed with programming languages, each of them proclaiming their particular forte and telling you why you need to learn them. A good developer does learn multiple languages, each of which becomes a tool for a certain kind of development, but even the most enthusiastic developer won’t learn every programming language out there. It’s important to make good choices.

Data Science is a particular kind of development task that works well with certain kinds of programming languages. Choosing the correct tool makes your life easier. It’s akin to using a hammer to drive a screw rather than a screwdriver. Yes, the hammer works, but the screwdriver is much easier to use and definitely does a better job. Data scientists usually use only a few languages because they make working with data easier. With this in mind, here are the top languages for data science work in order of preference:

  • Python (general purpose): Many data scientists prefer to use Python because it provides a wealth of libraries, such as NumPy, SciPy, MatPlotLib, pandas, and Scikit-learn, to make data science tasks significantly easier. Python is also a precise language that makes it easy to use multi-processing on large datasets — reducing the time required to analyze them. The data science community has also stepped up with specialized IDEs, such as Anaconda, that implement the Jupyter Notebook concept, which makes working with data science calculations significantly easier. Besides all of these things in Python’s favor, it’s also an excellent language for creating glue code with languages such as C/C++ and Fortran. The Python documentation actually shows how to create the required extensions. Most Python users rely on the language to see patterns, such as allowing a robot to see a group of pixels as an object. It also sees use for all sorts of scientific tasks.
  • R (special purpose statistical): In many respects, Python and R share the same sorts of functionality but implement it in different ways. Depending on which source you view, Python and R have about the same number of proponents, and some people use Python and R interchangeably (or sometimes in tandem). Unlike Python, R provides its own environment, so you don’t need a third-party product such as Anaconda. However, R doesn’t appear to mix with other languages with the ease that Python provides.
  • SQL (database management): The most important thing to remember about Structured Query Language (SQL) is that it focuses on data rather than tasks. Businesses can’t operate without good data management — the data is the business. Large organizations use some sort of relational database, which is normally accessible with SQL, to store their data. Most Database Management System (DBMS) products rely on SQL as their main language, and DBMS usually has a large number of data analysis and other data science features built in. Because you’re accessing the data natively, there is often a significant speed gain in performing data science tasks this way. Database Administrators (DBAs) generally use SQL to manage or manipulate the data rather than necessarily perform detailed analysis of it. However, the data scientist can also use SQL for various data science tasks and make the resulting scripts available to the DBAs for their needs.
  • Java (general purpose): Some data scientists perform other kinds of programming that require a general purpose, widely adapted and popular, language. In addition to providing access to a large number of libraries (most of which aren’t actually all that useful for data science, but do work for other needs), Java supports object orientation better than any of the other languages in this list. In addition, it’s strongly typed and tends to run quite quickly. Consequently, some people prefer it for finalized code. Java isn’t a good choice for experimentation or ad hoc queries.
  • Scala (general purpose): Because Scala uses the Java Virtual Machine (JVM) it does have some of the advantages and disadvantages of Java. However, like Python, Scala provides strong support for the functional programming paradigm, which uses lambda calculus as its basis. In addition, Apache Spark is written in Scala, which means that you have good support for cluster computing when using this language — think huge dataset support. Some of the pitfalls of using Scala are that it’s hard to set up correctly, it has a steep learning curve, and it lacks a comprehensive set of data science specific libraries.

There are likely other languages that data scientists use, but this list gives you a good idea of what to look for in any programming language you choose for data science tasks. What it comes down to is choosing languages that help you perform analysis, work with huge datasets, and allow you to perform some level of general programming tasks. Let me know your thoughts about data science programming languages at [email protected].

Compiling Python

None of my Python books, including Algorithms for Dummies, 2nd Edition, Beginning Programming with Python For Dummies, 3rd EditionMachine Learning for Dummies, 2nd Edition,  Machine Learning Security Principles, and Python for Data Science for Dummies, show how to compile a Python program. This is because the interpreted nature of Python makes it easier to work with scripts for these reasons:

  • The interpreter provides instant results to make learning faster.
  • It’s easier and faster to fix errors.
  • The use of notebooks, as is found in all of the books, makes creating output easier.
  • The use of literate programming techniques helps create an environment where acquired knowledge is more likely to remain acquired.
  • Using literate programming techniques also makes it possible to document the code in a manner that’s more like reading a textbook than looking at source code.
  • The use of scripts promotes experimentation, which leads to new ideas and techniques.

These are all great reasons to use scripts in books. In fact, I’m sure that many people will have other reasons to use scripts. The one thing you should note is that Python does automatically compile some files to do things like reduce loading time. Anytime you see a .pyc file, the file has been compiled by Python to bytecode through various means, including importing the script. It’s also possible to pre-compile a script using the python interpreter’s -m command line switch. The resulting output appears in the __pycache__ folder with a .pyc extension. You can further modify the compilation process by using the -o and -oo command line switches, which offer various optimizations to make the code load even faster. The problems with these outputs is that they’re only mildly obfuscated, so if your intent is to hide your code from prying eyes, this isn’t the best option.

Another built-in compilation option is to use the compile() function, which performs a compilation directly in your code. The purpose of using this function is to speed up code that is used often within your application. For example, you might use it to compile code that appears within a loop. Obviously, you get no obfuscation advantage using this approach, but you do get a speed advantage. If you don’t want to go through the bother of using the compile() function, you could always use a third party product like Numba, which reduces the task to one of adding a decorator to your code.

None of the solutions discussed so far do anything more than turn your Python script into bytecode, which is still interpreted (albeit, much faster than using a human language script). There is also an option for turning your Python code into actual machine code through various intermediate steps. A Python compiler usually turns your Python script into an intermediate language, which is then compiled into actual machine code that is native to the host platform. However, it may simply run your script online, so you need to know in advance whether you’ll end up with an executable file in the end. An executable file can offer these advantages:

  • The source code is fully obfuscated, protecting your development investment.
  • The code runs significantly faster than any other means of interacting with it.
  • Instead of a host of script files, you usually end up with just a few executable files, perhaps even just one.
  • Because it’s harder to modify, an executable file can be more secure and reliable than using scripts.

If your goal is to exclusively create an executable output, then a product like auto-py-to-exe might be your best option. This way you get to use your interpreter of choice to develop the application, then use another product to turn the result into an .exe file. The idea is to get the best of both worlds. The point of all this is that you don’t strictly have to interact with Python code in one way, using an interpreter. You have a great many options at your disposal. Let me know your thoughts about working with compiled Python code at [email protected].

Comment and Document Updates for CI/CD

In reading about Continuous Integration/Continuous Deployment (CI/CD) I often find ways to manage the code, to get people around the code, to keep errors out of the code, and so on. It’s all about the code. Developers have, in fact, developed myriad ways to keep code size small, updated, deployed, tested, and so on to ensure that users have what they want, when the they want it (if not before). Sometimes my head spins on its axis after reading such documents because it becomes a high speed dizzying affair. It’s somehow assumed that everyone can just keep up. Except, there are new people and older people and people with lesser attention spans who can’t keep up, which is why comments and documentation are so important.

As part of the coding process, developers also need to update both comments and documentation or someone will come along and make modifications based on outdated information. Even though making such updates seems like a waste of time since everyone should be able to keep up, the truth is that these updates ultimately save time. However, the updates, when they occur (which apparently isn’t often) are often made in a haphazard manner reminiscent of an old Keystone Cops movie.

Adding a process, a workflow, to the CI/CD mill is important to ensure that everything remains in sync: code, comments, and documentation. A best practice way to accomplish this task is to add steps to every update process so that nothing is left behind. Here’s how you could approach the problem:

  1. Perform the required code updates.
  2. During testing, ensure that the comments within the code actually match what the code is doing. Testing and other review processes should not only look at the code, but the comments too.
  3. Update the documentation as final testing occurs. Make sure to include these elements:
    • Text
    • Drawings
    • Mockups
    • Visual Aids
    • Videos
    • Any other documentation elements
  4. Specify that any old comments/documentation are outdated using one of these approaches:
    • Mark it as deprecated
    • Remove it from the work area and put it in an archive
    • Delete it completely
  5. Deploy the application update. If you don’t deploy the update after these steps are done, they won’t get done. Everyone will wander off somewhere and forget all about any sort of comment or documentation update.

Obviously, the approach you end up using has to meet the requirements of your organization. It also has to be simple enough that people will actually, albeit begrudgingly, perform the work. What methods do you use to keep everything in sync at your organization? Let me know at [email protected].

Choosing a First Language to Learn

My first programming experience (during the time of the dinosaurs) involved using a light panel to enter machine code into a rudimentary computer with 3 KB (yes, that’s KB) of RAM. The output was also in light form and I needed to decode the lights to determine if my code worked right. I worked with various systems in various ways over the next several years. By the time I got to college, the first language I learned there was BASIC (Beginner’s All-purpose Symbolic Instruction Code), then PC assembler, followed by Pascal. In fact, I’ve just stopped counting the number of languages I’ve learned over the years because each language has a place in my programmer’s toolbox. Of course, the question is what language you should learn first. I get asked that question quite often because there are a huge number of languages available today and no one wants to invest time in a language that’s going nowhere.

Part of the answer to the question of what to learn first is what you intend to do with the language. Each language has features that make it better at performing specific tasks. Programming languages can be split into those that are designed for a special purpose and those that are designed for a general purpose. A special purpose language, such as Structured Query Language (SQL), could be a good choice if you intend to move into database work immediately. However, for most people, a general purpose language works better because you can use it for a wider variety of tasks without bending yourself into a pretzel shape to do it.

A good place to start if you want to choose a language that’s popular enough to help you get a job afterward is the TIOBE index. It shows a listing of which languages are most popular today. As I’m writing this, Python is the most popular language on the list, but that could change tomorrow. Generally, any of the top ten languages on the list are good choices.

Of course, you want a programming language that is easy to learn. C/C++, C#, and Java are all complex languages with great flexibility. Furthermore, C/C++ and C# can help you work at a low level with the computer hardware. These languages have a steep learning curve and may not provide the best choices for a starting point. That said, if you have a line on a job that uses any of these languages, you could do worse than start here, just be prepared to burn the midnight oil learning.

The language I suggest people learn as a starting point is Python. In fact, Beginning Programming with Python For Dummies, 3rd Edition makes a point of showing you just how easy things can be. You don’t even need to invest in any special software, the book shows you how to use Google Colab so that you could conceivably learn how to program on your smartphone or TV. Others must agree with me because Python has turned into the language that the education industry turns to most often for budding programmers.

There are a lot of programming languages available today. You need to research the choice by taking into account what your personal needs are and what sort of job you want to get afterwards. You might find that something like JavaScript or Ruby will provide benefits that you can’t get with Python. Which language do you think will work best for you? Let me know your reasons at [email protected].

Creating Sensible Error Trapping

This is an update of a post that originally appeared on May 23, 2011.

Errors in software happen. A file is missing on the hard drive or the user presses an unexpected key combination. There are errors of all shapes and sizes; expected and unexpected. The sources of errors are almost limitless. Some developers look at this vastness, become overwhelmed, and handle all errors the same way—by generating an ambiguous exception for absolutely every error that doesn’t help anyone solve anything. This is the worst case scenario that’s all too common in software today. I’ve talked with any number of people who have had to employ extreme effort just to figure the source of the exception out; many people simply give up and hope that someone has already discovered the source of the error.

At one time, error handling functionality in application development languages was so poor that it was possible to give the developer the benefit of a doubt. However, with the development tools that developers have at their disposal today, there is never a reason to provide an ambiguous “one size fits all” exception. For one thing, developers should make a distinction between the expected and the unexpected. Any expected error—a missing file for example—should be handled directly and specifically. If the application absolutely must have the file and can’t recreate it, then it should display a message saying which file is missing, where it is missing from, and possibly how to obtain another copy.

Even more than simply shoving the burden onto the user, however, modern applications have significantly more resources available for handling the error automatically. For example, it’s quite possible to use an Internet connection to access the vendor’s Web site and automatically download a missing application file. Except to tell the user what’s happening when the repair will take a few minutes, the application shouldn’t even bother the user with this particular kind of error—the repair should be automatic.

All of my essential programming books include at least mentions of error handling, debugging, exceptions, and other tasks associated with running code efficiently and smoothly. For example, Part IV of C++ All-In-One for Dummies, 4th Edition is devoted to the topic of debugging. Part V Chapter 3 of this same book talks about exceptions. If you’re a C# developer, C# 10.0 All-in-One for Dummies discusses exception handling in Book I Chapter 9. Book IV Chapter 2 discusses how to use the debugger to find errors. The point is that it’s essential to handle errors in your applications in a manner that makes sense to the users who rely on the application daily and the developers who maintain it.

Note that many of my newer books provide instructions for working with online IDEs, most especially Google Colab. These online IDEs rarely provide built-in debugging functionality, so then you need to resort to other means, such as those expressed in Debugging in Google Colab notebook.

Exceptional conditions do occur. However, even in these situations the developer must avoid the generic exception at all costs. If an application experiences an unexpected error and there isn’t any way to recover from it automatically, the user requires as much information as possible about the error in order to fix it. This means that the application should diagnose the problem as much as possible. Don’t tell the user that the application simply has to end—there is never a good reason to include this sort of message. Instead, tell the user that the application can’t locate a required resource and specify the resource in as much detail as possible. If possible, let the user fix the resource access problem and then retry access before you simply let the application die an ignoble death. Remember this! Any exception that your application displays means that you’ve failed as a developer to locate and repair the errors, so exceptions should be reserved for truly exceptional conditions.

Not everyone agrees with my approach to error trapping, but I have yet to hear a convincing argument to provide unreliable, non-specific error trapping in an application. Poor error trapping always translates into increased user dissatisfaction, increased support costs, and a reduction in profitability. Let me know your thoughts on the issue of creating a sensible error trapping strategy at [email protected].

Creating Useful Comments

This is an update of a post that originally appeared on November 21, 2011.

A major problem with most applications today is that they lack useful comments. It’s impossible for anyone to truly understand how an application works unless the developer provides comments at the time the code is written. In fact, this issue extends to the developer. A month after someone writes an application, it’s possible to forget the important details about it. In fact, for some of us, the interval between writing and forgetting is even shorter. Despite my best efforts and those of many other authors, many online examples lack any comments whatsoever, making them nearly useless to anyone who lacks time to run the application through a debugger to discover how it works.

Good application code comments help developers of all stripes in a number of ways. As a minimum, the comments you provide as part of your application code provides these benefits.

  • Debugging: It’s easier to debug an application that has good comments because the comments help the person performing the debugging understand how the developer envisioned the application working.
  • Updating: Anyone who has tried to update an application that lacks comments knows the pain of trying to figure out the best way to do it. Often, an update introduces new bugs because the person performing the update doesn’t understand how to interact with the original code.
  • Documentation: Modern IDEs often provide a means of automatically generating application documentation based on the developer comments. Good comments significantly reduce the work required to create documentation and sometimes eliminate it altogether.
  • Technique Description: You get a brainstorm in the middle of the night and try it in your code the next day. It works! Comments help you preserve the brainstorm that you won’t get back later no matter how hard you try. The technique you use today could also solve problems in future applications, but the technique may become unavailable unless you document it.
  • Problem Resolution: Code often takes a circuitous route to accomplish a task because the direct path will result in failure. Unless you document your reasons for using a less direct route, an update could cause problems by removing the safeguards you’ve provided.
  • Performance Tuning: Good comments help anyone tuning the application understand where performance changes could end up causing the application to run more slowly or not at all. A lot of performance improvements end up hurting the user, the data, or the application because the person tuning the application didn’t have proper comments for making the adjustments.

The need for good comments means creating a comment that has the substance required for someone to understand and use it. Unfortunately, it’s sometimes hard to determine what a good comment contains in the moment because you already know what the code does and how it does it. Consequently, having a guide as to what to write is helpful. When writing a comment, ask yourself these questions:

  • Who is affected by the code?
  • What is the code supposed to do?
  • When is the code supposed to perform this task?
  • Where does the code obtain resources needed to perform the task?
  • Why did the developer use a particular technique to write the code?
  • How does the code accomplish the task without causing problems with other applications or system resources?

There are many other questions you could ask yourself, but these six questions are a good start. You won’t answer every question for every last piece of code in the application because sometimes a question isn’t pertinent. As you work through your code and gain experience, start writing down questions you find yourself asking. Good answers to aggravating questions produce superior comments. Whenever you pull your hair out trying to figure out someone’s code, especially your own, remember that a comment could have saved you time, frustration, and effort. What is your take on comments? Let me know at [email protected].

Considering Perception in User Interface Design

This is an update of a post that originally appeared on January 24, 2014.

The original version of this article had humans seeing images in as little as 13 ms. Nothing much has really changed since then. I read a few articles recently that reminded me of a user interface design discussion I once had with a friend of mine. First, let’s discuss the articles:

  • The first, Everything we see is a mash-up of the brain’s last 15 seconds of visual information, says that humans can actually see something in as little as 15 ms. That short time frame provides the information the brain needs to target a point of visual focus.
  • The older second article, ‘Sixth Sense’ Can Be Explained by Science, explains how the sixth sense that many people relate as being supernatural in origin is actually explainable through scientific means. The brain detects a change-probably as a result of that 15 ms view-and informs the rest of the mind about it. However, the change hasn’t been targeted for closer inspection, so the viewer can’t articulate the change.
  • The third article, The silent “sixth” sense, is a more scientific and slightly modernized view of the second article. In short, you know the change is there, but you can’t say what has actually changed.

So, you might wonder what this has to do with website design. It turns out that you can use these facts to help focus user attention on specific locations on your site. Now, I’m not talking here about the use of subliminal perception, which is clearly illegal in many locations. Rather, it’s possible to do as a friend suggested in designing a site and change a small, but noticeable, element each time a page is reloaded. Of course, you need not reload the entire page. Technologies such as Asynchronous JavaScript And XML (AJAX) make it possible to reload just a single element as needed. (Of course, changing a single element in a desktop application is incredibly easy because nothing special is needed to do it.) The point of making this change is to cause the viewer to look harder at the element you most want them to focus on. It’s just another method for ensuring that the right area of a page or other user interface element gets viewed.

However, the articles also make for interesting thoughts about the whole issue of user interface design. Presentation is an important part of design. Your application must use good design principles to attract attention. However, these articles also present the idea of time as a factor in designing the user interface. For example, the order in which application elements load is important because the brain can perceive the difference. You might not consciously register that element A loaded some number of milliseconds sooner than element B, but subconsciously, element A attracts more attention because it registered first and your brain targeted it first. This essentially explains the difference between UX and UI, since the two are eternally intermingled in the world of development.

As science continues to probe the depths of perception, it helps developers come up with more effective ways in which to present information in a way that enhances the user experience and the benefit of any given application to the user. However, in order to make any user interface change effective, you must apply it consistently across the entire application and ensure that the technique isn’t used to an extreme. Choosing just one element per display (whether a page, window, or dialog box) to change is important. Otherwise, the effectiveness of the technique is diluted and the user might not notice it at all.

What is your take on the use of perception as a means of controlling the user interface? Do you feel that subtle techniques like the ones described in this post are helpful? Let me know your thoughts at [email protected].