When you get the file, open the archive on your hard drive and then follow the directions in the book to create the source code repository for each language. The repository instructions appear on Page 60 for the R programming language and on Page 99 for Python. I apologize for any problems that the initial lack of source code may have caused. If you experience any problems whatsoever in using the source code, please feel free to contact me at John@JohnMuellerBooks.com. Luca and I want to be certain that you have a great learning experience, which means being able to download and use the book’s source code because using hand typed code often leads to problems.
Some people may have misinterpreted the content at the beginning of Chapter 3 in Python for Data Science for Dummies. It isn’t necessary to install the products listed in the Considering the Off-the-Shelf Cross-Platform Scientific Distributions section starting on Page 39. These products are for those of you who would like to try a development environment other than the one used in the book, which is Anaconda 2.1.0. However, unless you’re an advanced user, it’s far better to install Anaconda 2.1.0 so that you can follow the exercises in the book without problem. Installing all of the products listed in Chapter 3 will result in a setup that won’t work at all because the various products will conflict with each other.
Because Continuum has upgraded Anaconda, you need to download the 2.1.0 version from the archive at https://repo.continuum.io/archive/.There are separate downloads for Windows, Mac OS X, and Linux. The chapter tells you precisely which file to download. For example, for Windows you’d download Anaconda-2.1.0-Windows-x86_64.exe. The point is to use the same version of Anaconda as you find in the book. You can find the installation instructions on Page 41 if you have a Windows system, Page 45 if you have a Linux system, or Page 46 if you have a Mac OS X system. Make sure you download the databases for the book by using the procedures that start on page 47.
Following this process is the best way to ensure you get a good installation for Python for Data Science for Dummies. Luca and I want to make certain that you can use the book to discover the wonders of data science without having to jump through a lot of hoops to do it. Please feel free to contact me at John@JohnMuellerBooks.com if you have any questions about the installation process.
Both Python for Data Science for Dummies and Machine Learning for Dummies rely on a version of Anaconda that uses IPython as part of its offering.Theoretically, you could also use Anaconda with Beginning Programming with Python For Dummies, but that book is designed to provide you with an experience that relies on the strict Python offerings (without the use of external tools). In other words, the procedures in this third book are designed for use with IDLE, the IDE that comes with Python. IPython extends the development environment in a number of ways, one of which is the use of magic functions. You see the magic functions in the code of the first two books as calls that begin with either one or two percent signs (% or %%). The most common of these magic functions is %matplotlib, which controls how IPython Notebook or Jupyter Notebook display plot output from the code.
Of course, you might choose to use another IDE—one that isn’t quite so magical as Anaconda provides through IPython. In this case, you need to remove those magic commands. Removing the commands won’t affect functionality of the code. The example will still work as explained in the book. However, the way that the IDE presents output could change. For example, instead of being inline, plots could appear in a separate window. Even though using a separate window is less convenient, either method works just fine. If you ever do encounter a magic function-related problem, please be sure to let me know at John@JohnMuellerBooks.com.
You need to know a few things about working with conda. The first is that you need to open an Anaconda prompt to use it. For example, when working with Windows, you use the Start ⇒ All Programs ⇒ Anaconda<Version> ⇒ Anaconda Prompt command to open a window like the one shown here where you can enter commands. (Your Anaconda Prompt may look different than the one shown based on the platform you use and the version of Anaconda you have installed.)
You can easily discover the features the conda command supports by typing conda -h and pressing Enter. You see a list of command line switches similar to the ones shown here:
As you can see, there are quite a few tasks you can perform. To determine whether you have a package installed, use the Conda search <package name> command. For example, if you want to determine if you have Pandas installed, you type Conda search Pandas and press Enter. You see a list of Pandas versions installed, assuming that Pandas is installed, like this:
The information you get from conda is far more in depth than pip provides. To determine what you have installed, just go down the list and determine whether you have the version of Pandas that you need. If you don’t, then type Conda update pandas and press Enter (notice the case used). On the other hand, let’s say you want to install BeautifulSoup. Well, the first time through, try typing Conda install BeautifulSoup and pressing Enter. You see an error message that tells you what to type like this:
Since you want to install the latest BeautifulSoup, type Conda install beautiful-soup and press Enter. After searching for the required update information, conda will ask if you want to proceed. Type y and press Enter. You’ll see a whole bunch of activity take place, but eventually, you have a new version of BeautifulSoup, plus all the supporting functionality, installed correctly in the correct locations. Here’s how things looked on my system:
At this point, you have BeautifulSoup installed. Installing other packages follows the same path. Using conda does require a little more expertise than using pip, but you also gain additional flexibility and garner more information. When everything goes well, either tool does an equally good job of getting the installation or update task done, but conda excels in helping you past troublesome installations. Let me know your thoughts about using conda to install the packages required by my books at John@JohnMuellerBooks.com.
It’s essential to remember that Beginning Programming with Python for Dummies relies on the 3.3.4 version of Python. The other two books rely on Python 2.7.x versions. The reason for using the older version of Python in these two books is that these books rely on libraries that Python 3.x doesn’t support. If you try to install these libraries on Python 3.x, you’ll get an error message of somewhat dubious usefulness.
In most cases, the easiest way to install a package is to open a command prompt with Administrator privileges and rely on the pip (for Python 2.x) or pip3 (for Python 3.x) command to perform the installation. For example, to install BeautifulSoup, you can type pip install beautifulsoup4 and press Enter. Installing any other package follows about the same route.
The only problem with the pip utility is that you don’t get it with every version of Python. When using an older version of Python, such as 3.3.4, you actually need to install the pip utility to use it. Fortunately, the installation instructions at https://pip.pypa.io/en/latest/installing/ aren’t difficult to use and you’ll be up and running in a few minutes.
Some readers have also complained that pip doesn’t provide much information when it comes to errors. The lack of information can prove problematic when an installation doesn’t go as planned. Next week I plan to cover the conda utility that comes with Anaconda. This utility isn’t as easy to use in some respects as pip, but it does provide considerably more information. If you have any questions about using the pip utility with my books, please contact me at John@JohnMuellerBooks.com.
The downloadable source for Python for Data Science for Dummies contains a problem that doesn’t actually appear in the book. If you look at page 221, the code block in the middle of the page contains a line saying import numpy as np. This line is essential because the code won’t run without it. The downloadable source for Chapter 12 is missing this line so the example doesn’t run. This P4DS4D; 12; Stretching Pythons Capabilities link provides you with a .ZIP file that contains the replacement source code. Simple remove the P4DS4D; 12; Stretching Pythons Capabilities.ipynb file from the archive and use it in place of your existing file.
Luca and I always want you to have a great experience with our book, so keep those emails coming. Please let me know if you have any questions about source code file update at John@JohnMuellerBooks.com. I’m sorry about any errors that appear in the downloadable source and appreciate the readers who have pointed them out.
Python for Data Science for Dummies contains two errors on page 145. The first error appears in the second paragraph on that page. You can safely disregard the sentence that reads, “The use_idf controls the use of inverse-document-frequency reweighting, which is turned off in this case.” The code doesn’t contain a reference to the use_idf parameter. However, you can read about it on the Scikit-Learn site. This parameter defaults to being turned on, which is how it’s used for the example.
The second error is also in the second paragraph. The discussion references the tf_transformer.transform() method call. The actual method call is tfidf.transform(), which does appear in the sample code. The discussion about how the method works is correct, just the name of the object is wrong.
Please let me know if you have any questions about either of these changes at John@JohnMuellerBooks.com. I’m sorry about any errors that appear in the book and appreciate the readers who have pointed them out.
Python for Data Science for Dummies contains an error in the example that appears on the top half of page 124. In the first of the two grey boxes, the code computes the results of four print statements. The bottom-most print statement, print x[1:2, 1:2], is supposed to compute a result based on rows 1 and 2 of columns 1 and 2, and the bottom grey box seems to confirm that interpretation by the showing the result as [[[14 15 16] [17 18 19]] [[24 25 26] [27 28 29]]]. However, the answer provided for this example in the downloadable source code is [[[14 15 16]]], which doesn’t agree with that in the text.
The good news is that the downloadable source contains the correct code. The error appears only in the book. The last print statement in the book is wrong. Here is the correct code (with output) for this example:
A number of readers have written to report that XMLData2.xml is missing from the downloadable source for Python for Data Science for Dummies. You encounter this file in Chapter 6, on page 108. The publisher has already added the file to the downloadable source, but you might be missing the file from your copy. If so, you can download it by clicking XMLData2.zip. I’m truly sorry about any problems that the missing file might have caused. Please be sure to let me know about your book specific question at John@JohnMuellerBooks.com.
It seems as if Python developers are having more than a few problems at the moment from a number of sources. I recently wrote about the potential issues for readers of Beginning Programming with Python For Dummies and Python for Data Science for Dummies from Windows 10 (Python and Windows 10). However, some readers have come back afterward to say they’re still seeing warnings. It wasn’t until one of the beta readers for Machine Learning for Dummies also saw some of these warnings that it became apparent that some other problem is at work. A recent upgrade to NumPy 1.10.1 has created these warnings. You can see some message threads about the issue at:
The important thing to remember is that you’ll see warnings, not errors (unless there is a problem Luca, my coauthor for Python for Data Science for Dummies, and I haven’t seen yet). For now, updating all of the Anaconda components is the only way to actually get rid of the warnings, which can prove to be quite a pain. However, the warnings are just that, warnings. The code in the books will still run just fine. The best way to avoid a lot of work and potentially creating yet more problems is to ignore the warnings for now. In order to ignore the warnings, type the following two lines of code:
Obviously, the situation is inconvenient for everyone, but the various libraries will get in sync sometime soon and then the warnings will disappear until the next set of updates. Please let me know if you continue to see problems after making this fix at John@JohnMuellerBooks.com.