Tip Error in Python for Data Science for Dummies

There is a small error on page 318 of Python for Data Science for Dummies. You can find it near the middle of the page in the Tip text. The current text on the second line of that paragraph says, “k as a number near the squared number of available observations.” However, the text should really read, “k as a number near the squared root number of available observations.” The word root is missing, which obviously changes the mathematical meaning of the text. Please accept our apologies for the typo. Let me know if you find any other errors of a technical nature in the book at John@JohnMuellerBooks.com and I’ll be sure to provide a blog post about it here. Thank you for your support!
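To see why the missing word matters, here is a minimal sketch. It assumes the k in the Tip is the number of neighbors used by a nearest-neighbors model (this post doesn’t say so explicitly), and the function name suggest_k is hypothetical rather than something from the book.

```python
import math


def suggest_k(n_observations):
    """Return a starting value of k near the square root of the observation count."""
    if n_observations < 1:
        raise ValueError("need at least one observation")
    # round() gives the nearest whole number; max() keeps k at 1 or more.
    return max(1, round(math.sqrt(n_observations)))


for n in (100, 1000, 10000):
    # n ** 2 is what the misprinted text would imply. It is larger than the
    # number of observations itself, so it can't be a usable value of k.
    print(n, suggest_k(n), n ** 2)
```

Treat the result as a starting point to tune from, not as a fixed rule.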

 

Python and Windows 10

A number of Beginning Programming with Python For Dummies and Python for Data Science for Dummies readers have written to tell me that the installation instructions for Python in these two books don’t appear to work well with Windows 10. Unfortunately, Windows 10 wasn’t available during the writing of either book, but the operating system does seem to present problems for a number of people—not just developers. Microsoft’s enforced upgrades are just one source of woe. Of course, Windows 10 has its supporters as well who are trying to tell you not to worry about these issues. I’m not here to tell you whether you should use Windows 10 or not—that’s a topic for another post. However, I also understand you need a fix for the installation process for these two books if you are running Windows 10.

For the most part, all you really need to do is install Python 3.x for Beginning Programming with Python for Dummies and Anaconda for Python for Data Science for Dummies. The problem doesn’t appear to be the actual installation (given there are no error messages when the installation completes), but rather accessing the applications after the installation. To ensure you can access the applications, you need to be sure they’re part of the path. You may also need to open a command prompt to start the applications, rather than rely on a Start Menu entry to access them. Given that I don’t have Windows 10 installed and don’t plan to install it for now because I need to support the documented configurations for the books, the best I can do is direct you to a site where you can discover how to perform these tasks under Windows 10. The article I suggest is: Setting up your Windows 10 System for Python Development (PyDev, Eclipse, Python). You don’t need to set up Eclipse or do anything else fancy. Once you have Python installed, you should be ready to go.
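One way to check whether the applications are part of the path is a short script like the following. It’s only a sketch: the command names python, conda, and ipython are my assumptions about what you’d want to look for, and the function names are hypothetical. If Python itself isn’t on the path yet, start the interpreter by its full path in the installation folder and run the script from there.

```python
import os
import shutil
import sys


def locate(commands=("python", "conda", "ipython")):
    """Map each command name to its full location, or None if it isn't on the path."""
    return {name: shutil.which(name) for name in commands}


def path_entries():
    """Return the folders currently listed in the PATH environment variable."""
    return [entry for entry in os.environ.get("PATH", "").split(os.pathsep) if entry]


if __name__ == "__main__":
    # The folder holding this executable is the one to add to the path
    # when the command can't be found from a command prompt.
    print("Running interpreter:", sys.executable)
    for name, location in locate().items():
        print(name, "->", location or "not found on the path")
```

A command reported as not found usually means its installation folder is missing from the path, which matches the symptom readers describe.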

My feeling is that Windows 10 is going to create more than a few problems for developers because the forced upgrades will mean that you can’t ever rely on your setup being stable. The moment you get one set of Microsoft-induced problems fixed, the operating system will automatically download a new set to your machine. For this reason, I can’t recommend using Windows 10 for development purposes. You’ll be better served with Windows 7 or Windows 8, with Windows 7 being the optimal choice. It could be that I’m wrong on this issue and I do plan to explore it further, but for the moment, I’m not offering Windows 10 support directly. I’ll do what I can to get you up and running with your Windows 10 system, but I can’t guarantee results because my books haven’t been written with the vagaries of Windows 10 in mind. Please let me know about your book-specific questions and concerns at John@JohnMuellerBooks.com.

 

Missing File from Python for Data Science for Dummies Downloadable Source

A reader recently contacted me regarding a missing file from the downloadable source for Python for Data Science for Dummies. This is the P4DS4D; 01; Quick Overview.ipynb you need for the first chapter. Simply click here to download P4DS4D; 01; Quick Overview.ipynb. I’m also asking the publisher to add the missing file to the downloadable source found on the Dummies site at http://www.dummies.com/store/product/Python-for-Data-Science-For-Dummies.productCd-1118844181,descCd-DOWNLOAD.html. If you encounter any other problems with the book, please be sure to let me know at John@JohnMuellerBooks.com. Thank you for your patience!

 

Finding and Employing Data Science Tools

Python for Data Science for Dummies introduces you to a number of common libraries used for data science experimentation and discovery. Most of these libraries also figure prominently as part of a data scientist’s toolbox because they provide common functionality needed for every application. However, these libraries are only the tip of the data science toolbox. Because data science is such a new technology, you can find all sorts of tools to perform a wide range of tasks, but there is little standardization and some of these tools are hard to categorize so that you know where they fit within your toolbox. That’s why I was excited to see The data science ecosystem, the first of a three-part series of articles that describe some of the tools available for use in data science projects. You can find the other two parts of the article at:

The problem for people who want to explore data science and machine learning today might not be the lack of tools, but the lack of creativity in using them. In order to explore data science, it’s important to understand that the tools only work when you prepare the data properly, employ the correct algorithm, and define reasonable goals. No matter how hard you try, data science and machine learning can’t provide you with the correct numeric sequences for the next five lottery wins. However, data science can help you locate potential sources of fraud in an organization. The article, Machine learning and the strategic snake oil reserve, sums up what may be the biggest problem with data science today: people expect miracles without putting in the required work. Fortunately, there are new tools on the horizon to make languages, such as Python, and products, such as Hadoop, easier for even the less creative mind to use (see Python and Hadoop project puts data scientists first).

Even with a great imagination, the tools available today may not do the job you want as well as they should because the underlying hardware isn’t capable of performing the required tasks. The process is further hampered by a misuse of the skills that data scientists provide (see You’re hiring the wrong data scientists for details). As a result, you need a large number of specialized tools in order to perform tasks that shouldn’t require them. However, that’s the reason why you need to know about the availability of these tools so that you can produce useful results on today’s hardware with a minimum of fuss. Asking the question, “How would Alan Turing fix A.I.?” helps you understand the complexities of the data science and machine learning environments.

Data science, machine learning, data scientists with even greater skills, and better hardware will keep the momentum going well into the future. As the Internet of Things (IoT) continues to move forward and the problem of what to do with all that data becomes even larger, data science will take on a larger role in everyone’s daily life. Count on reading more articles like, Google a step closer to developing machines with human-like intelligence, that describe the proliferation of new hardware and new tools to make the full potential of data science and machine learning a reality. In the meantime, getting the tools you need and exploring the ways in which you can creatively use data science to solve problems is the best way to go for now. Let me know your thoughts on the future of data science at John@JohnMuellerBooks.com.

 

Download Site for Python

I recently received an e-mail from a reader who had a bad install with Python 3.3.4 on a laptop with 64-bit Windows 7 installed. No matter what the reader did, the installation wouldn’t work. The application would fail with an error stating that pythonw.exe was unable to start and it included an error of 0xc000007b. He had downloaded the code from https://www.python.org/download/releases/3.3.4/, which is the site mentioned on page 25 of Beginning Programming with Python For Dummies. However, downloading a copy from http://continuum.io/downloads#py34 or https://store.continuum.io/cshop/anaconda/ did provide a copy of Python 3.4.3 (not the version 3.3.4 that is used in the book) that does work on his system.

The problem with this solution is that installing a copy from this second site also installs Anaconda—a product that isn’t covered in the book. In order to work with the IDLE examples in the book, you must open a copy of IDLE in the Anaconda\Scripts folder of the Anaconda installation. You’ll likely find this folder in your personal folder of your system. If you do find that you can’t get the copy of the product from the Python download site to work on your system, try this second solution and please let me know about the issue at John@JohnMuellerBooks.com. I would strongly encourage you to try the setup found in the book, however, because using Anaconda will cause extra work for you and this book is truly meant to help someone who has little or no programming experience discover the joys of working with Python.

As a side note, I have tried the book’s source code with the latest Python release, 3.4.3 (the book was originally written to use version 3.3.4). All of the source code works on my test system, but I’d love to hear if it works on your system as well. You can obtain this updated version of Python at https://www.python.org/downloads/release/python-343/ or http://continuum.io/downloads#py34 (if you don’t mind installing Anaconda as well).

When using the 3.4.3 version of Python, your screenshots may vary some from those found in the book. All version-specific information will change, so you need to take this change into account as you read. Please let me know if you experience any problems using this updated version on your system. In the meantime, happy reading!
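Because version-specific details vary, it helps to know exactly which interpreter you’re running before you write to me about a problem. The following is a minimal sketch; the function names are hypothetical, and the default of (3, 3) simply reflects the 3.3.4 release the book was written for.

```python
import platform
import struct
import sys


def describe_interpreter():
    """Collect the details that matter when comparing your setup with the book's."""
    return {
        "version": platform.python_version(),
        # The size of a pointer tells you whether this is a 32-bit or 64-bit build.
        "bits": struct.calcsize("P") * 8,
        "executable": sys.executable,
    }


def same_release_series(version, expected=(3, 3)):
    """Return True when the major and minor numbers match the expected series."""
    return tuple(version[:2]) == tuple(expected)


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(key, "->", value)
    if not same_release_series(sys.version_info):
        print("This isn't a 3.3.x release, so screenshots and version text may differ.")
```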

 

Missing Python for Data Science for Dummies Companion Files

The companion files for Python for Data Science for Dummies are finally available for all those long-suffering readers who have been missing them. You can find them at http://www.dummies.com/store/product/Python-for-Data-Science-For-Dummies.productCd-1118844181,descCd-DOWNLOAD.html. All you need to do is click the Click to Download link on the page. I’m truly sorry you needed to wait so long. Thank you to everyone who noticed the missing files and also the incorrect link in the book, which now appears in the book errata. Please let me know if you have any problems locating the files or downloading them at John@JohnMuellerBooks.com.

 

Getting Your Python for Data Science for Dummies Extras

The process of discovering how to use Python to perform data science tasks begins when you get your copy of Python for Data Science for Dummies. Luca and I spent a good deal of time making your data science learning experience easier and even fun. However, it only starts there. Like many of my other books, you can also find online content for Python for Data Science for Dummies in these forms:

I always want to hear your questions about my books. Be sure to write me about them at John@JohnMuellerBooks.com. In the meantime, I hope you enjoy your Python for Data Science for Dummies reading experience. Thank you for your continued support.


20 July 2015: Updated to show correct link for the companion files.

 

Using Pass Versus Continue in Python

A number of people have asked me about the discussion of the pass and continue clauses on page 140 of Beginning Programming with Python For Dummies. The example on that page is confusing a lot of people. Most people assume that when the example prints its output, the w should not appear as part of the output—as if the pass and continue clauses work precisely the same.

If you look at the second sentence of the first paragraph on page 140, you see that it tells you that pass and continue work almost the same way, except that the pass clause allows completion of the code in the if block in which it appears. This distinction is important. The continue clause is immediate; the pass clause isn’t. So, yes, you can achieve different results using pass or continue.

The difference comes down to what happens to the rest of the code in the current pass through the loop, and it all hinges on the if statement block. The continue clause stops the current pass through the loop on the spot and moves on to the next one, so nothing after it runs. The pass clause doesn’t stop anything: it’s a placeholder that does nothing, so Python finishes the remaining code in the if block and then runs whatever follows the if block in the loop. That’s why the w still appears in the output when the example uses pass.
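A small sketch makes the difference easy to see. The two functions below are hypothetical and only loosely modeled on the kind of loop the book uses; they aren’t the example from page 140.

```python
def with_pass(text):
    """Collect letters, using pass inside the if block."""
    seen = []
    for letter in text:
        if letter == "w":
            pass  # Does nothing; execution falls through to the next line.
            seen.append("<w found>")
        seen.append(letter)  # Still runs for w.
    return seen


def with_continue(text):
    """Collect letters, using continue inside the if block."""
    seen = []
    for letter in text:
        if letter == "w":
            continue  # Jumps straight to the next letter.
            seen.append("<w found>")  # Never runs.
        seen.append(letter)  # Skipped for w.
    return seen


print(with_pass("Howdy!"))      # ['H', 'o', '<w found>', 'w', 'd', 'y', '!']
print(with_continue("Howdy!"))  # ['H', 'o', 'd', 'y', '!']
```

With pass, the w and the extra marker both show up. With continue, neither does.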

Enough people have written about this particular example that I want to be sure there is no confusion about the difference between pass and continue. Please let me know if you have any additional questions about these two clauses at John@JohnMuellerBooks.com.

 

 

Contemplating the Issue of Bias in Data Science

When Luca and I wrote Python for Data Science for Dummies, we tried to address a range of topics that aren’t well covered in other places. Imagine my surprise when I saw a perfect article to illustrate one of these topics in ComputerWorld this week, Maybe robots, A.I. and algorithms aren’t as smart as we think. With the use of AI and data science growing exponentially, you might think that computers can think. They can’t. Computers can emulate or simulate the thinking process, but they don’t actually think. A computer is a machine designed to perform math quite quickly. If we want thinking computers, then we need a different kind of a machine. It’s the reason I wrote the Computers with Common Sense? post not too long ago. The sort of computer that could potentially think is a neural network and I discuss them in the Considering the Future of Processing Power post. (Even Intel’s latest 18-core processor, which is designed for machine learning and analytics, isn’t a neural network; it simply performs the tasks that processors do now more quickly.)

However, the situation is worse than you might think, which is the reason for mentioning the ComputerWorld article. A problem occurs when the computer scientists and data scientists who work together to create algorithms that make computers appear to think forget that computers really can’t do any such thing. Luca and I discuss the effects of bias in Chapter 18 of our book. The chapter might have seemed academic at one time, something of interest, but potentially not all that useful. Today that chapter has taken on added significance. Read the ComputerWorld article and you find that Flickr recently released a new image recognition technology. The effects of not considering the role of bias in interpreting data and in the use of algorithms have had horrible results. The Guardian goes into even more details, describing how the program has tagged black people as apes and animals. Obviously, no one wanted that particular result, but forgetting that computers can’t think has caused precisely that unwanted result.

AI is an older technology that isn’t well understood because we don’t really understand our own thinking processes. It isn’t possible to create a complete and useful model of something until you understand what it is that you’re modeling and we don’t truly understand intelligence. Therefore, it’s hardly surprising that AI has taken so long to complete even the smallest baby steps. Data science is a newer technology that seeks to help people see patterns in huge data sets—to understand the data better and to create knowledge where none existed before. Neither technology is truly designed for stand-alone purposes yet. While I find Siri an interesting experiment, it’s just that, an experiment.

The Flickr application tries to take the human out of the loop and you see the result. Technology is designed to help mankind achieve more by easing the mundane tasks performed to accomplish specific goals. When you take the person out of the loop, what you have is a computer that is only playing around at thinking from a mathematical perspective—nothing more. It’s my hope that the Flickr incident will finally cause people to start thinking about computers, algorithms, and data as the tools that they truly are—tools designed to help people excel in new ways. Let me know your thoughts about AI and data science at John@JohnMuellerBooks.com.

 

Considering the Effects of Automation

After recently watching Disney’s new movie, Tomorrowland, I started thinking about the world that really could come about tomorrow. Of course, it will have many of the same problems we have today, but I’m sure it will also have a few new problems and hopefully, some of the old problems will see some sort of resolution. My recent forays into advanced math have given me a new perspective of just what it will take to create tomorrow. In writing both Python for Data Science for Dummies and MATLAB for Dummies I’ve come to a greater appreciation of the role that both math and science will play in creating this new world—not that there was any lack of appreciation before I wrote the books, but the vision now is clearer.

The fact of the matter is that people will require more education. Even plumbers and electricians will need to know more in order to deal with new technologies coming on the scene (think about performing tasks such as installing solar panels). It will come to a point where advanced schooling after high school (whether trade or technical) is going to become a necessity. Yes, people can still get jobs today without a college education, but those days are coming to an end with the advances in robotics I keep reading about. For example, a recent New York Times article, As Robots Grow Smarter, American Workers Struggle to Keep Up, says quite a lot about the future of low-paying jobs: they simply won’t exist. Articles such as the one found in MIT Technology Review, Robots That Learn Through Repetition, Not Programming, tell the story of why this is the case. In the future, robots will learn to perform new tasks as needed. The tone of some of these articles is a bit negative because we’re viewing the future through today’s eyes.

What I see in the future are opportunities for people to create, but in a safer environment than in the past. Just as it’s difficult to see the past as it actually was (the way the people viewed things at that time), trying to view the future, even if you have some inkling of what that future might contain, is difficult. For example, imagine having to saddle your horse before you can go anywhere—people today are used to simply climbing into the car and turning the key. However, if you lived in the early 1900s, a car was a really loud, obnoxious device that would spell the ruination of society—horses were far more practical and comfortable (interestingly enough, about 40 percent of those cars were steam powered). There is a difference in viewpoint that is hard to overcome (or even imagine for that matter). A ComputerWorld article, How enterprises can use artificial intelligence, describes how technology in the movies doesn’t quite match reality. In fact, you might find some of the ways in which advanced technologies and automation are used somewhat boring. Fraud detection hardly ranks as a highly exciting way to use technology, but it reflects the practical nature of how technology sees use today.

When I see kids today doing absolutely everything on a smartphone, I come to realize that they already live in a world far different from the one I knew as a child. There is no going back. Children today have different problems than I had simply because the technology is different. If I encountered a problem, I first had to find a phone to call someone for help—children today carry their phone with them (almost as another body part). Then again, children when I grew up didn’t have the problems with obesity that children do today.

A lot of the readers I talk with every day express various feelings about automation and all it entails—some are scared, others elated. The fact is that the future has always been different. Change is a part of the human condition. We’ll live through the changes that automation will create too. Let me know your thoughts on the changes that automation will bring at John@JohnMuellerBooks.com.