Missing Machine Learning for Dummies Downloadable Source Files

A number of people have contacted me to say that the downloadable source for Machine Learning for Dummies isn't appearing on the Dummies site as described in the book. I've contacted the publisher about the issue, and the downloadable source is now available at http://www.dummies.com/extras/machinelearning. Look on the Downloads tab, which you can also find at http://www.dummies.com/DummiesTitle/productCd-1119245516,descCd-DOWNLOAD.html, and click Click to Download to receive the source code file (approximately 485 KB).

When you get the file, extract the archive to your hard drive and then follow the directions in the book to create the source code repository for each language. The repository instructions appear on page 60 for the R programming language and on page 99 for Python. I apologize for any problems that the initial lack of source code may have caused. If you experience any problems whatsoever in using the source code, please feel free to contact me at John@JohnMuellerBooks.com. Luca and I want to be certain that you have a great learning experience, which means being able to download and use the book's source code; hand-typed code often leads to problems.


Security Breaches and the Potential Effect on Big Data

There are two interacting forces in big data today that few people are talking about. Perhaps it just hasn't occurred to anyone that there truly is a serious threat. This post talks about big data used for healthcare, but the same issue applies to any use of big data. Organizations such as Penn Medicine are using big data to perform real-world tasks that make a real difference. For example, it's now possible to predict some diseases, such as sepsis, well before they become critical. The ability to predict an event before it becomes critical is important for all sorts of reasons, but the most important is improving overall health. Of course, it also affects the cost of healthcare and the need to use healthcare in the first place.

However, while writing both Python for Data Science for Dummies and Machine Learning for Dummies, I've discovered that the fallout from data errors is more critical than anyone can imagine. Ensuring correct data entry is a large part of the solution, but there are other concerns. Yes, algorithms can learn to determine which data is useful and which data isn't, but the purer the data at the outset, the better.

While writing Security for Web Developers, I reviewed many sorts of security breaches, some of which involve modifying organizational data. This means that an outsider could potentially corrupt the big data used to make assumptions about medical conditions. Do you see where I'm going with this? Bad data, especially data that an outsider has modified and that is therefore less likely to attract the attention of someone who can fix it, will cause those algorithms to make invalid assumptions. Humans help correct those assumptions, but humans aren't perfect and make assumptions of their own about the behavior of the algorithm. The bottom line is that the wrong sort of security breach could end up costing lives. It's something to think about, anyway.

The potential for error in big data analysis is just one of a whole bunch of reasons that I’m happy to read that the government is finally looking into ways to bolster the devices used to work with medical data. I’m almost positive that medical practitioners will fight tooth and nail against the new security measures, just like users of every persuasion do, but the security measures really are more important than just protecting individual patient data. As data becomes the centerpiece of all sorts of human endeavors, ensuring it remains as pristine as possible becomes ever more important. Security has to take a bigger role in data management in the future. Let me know your thoughts on securing data that could be used for medical analysis at John@JohnMuellerBooks.com.


Python for Data Science for Dummies Errata on Page 221

The downloadable source for Python for Data Science for Dummies contains a problem that doesn't appear in the book. On page 221, the code block in the middle of the page contains the line import numpy as np. This line is essential because the code won't run without it. The downloadable source for Chapter 12 is missing this line, so the example doesn't run. The P4DS4D; 12; Stretching Pythons Capabilities link provides a .ZIP file that contains the replacement source code. Simply extract the P4DS4D; 12; Stretching Pythons Capabilities.ipynb file from the archive and use it in place of your existing file.

Luca and I always want you to have a great experience with our book, so keep those emails coming. Please let me know if you have any questions about the source code file update at John@JohnMuellerBooks.com. I'm sorry about any errors that appear in the downloadable source and appreciate the readers who have pointed them out.


Python for Data Science for Dummies Errata on Page 145

Python for Data Science for Dummies contains two errors on page 145. The first error appears in the second paragraph on that page. You can safely disregard the sentence that reads, "The use_idf controls the use of inverse-document-frequency reweighting, which is turned off in this case." The code doesn't contain a reference to the use_idf parameter, but you can read about it on the Scikit-Learn site. This parameter is turned on by default, which is how the example uses it.

The second error is also in the second paragraph. The discussion references the tf_transformer.transform() method call. The actual method call is tfidf.transform(), which does appear in the sample code. The discussion about how the method works is correct; only the name of the object is wrong.
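If you want to confirm both points yourself, here is a minimal sketch. It assumes the book's example follows the common Scikit-Learn pattern of a CountVectorizer followed by a TfidfTransformer stored in a variable named tfidf; the three-sentence corpus is purely hypothetical and isn't the data the book uses. Creating TfidfTransformer() without arguments leaves use_idf at its default of True, and transform() is called on the tfidf object.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical toy corpus, used only for illustration.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

# Count raw term occurrences in each document.
vectorizer = CountVectorizer()
word_count = vectorizer.fit_transform(corpus)

# use_idf isn't passed, so it keeps its default value (True).
tfidf = TfidfTransformer().fit(word_count)
print(tfidf.get_params()["use_idf"])

# The transform() method belongs to the tfidf object.
tfidf_matrix = tfidf.transform(word_count)
print(tfidf_matrix.shape)

# For comparison: with use_idf=False you get only normalized term frequencies.
tf_only = TfidfTransformer(use_idf=False).fit_transform(word_count)
```

Comparing tfidf_matrix with tf_only shows what inverse-document-frequency reweighting does: terms that appear in many documents, such as "the", receive lower weights than terms that appear in only one.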

Please let me know if you have any questions about either of these changes at John@JohnMuellerBooks.com. I’m sorry about any errors that appear in the book and appreciate the readers who have pointed them out.


Technology and Child Safety

I recently read an article on ComputerWorld, Children mine cobalt used in smartphones, other electronics, that had me thinking yet again about how people in rich countries tend to ignore the needs of those in poor countries. The picture at the beginning of the article says it all, but the details will have you wondering whether a smartphone really is worth some child's life. That's right: any smartphone you buy may be killing someone, and in a truly horrid manner. Children as young as 7 years old are mining the cobalt needed for the batteries (and other components) in the smartphones that people seem to feel are so necessary for life (they aren't, you know).

The problem doesn’t stop when someone gets the smartphone. Other children end up dismantling the devices sent for recycling. That’s right, a rich country’s efforts to keep electronics out of their landfills is also killing children because countries like India put these children to work taking them apart in unsafe conditions. Recycled wastes go from rich countries to poor countries because the poor countries need the money for necessities, like food. Often, these children are incapable of working by the time they reach 35 or 40 due to health issues induced by their forced labor. In short, the quality of their lives is made horribly low so that it’s possible for people in rich countries to enjoy something that truly isn’t necessary for life.

I’ve written other blog posts about the issues of technology pollution. One of the most recent is More People Noticing that Green Technology Really Isn’t. However, the emphasis of these previous articles has been on the pollution itself. Taking personal responsibility for the pollution you create is important, but we really need to do more. Robotic (autonomous) mining is one way to keep children out of the mines and projects such as The Utah Robotic Mining Project show that it’s entirely possible to use robots in place of people today. The weird thing is that autonomous mining would save up to 80% of the mining costs of today, so you have to wonder why manufacturers aren’t rushing to employ this solution. In addition, off world mining would keep the pollution in space, rather than on planet earth. Of course, off world mining also requires a heavy investment in robots, but it promises to provide a huge financial payback in addition to keeping earth a bit cleaner (some companies are already investing in off world mining, but we need more). The point is that there are alternatives that we’re not using. Robotics presents an opportunity to make things right with technology and I’m excited to be part of that answer in writing books such as Python for Data Science for Dummies and Machine Learning for Dummies (see the posts for this book).

Unfortunately, companies like Apple, Samsung, and many others simply thumb their noses at laws that are in place to protect the children in these countries because they know you'll buy their products. Yes, they make official statements, but read their statements in that first article and you'll quickly figure out that they're excuses, and poorly made excuses at that. They don't have to care because no one is holding them to account. People in rich countries don't care because their own backyards aren't sullied and their own children remain safe. So, the next time you think about buying electronics, consider the real price of that product. Let me know what you think about polluting other countries to keep your country clean at John@JohnMuellerBooks.com.


Python for Data Science for Dummies Errata on Page 124

Python for Data Science for Dummies contains an error in the example that appears on the top half of page 124. In the first of the two grey boxes, the code contains four print statements. The bottom-most print statement, print x[1:2, 1:2], is supposed to select indexes 1 and 2 along the first two axes of the array, and the bottom grey box seems to confirm that interpretation by showing the result as [[[14 15 16] [17 18 19]] [[24 25 26] [27 28 29]]]. However, a slice stops before its end index, so x[1:2, 1:2] actually returns only [[[14 15 16]]], which doesn't agree with the output shown in the text.
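To see why the printed version returns less than expected, here is a minimal sketch (assuming NumPy is installed) that builds the same array and compares the two slices. A slice start:stop includes start but excludes stop, so 1:2 selects only index 1, while 1:3 selects indexes 1 and 2.

```python
import numpy as np

# Same 3 x 3 x 3 array as in the book's example.
x = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
              [[11, 12, 13], [14, 15, 16], [17, 18, 19]],
              [[21, 22, 23], [24, 25, 26], [27, 28, 29]]])

# The stop index is excluded, so 1:2 keeps index 1 only on each of the first two axes.
one_each = x[1:2, 1:2]
# 1:3 keeps indexes 1 and 2 on each of the first two axes.
two_each = x[1:3, 1:3]

print(one_each.shape)  # (1, 1, 3)
print(one_each)        # [[[14 15 16]]]
print(two_each.shape)  # (2, 2, 3)
```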

The good news is that the downloadable source contains the correct code. The error appears only in the book. The last print statement in the book is wrong. Here is the correct code (with output) for this example:

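import numpy as np

x = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
              [[11, 12, 13], [14, 15, 16], [17, 18, 19]],
              [[21, 22, 23], [24, 25, 26], [27, 28, 29]]])

print(x[1, 1])      # Plane 1, row 1
print(x[:, 1, 1])   # Element [1, 1] of every plane
print(x[1, :, 1])   # Plane 1, column 1
print(x[1:3, 1:3])  # Planes 1 and 2, rows 1 and 2

[14 15 16]
[ 5 15 25]
[12 15 18]
[[[14 15 16]
  [17 18 19]]

 [[24 25 26]
  [27 28 29]]]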

Please let me know if you have any questions about this example at John@JohnMuellerBooks.com. I’m sorry about the error that appears in the book and appreciate the readers who have pointed it out.


Missing XMLData2.xml File

A number of readers have written to report that XMLData2.xml is missing from the downloadable source for Python for Data Science for Dummies. You encounter this file in Chapter 6, on page 108. The publisher has already added the file to the downloadable source, but your copy might still be missing it. If so, you can download it by clicking XMLData2.zip. I'm truly sorry about any problems that the missing file might have caused. Please be sure to send your book-specific questions to John@JohnMuellerBooks.com.


Tip Error in Python for Data Science for Dummies

There is a small error on page 318 of Python for Data Science for Dummies. You can find it near the middle of the page in the Tip text. The current text on the second line of that paragraph says, "k as a number near the squared number of available observations." However, the text should really read, "k as a number near the square root of the number of available observations." The word root is missing, which obviously changes the mathematical meaning of the text. Please accept our apologies for the typo. Let me know if you find any other errors of a technical nature in the book at John@JohnMuellerBooks.com and I'll be sure to provide a blog post about it here. Thank you for your support!
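To make the difference concrete, here is a minimal sketch of the corrected rule of thumb, assuming (as is usual for this guideline) that k is the number of neighbors in k-nearest neighbors. The function name suggest_k is hypothetical and not from the book; treat the value it returns as a starting point to tune with validation, not a final answer.

```python
import math


def suggest_k(n_observations):
    """Hypothetical helper: starting k near the square root of the number of observations."""
    if n_observations < 1:
        raise ValueError("need at least one observation")
    return max(1, round(math.sqrt(n_observations)))


for n in (10, 100, 1000):
    print(n, suggest_k(n))
```

With 1,000 observations, the corrected rule suggests k = 32, whereas the misprinted "squared" version would suggest 1,000,000 neighbors, far more than the data contains.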


Beta Readers Needed for Machine Learning for Dummies

Do machines really learn, or do they simply give the appearance of learning? What does it actually mean to learn and why would a machine want to do it? Some people are saying that computers will eventually learn in the same manner that children do. However, before we get to that point, it’s important to answer these basic questions and consider the implications of creating machines that can learn.

Like many seemingly new technologies, machine learning actually has its basis in existing technologies. I first studied artificial intelligence in 1986, and it had been around for a long time before that. Many of the statistical equations that machine learning relies upon have been around literally for centuries. It's the application of the technology that differs. Machine learning has the potential to change the way in which the world works. A computer can experience its environment and learn how to avoid making mistakes without any human intervention. By using machine learning techniques, computers can also discover new things and even add new functionality. The computer is at the center of it all, but the computer output affects the actions of machines, such as robots. In reality, the computer learns, but the machine as a whole benefits.

Machine Learning for Dummies assumes that you have at least some math skills and a few programming skills as well. However, you do get all the basics you need to understand and use machine learning as a new way to make computers (and the machines they control) do more. While working through Machine Learning for Dummies you discover these topics:

  • Part I: Introducing How Machines Learn
    • Chapter 1: Getting the Real Story about AI
    • Chapter 2: Learning in the Age of Big Data
    • Chapter 3: Having a Glance at the Future
  • Part II: Preparing Your Learning Tools
    • Chapter 4: Installing a R Distribution
    • Chapter 5: Coding in R Using RStudio
    • Chapter 6: Installing a Python Distribution
    • Chapter 7: Coding in Python Using Anaconda
    • Chapter 8: Exploring Other Machine Learning Tools
  • Part III: Getting Started with the Math Basics
    • Chapter 9: Demystifying the Math behind Machine Learning
    • Chapter 10: Descending the Right Curve
    • Chapter 11: Validating Machine Learning
    • Chapter 12: Starting with Simple Learners
  • Part IV: Learning from Smart and Big Data
    • Chapter 13: Preprocessing Data
    • Chapter 14: Leveraging Similarity
    • Chapter 15: Starting Easy with Linear Models
    • Chapter 16: Hitting Complexity with Neural Networks
    • Chapter 17: Going a Step Beyond using Support Vector Machines
    • Chapter 18: Resorting to Ensembles of Learners
  • Part V: Applying Learning to Real Problems
    • Chapter 19: Classifying Images
    • Chapter 20: Scoring Opinions and Sentiments
    • Chapter 21: Recommending Products and Movies
  • Part VI: The Part of Tens
    • Chapter 22: Ten Machine Learning Packages to Master
    • Chapter 23: Ten Ways to Improve Your Machine Learning Models
    • Online: Ten Ways to Use Machine Learning in Your Organization

As you can see, this book is going to give you a good start in working with machine learning. Because of the subject matter, I really want to avoid making any errors in the book, which is where you come into play. I'm looking for beta readers who use math, statistics, or computer science as part of their profession and think they might be able to benefit from the techniques that data science and/or machine learning provide. As a beta reader, you get to see the material as Luca and I write it. Your comments will help us improve the text and make it easier to use.

In consideration of your time and effort, your name will appear in the Acknowledgements (unless you specifically request that we not provide it). You also get to read the book free of charge. Being a beta reader is both fun and educational. If you have any interest in reviewing this book, please contact me at John@JohnMuellerBooks.com and I'll fill in all the details for you.


Missing File from Python for Data Science for Dummies Downloadable Source

A reader recently contacted me regarding a missing file from the downloadable source for Python for Data Science for Dummies. This is the P4DS4D; 01; Quick Overview.ipynb file you need for the first chapter. Simply click here to download P4DS4D; 01; Quick Overview.ipynb. I'm also asking the publisher to add the missing file to the downloadable source found on the Dummies site at http://www.dummies.com/store/product/Python-for-Data-Science-For-Dummies.productCd-1118844181,descCd-DOWNLOAD.html. If you encounter any other problems with the book, please be sure to let me know at John@JohnMuellerBooks.com. Thank you for your patience!