A number of people have contacted me to tell me that the downloadable source for Machine Learning for Dummies isn’t appearing on the Dummies site as described in the book. I’ve contacted the publisher about the issue and the downloadable source is now available at http://www.dummies.com/extras/machinelearning. Please look on the Downloads tab, which you can also find at http://www.dummies.com/DummiesTitle/productCd-1119245516,descCd-DOWNLOAD.html and navigate to Click to Download to receive the approximately 485 KB source code file.
When you get the file, open the archive on your hard drive and then follow the directions in the book to create the source code repository for each language. The repository instructions appear on Page 60 for the R programming language and on Page 99 for Python. I apologize for any problems that the initial lack of source code may have caused. If you experience any problems whatsoever in using the source code, please feel free to contact me at John@JohnMuellerBooks.com. Luca and I want to be certain that you have a great learning experience, which means being able to download and use the book’s source code because using hand-typed code often leads to problems.
There are two interacting forces in big data today that few people are talking about. Perhaps it just hasn’t occurred to anyone that there truly is a serious threat. This particular post is going to talk about big data used for healthcare, but the same issue applies to any use of big data. Organizations, such as Penn Medicine, are using big data to perform real-world tasks that really make a difference. For example, it’s now possible to predict the potential for diseases well in advance of any critical fallout, at least for some diseases such as sepsis. The ability to predict an event before it becomes critical is important for all sorts of reasons, but the most important is improving overall health. Of course, it also affects the cost of healthcare and the need to use healthcare in the first place.
However, while writing both Python for Data Science for Dummies and Machine Learning for Dummies, I’ve discovered the fallout of data errors is more critical than anyone can imagine. Ensuring correct data entry is a large part of the solution, but there are other concerns. Yes, algorithms can learn to determine which data is useful and which data isn’t, but the purer the data at the outset, the better.
While writing Security for Web Developers I reviewed many sorts of security breaches, some of which involve modifying organizational data. What this means is that an outsider could potentially corrupt the big data used to make assumptions about medical conditions. Do you see where I’m going with this? Having bad data, data that is modified by an outsider and therefore not as likely to gain the attention of someone who can fix it, will cause those algorithms to make some invalid assumptions. Humans help correct the assumptions, but humans aren’t perfect and make assumptions about the behavior of the algorithm. The bottom line is that security breaches of the wrong sort could end up costing lives. It’s something to think about anyway.
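To make the concern concrete, here is a minimal sketch of how a single tampered record can change what an algorithm concludes. Everything in it is an assumption made up for illustration: the scores, the `learn_threshold` helper, and the idea of flagging a patient by comparing a score against a learned cutoff. It isn't how any real medical system works.

```python
from statistics import mean

def learn_threshold(records):
    """Learn a cutoff as the midpoint between the average score of
    healthy cases and the average score of at-risk cases."""
    healthy = [score for score, flagged in records if not flagged]
    at_risk = [score for score, flagged in records if flagged]
    return (mean(healthy) + mean(at_risk)) / 2

# Hypothetical training records: (risk score, was the patient at risk?)
clean = [(1.0, False), (2.0, False), (3.0, False),
         (7.0, True), (8.0, True), (9.0, True)]

# The same records after an outsider alters one healthy patient's score.
tampered = [(1.0, False), (2.0, False), (15.0, False),
            (7.0, True), (8.0, True), (9.0, True)]

clean_cutoff = learn_threshold(clean)        # 5.0
tampered_cutoff = learn_threshold(tampered)  # 7.0

new_patient_score = 6.0
print(new_patient_score > clean_cutoff)     # True: flagged as at risk
print(new_patient_score > tampered_cutoff)  # False: the warning is missed
```

One altered value out of six is enough to move the cutoff so that the new patient is no longer flagged, and nothing in the output hints that the training data was modified.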
The potential for error in big data analysis is just one of a whole bunch of reasons that I’m happy to read that the government is finally looking into ways to bolster the devices used to work with medical data. I’m almost positive that medical practitioners will fight tooth and nail against the new security measures, just like users of every persuasion do, but the security measures really are more important than just protecting individual patient data. As data becomes the centerpiece of all sorts of human endeavors, ensuring it remains as pristine as possible becomes ever more important. Security has to take a bigger role in data management in the future. Let me know your thoughts on securing data that could be used for medical analysis at John@JohnMuellerBooks.com.
I recently read an article on ComputerWorld, Children mine cobalt used in smartphones, other electronics, that had me thinking yet again about how people in rich countries tend to ignore the needs of those in poor countries. The picture at the beginning of the article says it all, but the details will have you wondering whether a smartphone really is worth some child’s life. That’s right, any smartphone you buy may be killing someone and in a truly horrid manner. Children as young as 7 years old are mining the cobalt needed for the batteries (and other components) in the smartphones that people seem to feel are so necessary for life (they aren’t, you know).
The problem doesn’t stop when someone gets the smartphone. Other children end up dismantling the devices sent for recycling. That’s right, a rich country’s efforts to keep electronics out of its landfills are also killing children because countries like India put these children to work taking the devices apart in unsafe conditions. Recycled wastes go from rich countries to poor countries because the poor countries need the money for necessities, like food. Often, these children are incapable of working by the time they reach 35 or 40 due to health issues induced by their forced labor. In short, the quality of their lives is made horribly low so that it’s possible for people in rich countries to enjoy something that truly isn’t necessary for life.
I’ve written other blog posts about the issues of technology pollution. One of the most recent is More People Noticing that Green Technology Really Isn’t. However, the emphasis of these previous articles has been on the pollution itself. Taking personal responsibility for the pollution you create is important, but we really need to do more. Robotic (autonomous) mining is one way to keep children out of the mines and projects such as The Utah Robotic Mining Project show that it’s entirely possible to use robots in place of people today. The weird thing is that autonomous mining would save up to 80% of the mining costs of today, so you have to wonder why manufacturers aren’t rushing to employ this solution. In addition, off world mining would keep the pollution in space, rather than on planet earth. Of course, off world mining also requires a heavy investment in robots, but it promises to provide a huge financial payback in addition to keeping earth a bit cleaner (some companies are already investing in off world mining, but we need more). The point is that there are alternatives that we’re not using. Robotics presents an opportunity to make things right with technology and I’m excited to be part of that answer in writing books such as Python for Data Science for Dummies and Machine Learning for Dummies (see the posts for this book).
Unfortunately, companies like Apple, Samsung, and many others simply thumb their noses at laws that are in place to protect the children in these countries because they know you’ll buy their products. Yes, they make official statements, but read their statements in that first article and you’ll quickly figure out that they’re excuses and poorly made excuses at that. They don’t have to care because no one is holding them to account. People in rich countries don’t care because their own backyards aren’t sullied and their own children remain safe. So, the next time you think about buying electronics, consider the real price for that product. Let me know what you think about polluting other countries to keep your country clean at John@JohnMuellerBooks.com.
I’m currently engaged in writing Machine Learning for Dummies. The book is interesting because it turns math into something more than a way to calculate. Machine learning is about having inputs and a desired result, and then asking the machine to create an algorithm that will produce the desired result from the inputs. It’s about generalization. You know the specific inputs and the specific results, but you want an algorithm that will provide similar results given similar inputs for any set of random inputs. This is more than just math. In fact, there are five schools of thought (tribes) regarding machine learning algorithms that Luca and I introduce you to in Machine Learning for Dummies:
- Symbolists: The origin of this tribe is in logic and philosophy. This group relies on inverse deduction to solve problems.
- Connectionists: The origin of this tribe is in neuroscience. This group relies on backpropagation to solve problems.
- Evolutionaries: The origin of this tribe is in evolutionary biology. This group relies on genetic programming to solve problems.
- Bayesians: The origin of this tribe is in statistics. This group relies on probabilistic inference to solve problems.
- Analogizers: The origin of this tribe is in psychology. This group relies on kernel machines to solve problems.
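Whatever the tribe, the starting point is the same: known inputs, known results, and a request that the machine work out the rule connecting them. Here is a minimal sketch of that idea using an ordinary least-squares line fit in plain Python. The data and the `fit_line` and `predict` names are made up for illustration, and a straight-line fit is only one simple stand-in, not the method of any particular tribe.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the inputs, and how inputs and results vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the learned rule to an input."""
    slope, intercept = model
    return slope * x + intercept

# Made-up training data: specific inputs and the results we want for them.
inputs = [1.0, 2.0, 3.0, 4.0]
results = [3.0, 5.0, 7.0, 9.0]

model = fit_line(inputs, results)

# Generalization: ask about an input that never appeared in the training data.
print(predict(model, 10.0))  # 21.0
```

Nobody told the program that the rule is "double the input and add one." It recovered that rule from the examples and then applied it to a new input, which is the generalization step in miniature.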
Of course, the problem with any technology is making it useful. I’m not talking about useful in a theoretical sense, but useful in a way that affects everyone. In other words, you must create a need for the technology so that people will continue to fund it. Machine learning is already part of many of the things you do online. For example, when you go to Amazon and buy a product, then Amazon makes suggestions on products that you might want to add to your cart, you’re seeing the result of machine learning. Part of the content for the chapters of our book is devoted to pointing out these real world uses for machine learning.
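As a rough illustration of product suggestions, here is a minimal sketch that recommends items by counting what other shoppers bought together with a given product. The order data and the `suggest` function are hypothetical, and simple co-purchase counting is only a toy stand-in; I'm not claiming this is how Amazon's system actually works.

```python
from collections import Counter

# Hypothetical purchase histories, one set of products per order.
orders = [
    {"laptop", "mouse", "laptop bag"},
    {"laptop", "mouse"},
    {"laptop", "usb hub"},
    {"phone", "phone case"},
]

def suggest(product, orders, top_n=2):
    """Return the items most often bought together with `product`."""
    counts = Counter()
    for basket in orders:
        if product in basket:
            # Count every other item that shared an order with the product.
            counts.update(basket - {product})
    # Sort by count (highest first), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]

print(suggest("laptop", orders))  # ['mouse', 'laptop bag']
```

The more orders the system sees, the better the counts reflect what people really buy together, which is why these suggestions tend to improve as the data grows.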
Some uses are almost, but not quite, ready for prime time. One of these uses is the likes of Siri and other AIs that people talk with. The more you interact with them, the better they know you and the better they respond to your needs. The algorithms that these machine learning systems create get better and better as the database of your specific input grows. The algorithms are tuned to you specifically, so the experience one person has is different from the experience another person will have, even if the two people ask the same question. I recently read about one such system under development, Nara. What makes Nara interesting is that she seems more generalized than other forms of AI currently out there and can therefore perform more tasks. Nara is from the Connectionists and attempts to mimic the human mind. She’s all about making appropriate matches, everything from your next dinner to your next date. Reading about Nara helps you understand machine learning just a little better, at least from the Connectionist perspective.
Machine learning is a big mystery to many people today. Given that I’m still writing this book, it would be interesting to hear your questions about machine learning. After all, I’d like to tune the content of my book to meet the most needs that I can. I’ve written a few posts about this book already and you can see them in the Machine Learning for Dummies category. After reading the posts, please let me know your thoughts on machine learning and AI. Where do you see it headed? What confuses you about it? Talk to me at John@JohnMuellerBooks.com.
Do machines really learn, or do they simply give the appearance of learning? What does it actually mean to learn and why would a machine want to do it? Some people are saying that computers will eventually learn in the same manner that children do. However, before we get to that point, it’s important to answer these basic questions and consider the implications of creating machines that can learn.
Like many seemingly new technologies, machine learning actually has its basis in existing technologies. I initially studied about artificial intelligence in 1986 and it had been around for a long time before that. Many of the statistical equations that machine learning relies upon have been around literally for centuries. It’s the application of the technology that differs. Machine learning has the potential to change the way in which the world works. A computer can experience its environment and learn how to avoid making mistakes without any human intervention. By using machine learning techniques, computers can also discover new things and even add new functionality. The computer is at the center of it all, but the computer output affects the actions of machines, such as robots. In reality, the computer learns, but the machine as a whole benefits.
Machine Learning for Dummies assumes that you have at least some math skills and a few programming skills as well. However, you do get all the basics you need to understand and use machine learning as a new way to make computers (and the machines they control) do more. While working through Machine Learning for Dummies you discover these topics:
- Part I: Introducing How Machines Learn
- Chapter 1: Getting the Real Story about AI
- Chapter 2: Learning in the Age of Big Data
- Chapter 3: Having a Glance at the Future
- Part II: Preparing Your Learning Tools
- Chapter 4: Installing an R Distribution
- Chapter 5: Coding in R Using RStudio
- Chapter 6: Installing a Python Distribution
- Chapter 7: Coding in Python Using Anaconda
- Chapter 8: Exploring Other Machine Learning Tools
- Part III: Getting Started with the Math Basics
- Chapter 9: Demystifying the Math behind Machine Learning
- Chapter 10: Descending the Right Curve
- Chapter 11: Validating Machine Learning
- Chapter 12: Starting with Simple Learners
- Part IV: Learning from Smart and Big Data
- Chapter 13: Preprocessing Data
- Chapter 14: Leveraging Similarity
- Chapter 15: Starting Easy with Linear Models
- Chapter 16: Hitting Complexity with Neural Networks
- Chapter 17: Going a Step Beyond using Support Vector Machines
- Chapter 18: Resorting to Ensembles of Learners
- Part V: Applying Learning to Real Problems
- Chapter 19: Classifying Images
- Chapter 20: Scoring Opinions and Sentiments
- Chapter 21: Recommending Products and Movies
- Part VI: The Part of Tens
- Chapter 22: Ten Machine Learning Packages to Master
- Chapter 23: Ten Ways to Improve Your Machine Learning Models
- Online: Ten Ways to Use Machine Learning in Your Organization
As you can see, this book is going to give you a good start in working with machine learning. Because of the subject matter, I really want to avoid making any errors in the book, which is where you come into play. I’m looking for beta readers who use math, statistics, or computer science as part of their profession and think they might be able to benefit from the techniques that data science and/or machine learning provide. As a beta reader, you get to see the material as Luca and I write it. Your comments will help us improve the text and make it easier to use.
In consideration of your time and effort, your name will appear in the Acknowledgements (unless you specifically request that we not provide it). You also get to read the book free of charge. Being a beta reader is both fun and educational. If you have any interest in reviewing this book, please contact me at John@JohnMuellerBooks.com and I will fill in all the details for you.
Python for Data Science for Dummies introduces you to a number of common libraries used for data science experimentation and discovery. Most of these libraries also figure prominently as part of a data scientist’s toolbox because they provide common functionality needed for every application. However, these libraries are only the tip of the data science toolbox. Because data science is such a new technology, you can find all sorts of tools to perform a wide range of tasks, but there is little standardization and some of these tools are hard to categorize so that you know where they fit within your toolbox. That’s why I was excited to see The data science ecosystem, the first of a three-part series of articles that describe some of the tools available for use in data science projects. You can find the other two parts of the article at:
The problem for people who want to explore data science and machine learning today might not be the lack of tools, but the lack of creativity in using them. In order to explore data science, it’s important to understand that the tools only work when you prepare the data properly, employ the correct algorithm, and define reasonable goals. No matter how hard you try, data science and machine learning can’t provide you with the correct numeric sequences for the next five lottery wins. However, data science can help you locate potential sources of fraud in an organization. The article, Machine learning and the strategic snake oil reserve, sums up what may be the biggest problem with data science today: people expect miracles without putting in the required work. Fortunately, there are new tools on the horizon to make languages, such as Python, and products, such as Hadoop, easier for even the less creative mind to use (see Python and Hadoop project puts data scientists first).
Even with a great imagination, the tools available today may not do the job you want as well as they should because the underlying hardware isn’t capable of performing the required tasks. The process is further hampered by a misuse of the skills that data scientists provide (see You’re hiring the wrong data scientists for details). As a result, you need a large number of specialized tools in order to perform tasks that shouldn’t require them. However, that’s the reason why you need to know about the availability of these tools so that you can produce useful results on today’s hardware with a minimum of fuss. Asking the question, “How would Alan Turing fix A.I.?” helps you understand the complexities of the data science and machine learning environments.
Data science, machine learning, data scientists with even greater skills, and better hardware will keep the momentum going well into the future. As the Internet of Things (IoT) continues to move forward and the problem of what to do with all that data becomes even larger, data science will take on a larger role in everyone’s daily life. Count on reading more articles like, Google a step closer to developing machines with human-like intelligence, that describe the proliferation of new hardware and new tools to make the full potential of data science and machine learning a reality. In the meantime, getting the tools you need and exploring the ways in which you can creatively use data science to solve problems is the best way to go for now. Let me know your thoughts on the future of data science at John@JohnMuellerBooks.com.