Finding and Employing Data Science Tools

Python for Data Science for Dummies introduces you to a number of common libraries used for data science experimentation and discovery. Most of these libraries also figure prominently in a data scientist’s toolbox because they provide functionality that nearly every application needs. The book is a good starting point for those who are interested in expanding their knowledge of data science and how it can be applied to the field of Artificial Intelligence (AI). You can learn more about basic principles such as developing, applying, and leveraging data science projects. However, these libraries are only the tip of the data science toolbox. Because data science is such a new technology, you can find all sorts of tools to perform a wide range of tasks, but there is little standardization, and some of these tools are hard to categorize, so it isn’t always clear where they fit within your toolbox. That’s why I was excited to see The data science ecosystem, the first of a three-part series of articles that describe some of the tools available for use in data science projects. If you are interested in finding out more about data science, you might want to check out this data science bootcamp. You can also find the other two parts of the article at:

The problem for people who want to explore data science and machine learning today might not be the lack of tools, but the lack of creativity in using them. To explore data science, it’s important to understand that the tools only work when you prepare the data properly, employ the correct algorithm, and define reasonable goals. Those who are looking for suitable tools when starting to experiment with data science or machine learning might collaborate with other data scientists using this open-source dvc data science platform, or a similar one that can integrate many other data science tools. No matter how hard you try, data science and machine learning can’t provide you with the correct numeric sequences for the next five lottery wins. However, data science can help you locate potential sources of fraud in an organization. The article Machine learning and the strategic snake oil reserve sums up what may be the biggest problem with data science today: people expect miracles without putting in the required work. Fortunately, there are new tools on the horizon to make languages, such as Python, and products, such as Hadoop, easier for even the less creative mind to use (see Python and Hadoop project puts data scientists first).

Even with a great imagination, the tools available today may not do the job you want as well as they should because the underlying hardware isn’t capable of performing the required tasks. The process is further hampered by a misuse of the skills that data scientists provide (see You’re hiring the wrong data scientists for details). As a result, you need a large number of specialized tools in order to perform tasks that shouldn’t require them. That’s why you need to know which tools are available, so that you can produce useful results on today’s hardware with a minimum of fuss. Asking the question, “How would Alan Turing fix A.I.?” helps you understand the complexities of the data science and machine learning environments.

Data science, machine learning, data scientists with even greater skills, and better hardware will keep the momentum going well into the future. As the Internet of Things (IoT) continues to move forward and the problem of what to do with all that data becomes even larger, data science will take on a larger role in everyone’s daily life. Count on reading more articles like Google a step closer to developing machines with human-like intelligence that describe the proliferation of new hardware and new tools to make the full potential of data science and machine learning a reality. In the meantime, the best approach is to get the tools you need and explore the ways in which you can creatively use data science to solve problems. Let me know your thoughts on the future of data science at [email protected].

Missing Python for Data Science for Dummies Companion Files

For all those long-suffering readers who have been missing the companion files for Python for Data Science for Dummies, the files are finally available at http://www.dummies.com/store/product/Python-for-Data-Science-For-Dummies.productCd-1118844181,descCd-DOWNLOAD.html. All you need to do is click the Click to Download link on the page. I’m truly sorry you needed to wait so long. Thank you to everyone who noticed the missing files and also the incorrect link in the book, which now appears in the book errata. Please let me know at [email protected] if you have any problems locating or downloading the files.

 

Getting Your Python for Data Science for Dummies Extras

The process of discovering how to use Python to perform data science tasks begins when you get your copy of Python for Data Science for Dummies. Luca and I spent a good deal of time making your data science learning experience easier and even fun. However, it only starts there. Like many of my other books, you can also find online content for Python for Data Science for Dummies in these forms:

I always want to hear your questions about my books. Be sure to write me about them at [email protected]. In the meantime, I hope you enjoy your Python for Data Science for Dummies reading experience. Thank you for your continued support.


20 July 2015: Updated to show correct link for the companion files.

 

Contemplating the Issue of Bias in Data Science

When Luca and I wrote Python for Data Science for Dummies, we tried to address a range of topics that aren’t well covered in other places. Imagine my surprise when I saw a perfect article to illustrate one of these topics in ComputerWorld this week, Maybe robots, A.I. and algorithms aren’t as smart as we think. With the use of AI and data science growing rapidly, in part because they can help improve a company’s marketing, you might think that computers can think. They can’t. You can learn about the role data science has in marketing here, but for now, I think it’s important to reiterate that computers can emulate or simulate the thinking process, but they don’t actually think. A computer is a machine designed to perform math quite quickly. If we want thinking computers, then we need a different kind of machine. It’s the reason I wrote the Computers with Common Sense? post not too long ago. The sort of computer that could potentially think is a neural network, and I discuss them in the Considering the Future of Processing Power post. (Even Intel’s latest 18-core processor, which is designed for machine learning and analytics, isn’t a neural network; it simply performs the tasks that processors do now more quickly.)

However, the situation is worse than you might think, which is the reason for mentioning the ComputerWorld article. A problem occurs when the computer scientists and data scientists who work together to create algorithms that make computers appear to think forget that computers really can’t do any such thing. Luca and I discuss the effects of bias in Chapter 18 of our book. The chapter might have seemed academic at one time: something of interest, but potentially not all that useful. Today that chapter has taken on added significance. Read the ComputerWorld article and you find that Flickr recently released a new image recognition technology. Failing to consider the role of bias in interpreting data and in using algorithms has had horrible results. The Guardian goes into even more detail, describing how the program has tagged black people as apes and animals. Obviously, no one wanted that particular result, but forgetting that computers can’t think has caused precisely that unwanted result.

AI is an older technology that isn’t well understood because we don’t really understand our own thinking processes. It isn’t possible to create a complete and useful model of something until you understand what it is that you’re modeling, and we don’t truly understand intelligence. Therefore, it’s hardly surprising that AI has taken so long to complete even the smallest baby steps. Data science is a newer technology that seeks to help people see patterns in huge data sets, to understand the data better, and to create knowledge where none existed before. Neither technology is truly designed for stand-alone purposes yet. While I find Siri an interesting experiment, it’s just that, an experiment.

The Flickr application tries to take the human out of the loop and you see the result. Technology is designed to help mankind achieve more by easing the mundane tasks performed to accomplish specific goals. When you take the person out of the loop, what you have is a computer that is only playing around at thinking from a mathematical perspective, nothing more. It’s my hope that the Flickr incident will finally cause people to start thinking about computers, algorithms, and data as the tools that they truly are: tools designed to help people excel in new ways. Let me know your thoughts about AI and data science at [email protected].

 

Computers with Common Sense?

The whole idea behind products, such as Siri, is to give computers a friendlier face. Much like the computer on the Enterprise in Star Trek, you converse with the machine and get intelligent answers back much of the time. The problem is that computers don’t currently have common sense. A computer really doesn’t understand anything anyone says to it. What you’re seeing is incredibly complex and clever programming. The understanding is in the math behind the programming. Computers truly are machines that perform math-related tasks with extreme speed and perfection.

It was with great interest that I recently read an article in The Guardian, Google a step closer to developing machines with human-like intelligence. The opening statement is misleading and meant to bedazzle the audience, but then the article gets into the actual process behind computers that could emulate common sense well enough that we’d anthropomorphize them even more than we do now. If the efforts of Professor Geoff Hinton and others are successful, computers could potentially pass the Turing Test in a big way. In short, it would become hard to tell a computer apart from a human. We very well could treat them as friends sometime in the future (some people are almost there now).

Articles often allude to scientific principles, but don’t really explain them. The principle at play in this case is sentiment analysis based on words and word n-grams (short sequences of adjacent words). You can build a sentiment analyzer by using machine learning and multiclass predictors. Fortunately, you don’t have to drive yourself nuts trying to understand the basis for the code you find online. Luca and I wrote Python for Data Science for Dummies to make it easier to understand the science behind the magic that modern applications seemingly ply. Let me know your thoughts about the future of computers with common sense at [email protected].
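To make the idea of sentiment analysis with words and word n-grams a little more concrete, here is a minimal sketch in Python. It is not the code from the book or from the Guardian article. It assumes a tiny, made-up training set and uses a simple Naive Bayes classifier with add-one smoothing as one possible multiclass predictor. The names ngrams, NaiveBayesSentiment, and TRAINING_DATA are hypothetical and exist only for this illustration.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return the word n-grams (1 to n_max words long) found in the text."""
    words = text.lower().split()
    features = []
    for n in range(1, n_max + 1):
        for start in range(len(words) - n + 1):
            features.append(" ".join(words[start:start + n]))
    return features


class NaiveBayesSentiment:
    """A tiny multiclass Naive Bayes classifier over word n-grams (illustration only)."""

    def __init__(self, n_max=2):
        self.n_max = n_max
        self.feature_counts = defaultdict(Counter)  # label -> n-gram -> count
        self.label_counts = Counter()               # label -> number of training texts
        self.vocabulary = set()

    def fit(self, texts, labels):
        """Count how often each n-gram appears under each label."""
        for text, label in zip(texts, labels):
            self.label_counts[label] += 1
            for feature in ngrams(text, self.n_max):
                self.feature_counts[label][feature] += 1
                self.vocabulary.add(feature)
        return self

    def predict(self, text):
        """Return the label with the highest log-probability for the text."""
        total_texts = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, text_count in self.label_counts.items():
            score = math.log(text_count / total_texts)
            label_total = sum(self.feature_counts[label].values())
            for feature in ngrams(text, self.n_max):
                if feature not in self.vocabulary:
                    continue  # ignore n-grams never seen during training
                count = self.feature_counts[label][feature]
                # Add-one smoothing keeps unseen n-grams from zeroing the score.
                score += math.log((count + 1) / (label_total + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data with three classes, purely for demonstration.
TRAINING_DATA = [
    ("i love this book", "positive"),
    ("what a great example", "positive"),
    ("i hate this bug", "negative"),
    ("this code is not good", "negative"),
    ("the book has ten chapters", "neutral"),
    ("the file is on the site", "neutral"),
]

if __name__ == "__main__":
    texts, labels = zip(*TRAINING_DATA)
    model = NaiveBayesSentiment().fit(texts, labels)
    for sample in ("i love this great example", "i hate this code"):
        print(sample, "->", model.predict(sample))
```

A real application would need far more training data and would likely rely on an established machine learning library rather than a hand-written classifier. The sketch only shows how n-gram counts can feed a predictor that chooses among more than two classes.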

 

No Assembly Required

A problem with many robots today is that they’re bulky. Transporting the robot can be a problem because it takes up a lot of space. Unfortunately, some scenarios require that the robot arrive at its destination fully assembled. For example, there isn’t anyone on Mars to assemble a robot that lands there. I’ve been following a number of stories about robots that self-assemble or transform in some way, but the story Engineers Built an Origami Robot That Can Fold and Crawl Without Human Intervention provides a great overview of what’s happening with robotic science today.

The idea that a robot can fold itself up into a form that’s akin to a sheet of paper and then unfold itself into a useful shape is phenomenal. According to The Guardian, the robot could see use on the battlefield or in space. The accompanying video is pretty impressive. The feeling is one of an autonomous machine that can almost think its way through some basic problems. The robot need not actually start out flat though. A recent InfoWorld story tells of a robot that can transform between an I shape and a 3 shape. This robot is being used to explore the crippled Fukushima Dai-Ichi nuclear power plant, and the shape changes are necessary for the robot to move freely. An update to the story on ComputerWorld reports that the robot still has a ways to go before the shape shifting works without problems.

Of course, these machines are thinking in a way. A Wired article helps you understand the thinking that goes into the design of the origami robot. (The details of the transforming robot aren’t available at this time, but it does have a tether to allow outside interaction, something the origami robot doesn’t need.) Luca’s and my upcoming book, Python for Data Science for Dummies, can help you understand the science and programming behind the artificial intelligence in these robots to an even greater degree. The origami robot demonstrates that software and good engineering are working together to turn an inexpensive 2D technology into a viable robot that could perform a wide variety of tasks. As the Wired article points out, the technology is both cheap and easy; it doesn’t rely on anything exotic to make it work. Meanwhile, the transforming robot shows that these devices can work in extremely hazardous conditions that humans could never tolerate.

The sexy view of robots in the movies is full-fledged, human-looking devices or monster construction machines of the sort found in I, Robot. The fact of the matter is that we may very well produce robots of that sort (we’re building them at this moment to act as caregivers), but we’ll also produce a great many robots of other types, such as these origami and transforming robots. Think more along the lines of Blade Runner, which contains a wide variety of robot types. Consider how robots might be used in the real world to perform mundane tasks. For example, the Roomba looks nothing like a movie robot. It sort of looks like a really big hockey puck.

How do you think the introduction of robots into society will go? Will we continue to see a vast assortment of odd-looking robots or will they begin to take on more human characteristics? The future looks truly amazing, but I’d like to hear your point of view today. Talk to me about robotics at [email protected].

 

Python 2.7.9 Update

Beginning Programming with Python For Dummies is based on Python 3.3. However, I know that some of you are using Python 2.x installations instead. My book does discuss some of the differences between the two releases and makes you aware of examples that won’t work. However, if you do decide to use Python 2.x despite the limitations when it comes to the book, I highly recommend you get the Python 2.7.9 update. The update contains a slew of important bug fixes, many of which affect security, which is always an important issue when it comes to applications.

A reader recently sent me an InfoWorld Tech Watch article that highlights the updates in the 2.7.9 release for you. The most important thing to know from a book perspective is that the update doesn’t offer any new features. This means that if an example didn’t work with 2.x in the past, it won’t work with 2.7.9 either.

A number of readers feel that the Python 2.x releases are better and that the continuing bug fixes simply show that 2.x remains popular. Because the 3.x release is the preferred release, I chose to focus on it when I wrote the book. Yes, you can use my book with the 2.x release, but I guarantee some examples simply won’t work with it.
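To give a feel for why code written for 3.x can behave differently on 2.x, here is a short sketch of two well-known differences between the releases. These are general language differences, not necessarily the specific examples from the book, and the function names running_python3, halve, and first_key are hypothetical names chosen for this illustration.

```python
import sys


def running_python3():
    """Return True when the interpreter is a 3.x release."""
    return sys.version_info[0] >= 3


def halve(value):
    """Divide by two with the / operator.

    In Python 3.x, 3 / 2 produces 1.5. In Python 2.x, the same
    expression performs integer division and produces 1.
    """
    return value / 2


def first_key(mapping):
    """Return the first key of a non-empty dictionary.

    In Python 3.x, keys() returns a view that doesn't support indexing,
    so the view is converted to a list first. In Python 2.x, keys()
    already returns a list.
    """
    return list(mapping.keys())[0]


if __name__ == "__main__":
    if not running_python3():
        print("Warning: this sketch assumes Python 3.x.")
    print(halve(3))
    print(first_key({"title": "example"}))
```

Checking sys.version_info at the start of a script is one simple way to warn a reader who runs 3.x code on a 2.x installation.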

Please contact me at [email protected] if you have any other questions about my book, the level of Python support it provides, or whether the Python 2.7.9 release will provide any book-related advantage other than helping your system remain safe. I want to ensure you have the best reading experience possible. However, there isn’t any chance at all that I’ll rewrite book examples to work with 2.x unless a significant number of readers want this feature. Even then, some examples simply won’t work because there is no workaround for them (essentially the reason we needed the 3.x update).