Python for Data Science for Dummies introduces you to a number of common libraries used for data science experimentation and discovery. Most of these libraries also figure prominently as part of a data scientist’s toolbox because they provide common functionality needed for every application. However, these libraries are only the tip of the data science toolbox. Because data science is such a new technology, you can find all sorts of tools to perform a wide range of tasks, but there is little standardization and some of these tools are hard to categorize so that you know where they fit within your toolbox. That’s why I was excited to see, The data science ecosystem, the first of a three part series of articles that describe some of the tools available for use in data science projects. You can find the other two parts of the article at:
- The data science ecosystem part 2: Data wrangling
- The data science ecosystem, part 3: Data applications
The problem for people who want to explore data science and machine learning today might not be the lack of tools, but the lack of creativity in using them. In order to explore data science, it’s important to understand that the tools only work when your prepare the data properly, employ the correct algorithm, and define reasonable goals. No matter how hard you try, data science and machine learning can’t provide you with the correct numeric sequences for the next five lottery wins. However, data science can help you locate potential sources of fraud in an organization. The article, Machine learning and the strategic snake oil reserve, sums up what may be the biggest problem with data science today—people expect miracles without putting in the required work. Fortunately, there are new tools on the horizon to make languages, such as Python, and products, such as Hadoop, easier for even the less creative mind to use (see Python and Hadoop project puts data scientists first).
Even with a great imagination, the tools available today may not do the job you want as well as they should because the underlying hardware isn’t capable of performing the required tasks. The process is further hampered by a misuse of the skills that data scientists provide (see You’re hiring the wrong data scientists for details). As a result, you need a large number of specialized tools in order to perform tasks that shouldn’t require them. However, that’s the reason why you need to know about the availability of these tools so that you can produce useful results on today’s hardware with a minimum of fuss. Asking the question, “How would Alan Turing fix A.I.?” helps you understand the complexities of the data science and machine learning environments.
Data science, machine learning, data scientists with even greater skills, and better hardware will keep the momentum going well into the future. As the Internet of Things (IoT) continues to move forward and the problem of what to do with all that data becomes even larger, data science will take on a larger role in everyone’s daily life. Count on reading more articles like, Google a step closer to developing machines with human-like intelligence, that describe the proliferation of new hardware and new tools to make the full potential of data science and machine learning a reality. In the meantime, getting the tools you need and exploring the ways in which you can creatively use data science to solve problems is the best way to go for now. Let me know your thoughts on the future of data science at John@JohnMuellerBooks.com.