Finding and Employing Data Science Tools

Python for Data Science For Dummies introduces you to a number of common libraries used for data science experimentation and discovery. Most of these libraries also figure prominently in a data scientist’s toolbox because they provide functionality that nearly every application needs. However, these libraries are only a small part of that toolbox. Because data science is such a new technology, you can find all sorts of tools to perform a wide range of tasks, but there is little standardization, and some of these tools are hard to categorize, so it’s difficult to know where they fit within your toolbox. That’s why I was excited to see “The data science ecosystem,” the first of a three-part series of articles that describes some of the tools available for use in data science projects. You can find the other two parts of the article at:

The problem for people who want to explore data science and machine learning today might not be the lack of tools, but the lack of creativity in using them. To explore data science, it’s important to understand that the tools only work when you prepare the data properly, employ the correct algorithm, and define reasonable goals. No matter how hard you try, data science and machine learning can’t provide you with the correct numeric sequences for the next five lottery wins. However, data science can help you locate potential sources of fraud in an organization. The article “Machine learning and the strategic snake oil reserve” sums up what may be the biggest problem with data science today: people expect miracles without putting in the required work. Fortunately, there are new tools on the horizon to make languages, such as Python, and products, such as Hadoop, easier for even the less creative mind to use (see “Python and Hadoop project puts data scientists first”).

Even with a great imagination, the tools available today may not do the job you want as well as they should because the underlying hardware isn’t capable of performing the required tasks. The process is further hampered by a misuse of the skills that data scientists provide (see “You’re hiring the wrong data scientists” for details). As a result, you need a large number of specialized tools to perform tasks that shouldn’t require them. That’s why you need to know which tools are available: so that you can produce useful results on today’s hardware with a minimum of fuss. Asking the question, “How would Alan Turing fix A.I.?” helps you understand the complexities of the data science and machine learning environments.

Data science, machine learning, data scientists with even greater skills, and better hardware will keep the momentum going well into the future. As the Internet of Things (IoT) continues to move forward and the problem of what to do with all that data grows, data science will take on a larger role in everyone’s daily life. Count on reading more articles like “Google a step closer to developing machines with human-like intelligence” that describe the proliferation of new hardware and new tools to make the full potential of data science and machine learning a reality. In the meantime, the best approach is to get the tools you need and explore the ways in which you can creatively use data science to solve problems. Let me know your thoughts on the future of data science at John@JohnMuellerBooks.com.

Author: John

John Mueller is a freelance author and technical editor. He has writing in his blood, having produced 99 books and over 600 articles to date. The topics range from networking to artificial intelligence and from database management to heads-down programming. Some of his current books include a Web security book, discussions of how to manage big data using data science, a Windows command-line reference, and a book that shows how to build your own custom PC. His technical editing skills have helped more than 67 authors refine the content of their manuscripts. John has provided technical editing services to both Data Based Advisor and Coast Compute magazines. He has also contributed articles to magazines such as Software Quality Connection, DevSource, InformIT, SQL Server Professional, Visual C++ Developer, Hard Core Visual Basic, asp.netPRO, Software Test and Performance, and Visual Basic Developer. Be sure to read John’s blog at http://blog.johnmuellerbooks.com/. When John isn’t working at the computer, you can find him outside in the garden, cutting wood, or generally enjoying nature. John also likes making wine and knitting. When not occupied with anything else, he makes glycerin soap and candles, which come in handy for gift baskets. You can reach John on the Internet at John@JohnMuellerBooks.com. John is also setting up a website at http://www.johnmuellerbooks.com/. Feel free to take a look and make suggestions on how he can improve it.