Contemplating the Issue of Bias in Data Science

When Luca and I wrote Python for Data Science for Dummies we tried to address a range of topics that aren’t well covered in other places. Imagine my surprise when I saw a perfect article to illustrate one of these topics in ComputerWorld this week, Maybe robots, A.I. and algorithms aren’t as smart as we think. With the use of AI and data science growing exponentially, you might think that computers can think. They can’t. Computers can emulate or simulate the thinking process, but they don’t actually think. A computer is a machine designed to perform math quite quickly. If we want thinking computers, then we need a different kind of a machine. It’s the reason I wrote the Computers with Common Sense? post not too long ago. The sort of computer that could potentially think is a neural network and I discuss them in the Considering the Future of Processing Power post. (Even Intel’s latest 18 core processor, which is designed for machine learning and analytics isn’t a neural network—it simply performs the tasks that processors do now more quickly.)

However, the situation is worse than you might think, which is the reason for mentioning the ComputerWorld article. A problem occurs when the computer scientists and data scientists working together to create algorithms that make it appear that computers can think forget that they really can’t do any such thing. Luca and I discuss the effects of bias in Chapter 18 of our book. The chapter might have seemed academic at one time—something of interest, but potentially not all that useful. Today that chapter has taken on added significance. Read the ComputerWorld article and you find that Flickr recently released a new image recognition technology. The effects of not considering the role of bias in interpreting data and in the use of algorithms has has horrible results. The Guardian goes into even more details, describing how the program has tagged black people as apes and animals. Obviously, no one wanted that particular result, but forgetting that computers can’t think has caused precisely that unwanted result.

AI is an older technology that isn’t well understood because we don’t really understand our own thinking processes. It isn’t possible to create a complete and useful model of something until you understand what it is that you’re modeling and we don’t truly understand intelligence. Therefore, it’s hardly surprising that AI has taken so long to complete even the smallest baby steps. Data science is a newer technology that seeks to help people see patterns in huge data sets—to understand the data better and to create knowledge where none existed before. Neither technology is truly designed for stand-alone purposes yet. While I find Siri an interesting experiment, it’s just that, an experiment.

The Flickr application tries to take the human out of the loop and you see the result. Technology is designed to help mankind achieve more by easing the mundane tasks performed to accomplish specific goals. When you take the person out of the loop, what you have is a computer that is only playing around at thinking from a mathematical perspective—nothing more. It’s my hope that the Flickr incident will finally cause people to start thinking about computers, algorithms, and data as the tools that they truly are—tools designed to help people excel in new ways. Let me know your thoughts about AI and data science at John@JohnMuellerBooks.com.

 

Author: John

John Mueller is a freelance author and technical editor. He has writing in his blood, having produced 99 books and over 600 articles to date. The topics range from networking to artificial intelligence and from database management to heads-down programming. Some of his current books include a Web security book, discussions of how to manage big data using data science, a Windows command -line reference, and a book that shows how to build your own custom PC. His technical editing skills have helped over more than 67 authors refine the content of their manuscripts. John has provided technical editing services to both Data Based Advisor and Coast Compute magazines. He has also contributed articles to magazines such as Software Quality Connection, DevSource, InformIT, SQL Server Professional, Visual C++ Developer, Hard Core Visual Basic, asp.netPRO, Software Test and Performance, and Visual Basic Developer. Be sure to read John’s blog at http://blog.johnmuellerbooks.com/.

When John isn’t working at the computer, you can find him outside in the garden, cutting wood, or generally enjoying nature. John also likes making wine and knitting. When not occupied with anything else, he makes glycerin soap and candles, which comes in handy for gift baskets. You can reach John on the Internet at John@JohnMuellerBooks.com. John is also setting up a website at http://www.johnmuellerbooks.com/. Feel free to take a look and make suggestions on how he can improve it.