Contemplating the Issue of Bias in Data Science

When Luca and I wrote Python for Data Science for Dummies we tried to address a range of topics that aren’t well covered in other places. Imagine my surprise when I saw a perfect article to illustrate one of these topics in ComputerWorld this week, Maybe robots, A.I. and algorithms aren’t as smart as we think. With the use of AI and data science growing exponentially, you might think that computers can think. They can’t. Computers can emulate or simulate the thinking process, but they don’t actually think. A computer is a machine designed to perform math quite quickly. If we want thinking computers, then we need a different kind of a machine. It’s the reason I wrote the Computers with Common Sense? post not too long ago. The sort of computer that could potentially think is a neural network and I discuss them in the Considering the Future of Processing Power post. (Even Intel’s latest 18 core processor, which is designed for machine learning and analytics isn’t a neural network—it simply performs the tasks that processors do now more quickly.)

However, the situation is worse than you might think, which is the reason for mentioning the ComputerWorld article. A problem occurs when the computer scientists and data scientists working together to create algorithms that make it appear that computers can think forget that they really can’t do any such thing. Luca and I discuss the effects of bias in Chapter 18 of our book. The chapter might have seemed academic at one time—something of interest, but potentially not all that useful. Today that chapter has taken on added significance. Read the ComputerWorld article and you find that Flickr recently released a new image recognition technology. The effects of not considering the role of bias in interpreting data and in the use of algorithms has has horrible results. The Guardian goes into even more details, describing how the program has tagged black people as apes and animals. Obviously, no one wanted that particular result, but forgetting that computers can’t think has caused precisely that unwanted result.

AI is an older technology that isn’t well understood because we don’t really understand our own thinking processes. It isn’t possible to create a complete and useful model of something until you understand what it is that you’re modeling and we don’t truly understand intelligence. Therefore, it’s hardly surprising that AI has taken so long to complete even the smallest baby steps. Data science is a newer technology that seeks to help people see patterns in huge data sets—to understand the data better and to create knowledge where none existed before. Neither technology is truly designed for stand-alone purposes yet. While I find Siri an interesting experiment, it’s just that, an experiment.

The Flickr application tries to take the human out of the loop and you see the result. Technology is designed to help mankind achieve more by easing the mundane tasks performed to accomplish specific goals. When you take the person out of the loop, what you have is a computer that is only playing around at thinking from a mathematical perspective—nothing more. It’s my hope that the Flickr incident will finally cause people to start thinking about computers, algorithms, and data as the tools that they truly are—tools designed to help people excel in new ways. Let me know your thoughts about AI and data science at John@JohnMuellerBooks.com.