Contemplating the Issue of Bias in Data Science

When Luca and I wrote Python for Data Science for Dummies, we tried to address a range of topics that aren’t well covered in other places. Imagine my surprise when I saw a perfect article to illustrate one of these topics in ComputerWorld this week: Maybe robots, A.I. and algorithms aren’t as smart as we think. With the use of AI and data science growing exponentially, you might think that computers can think. They can’t. Computers can emulate or simulate the thinking process, but they don’t actually think. A computer is a machine designed to perform math quite quickly. If we want thinking computers, then we need a different kind of machine. It’s the reason I wrote the Computers with Common Sense? post not too long ago. The sort of computer that could potentially think is a neural network, and I discuss them in the Considering the Future of Processing Power post. (Even Intel’s latest 18-core processor, which is designed for machine learning and analytics, isn’t a neural network; it simply performs the tasks that processors do now more quickly.)

However, the situation is worse than you might think, which is the reason for mentioning the ComputerWorld article. A problem occurs when the computer scientists and data scientists who work together to create algorithms that make it appear that computers can think forget that computers really can’t do any such thing. Luca and I discuss the effects of bias in Chapter 18 of our book. The chapter might have seemed academic at one time: something of interest, but potentially not all that useful. Today that chapter has taken on added significance. Read the ComputerWorld article and you find that Flickr recently released a new image recognition technology. The effects of not considering the role of bias in interpreting data and in the use of algorithms have been horrible. The Guardian goes into even more detail, describing how the program has tagged black people as apes and animals. Obviously, no one wanted that particular result, but forgetting that computers can’t think has caused precisely that unwanted result.

AI is an older technology that isn’t well understood because we don’t really understand our own thinking processes. It isn’t possible to create a complete and useful model of something until you understand what it is that you’re modeling and we don’t truly understand intelligence. Therefore, it’s hardly surprising that AI has taken so long to complete even the smallest baby steps. Data science is a newer technology that seeks to help people see patterns in huge data sets—to understand the data better and to create knowledge where none existed before. Neither technology is truly designed for stand-alone purposes yet. While I find Siri an interesting experiment, it’s just that, an experiment.

The Flickr application tries to take the human out of the loop and you see the result. Technology is designed to help mankind achieve more by easing the mundane tasks performed to accomplish specific goals. When you take the person out of the loop, what you have is a computer that is only playing around at thinking from a mathematical perspective—nothing more. It’s my hope that the Flickr incident will finally cause people to start thinking about computers, algorithms, and data as the tools that they truly are—tools designed to help people excel in new ways. Let me know your thoughts about AI and data science at John@JohnMuellerBooks.com.

 

Considering the Future of Processing Power

The vast majority of processors made today perform tasks as procedures. The processor looks at an instruction, performs the task specified by that instruction, and then moves on to the next instruction. It sounds like a simple way of doing things, and it is. Because a processor can perform the instructions incredibly fast (far faster than any human can even imagine), it could appear that the computer is thinking. What you’re really seeing is a combination of a processor performing one instruction at a time, incredibly fast, and some really clever programming. You truly aren’t seeing any sort of thought in the conventional (human) sense of the term. Even when using Artificial Intelligence (AI), the process is still a procedure that only simulates thought.

Most chips today have multiple cores. Some systems have multiple processors. The addition of cores and processors means that the system as a whole can perform more than one task at once—one task for each core or processor. However, the effect is still procedural in nature. An application can divide itself into parts and assign each core or processor a task, which allows the application to reach specific objectives faster, but the result is still a procedure.
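To make the idea of an application dividing itself into parts a little more concrete, here is a minimal Python sketch. It rests on assumptions of my own: the function names are hypothetical, the work (summing squares) is a stand-in for any real task, and I'm using Python's standard concurrent.futures module to hand one chunk to each worker process. Notice that each worker still runs an ordinary procedure.

```python
from concurrent.futures import ProcessPoolExecutor
import os


def sum_of_squares(chunk):
    """One procedural task: the core still steps through it in order."""
    return sum(n * n for n in chunk)


def split_into_chunks(numbers, parts):
    """Divide the work into roughly equal parts, one per worker."""
    size = max(1, -(-len(numbers) // parts))  # ceiling division
    return [numbers[i:i + size] for i in range(0, len(numbers), size)]


def parallel_sum_of_squares(numbers, workers=None):
    """Assign each chunk to a worker process and combine the results."""
    workers = workers or os.cpu_count() or 1
    chunks = split_into_chunks(numbers, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))
```

To try it, call parallel_sum_of_squares(list(range(1000))) from inside an if __name__ == "__main__": block, which Python needs on some platforms before it can start worker processes. The answer matches what a single core would produce; the application simply gets there sooner.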

The reason the previous two paragraphs are necessary is that even developers have started buying into their own clever programming and feel that application programming environments somehow work like magic. There is no magic involved, just incredibly fast processors guided by even more amazing programming. In order to gain a leap in the ability of processors to perform tasks, the world needs a new kind of processor, which is the topic of this post (finally). The kind of processor that holds the most promise right now is the neural processor. Interestingly enough, science fiction has already beaten science fact to the punch by featuring neural processing in shows such as Star Trek and movies such as The Terminator.

Companies such as IBM are working to turn science fiction into science fact. The first story I read on this topic was several years ago (see IBM creates learning, brain-like, synaptic CPU). This particular story points out three special features of neural processors. The first is that a neural processor relies on massive parallelism. Instead of having just four or eight or even sixteen tasks being performed at once, even a really simple neural processor has in excess of 256 tasks being done. The second is that the electronic equivalents of neurons in such a processor work cooperatively to perform tasks, so that the processing power of the chip is magnified. The third is that the chip actually remembers what it did last and forms patterns based on that memory. This third element is what really sets neural processing apart and makes it the kind of technology that is needed to advance to the next stage of computer technology.

In the three years since the original story was written, IBM (and other companies, such as Intel) has made some great progress. When you read IBM Develops a New Chip That Functions Like a Brain, you see that the technology has indeed moved forward. The latest chip is actually able to react to external stimuli. It can understand, to an extremely limited extent, the changing patterns of light (for example) it receives. An action is no longer just a jumble of pixels, but is recognized as being initiated by someone or something. The thing that amazes me about this chip is that the power consumption is so low. Most of the efforts so far seem to focus on mobile devices, which makes sense because these processors will eventually end up in devices such as robots.

The eventual goal of all this effort is a learning computer: one that can increase its knowledge based on the inputs it receives. This technology would change the role of a programmer from one of creating specific instructions to one of providing basic instructions and then supplying the input needed for the computer to learn what it needs to know to perform specific tasks. In other words, every computer would have a completely customized set of learning experiences based on specific requirements for that computer. It’s an interesting idea and an amazing technology. Let me know your thoughts about neural processing at John@JohnMuellerBooks.com.

 

An Update on the RunAs Command

It has been a while since I wrote the Simulating Users with the RunAs Command post that describes how to use the RunAs command to perform tasks that the user’s account can’t normally perform. (The basics of using the RunAs command appear in both Administering Windows Server 2008 Server Core and Windows Command-Line Administration Instant Reference.) A number of you have written to tell me that there is a problem with using the RunAs command with built-in commands—those that appear as part of CMD.EXE. For example, when you try the following command:

RunAs /User:Administrator "md \Temp"

you are asked for the Administrator password as normal. After you supply the password, you get two error messages:

RUNAS ERROR: Unable to run - md \Temp
2: The system cannot find the file specified.

In fact, you find that built-in commands as a whole won’t work as anticipated. One way to overcome this problem is to place the commands in a batch file and then run the batch file as an administrator. This solution works fine when you plan to execute the command regularly. However, it’s not optimal when you plan to execute the command just once or twice. In this case, you must execute a copy of the command processor and use it to execute the command as shown here:

RunAs /User:Administrator "cmd /c \"md \Temp\""

This command looks pretty convoluted, but it’s straightforward if you take it apart a little at a time. At the heart of everything is the md \Temp part of the command. In order to make this a separate command, you must enclose it in double quotes. Remember to escape each double quote that appears within the command string by using a backslash (as in \").

To execute the command processor, you simply type cmd. However, you want the command processor to start, execute the command, and then terminate, so you also add the /c command line switch. The command processor string is also enclosed within double quotes to make it appear as a single command to RunAs.

Make sure you use forward slashes and backslashes as needed. Using the wrong slash will make the command fail.
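If you need to build this sort of nested command from a script, the quoting is easy to get wrong by hand. The following Python sketch only assembles the command line text; it doesn't run anything. The function name is my own invention for illustration, and the sketch assumes that the built-in command contains no double quotes of its own.

```python
def build_runas_builtin(user, builtin_command):
    """Wrap a CMD.EXE built-in command so that RunAs can start it.

    The built-in command goes inside escaped double quotes (\\") as the
    argument to cmd /c, and the whole cmd string is enclosed in ordinary
    double quotes so that RunAs sees a single command.
    """
    inner = 'cmd /c \\"' + builtin_command + '\\"'
    return 'RunAs /User:' + user + ' "' + inner + '"'


# A raw string keeps the backslash in \Temp exactly as typed.
print(build_runas_builtin("Administrator", r"md \Temp"))
```

The printed text uses a forward slash for each switch and a backslash for each escaped quote and for the path.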

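The RunAs command can now proceed as you normally use it. In this case, the command includes only the username, and RunAs prompts for the password. RunAs doesn’t accept a password on the command line, although the /SaveCred switch lets it reuse credentials saved during an earlier run. Let me know if you find this workaround helpful at John@JohnMuellerBooks.com.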

 

Simulating Users with the RunAs Command

One of the problems with writing applications, administering any network, or understanding system issues is ensuring that you see things from the user’s perspective. It doesn’t matter what your forte might be (programmer, administrator, DBA, manager, or the like), getting the user view of things is essential or your efforts are doomed to failure. Of course, this means seeing what the user sees. Anyone can run an application at the administrator level with good success, but the user level is another story because the user might not have access to resources or rights to perform tasks correctly.

Most knowledgeable users know that you can simulate an administrator by right-clicking the application and choosing Run As Administrator from the context menu. In fact, if you Shift+right-click the application, you’ll see an entry for Run As A Different User on the context menu that allows you to start the application as any user on the system. However, the GUI has limitations, including an inability to use this approach for batch testing of an application. In addition, this approach uses the RunAs command defaults, such as loading the user’s profile, which could cause the application to react differently than it does on the user’s system because it can’t find the resources it needs on your system.

A more practical approach is to use the RunAs command directly to get the job done. You can see some basic coverage of this command on page 480 of Windows Command-Line Administration Instant Reference. To gain a basic appreciation of how the user views things, simply type RunAs /User:UserName Command and press Enter (where UserName is the user’s fully qualified logon name including domain and Command is the command you wish to test). For example, if you want to see how Notepad works for user John, you’d type RunAs /User:John Notepad and press Enter. At this point, the RunAs command will ask for the user’s password. You’ll need to ask the user to enter it for you, but at that point, you can work with the application precisely as the user works with it.

Of course, many commands require that you provide command line arguments. In order to use command line arguments, you must enclose the entire command in double quotes. For example, if you want to open a file named Output.TXT located in the C:\MyDocs folder using Notepad and see it in precisely the same way that the user sees it, you’d type RunAs /User:John "Notepad C:\MyDocs\Output.TXT" and press Enter.

In some cases, you need to test the application using the user’s credentials, but find that the user’s profile gets in the way. The user’s system probably isn’t set up the same as your system, so the application can end up looking for things that exist on the user’s machine and not on yours. In this case, you add the /NoProfile command line switch so that RunAs doesn’t load the user’s profile. It’s a good idea to try the command with the user’s profile first, just to get things as close as you can to what the user sees. The default is to load the user’s profile, so you don’t have to do anything special to obtain this effect.
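For batch testing, it can help to generate these command lines from a script rather than typing each one. Here is a minimal Python sketch of the three forms discussed so far. The function name and its load_profile parameter are hypothetical, and the sketch assumes that no argument contains spaces or double quotes. It only builds the text of the command; running it is left to you.

```python
def build_runas(user, program, *args, load_profile=True):
    """Return a RunAs command line for starting program as user."""
    parts = ["RunAs"]
    if not load_profile:
        parts.append("/NoProfile")  # don't load the user's profile
    parts.append("/User:" + user)
    command = " ".join((program,) + args)
    if args:
        # Arguments mean the entire command must sit in double quotes.
        command = '"' + command + '"'
    parts.append(command)
    return " ".join(parts)


print(build_runas("John", "Notepad"))
print(build_runas("John", "Notepad", r"C:\MyDocs\Output.TXT"))
print(build_runas("John", "Notepad", r"C:\MyDocs\Output.TXT",
                  load_profile=False))
```

Each call prints one command line. The switches come before /User, which is the order the RunAs help text shows.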

An entire group of users might experience a problem with an application. In this case, you don’t necessarily want to test with a particular user’s account, but with a specific trust level. You can see the trust levels set up on your system by typing RunAs /ShowTrustLevels and pressing Enter. To run an application using a trust level, use the /TrustLevel command line switch. For example, to open Output.TXT as a basic user, you’d type RunAs /TrustLevel:0x20000 "Notepad C:\MyDocs\Output.TXT" and press Enter. The basic trust levels are:

  • 0x40000 – System
  • 0x30000 – Administrator
  • 0x20000 – Basic User
  • 0x10000 – Untrusted User
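The trust levels in this list lend themselves to a small lookup table when you want to script a test. The Python sketch below maps the names to the values listed above and builds the matching command line. The dictionary and function names are hypothetical, and the values are simply the ones from this post, so check them against what RunAs /ShowTrustLevels reports on your system.

```python
# Trust level values as listed in this post; your system may differ.
TRUST_LEVELS = {
    "System": 0x40000,
    "Administrator": 0x30000,
    "Basic User": 0x20000,
    "Untrusted User": 0x10000,
}


def build_runas_trust(level_name, command):
    """Return a RunAs command line that uses a named trust level."""
    level = TRUST_LEVELS[level_name]  # raises KeyError for unknown names
    return 'RunAs /TrustLevel:0x{:X} "{}"'.format(level, command)


print(build_runas_trust("Basic User", r"Notepad C:\MyDocs\Output.TXT"))
```

Looking up the level by name keeps the hexadecimal values in one place, so a typo in a value can't creep into individual test commands.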

Many people are experiencing problems using the /ShowTrustLevels and /TrustLevel command line switches with newer versions of Windows such as Vista and Windows 7. The consensus seems to be that Microsoft has changed things with the introduction of UAC and that you’ll need to work with the new Elevation Power Toys to get the job done. I’d be interested in hearing about people’s experiences. Contact me at John@JohnMuellerBooks.com.