Security Breaches and the Potential Effect on Big Data

There are two interacting forces in big data today that few people are talking about. Perhaps it just hasn’t occurred to anyone that there truly is a serious threat. This particular post is going to talk about big data used for healthcare, but the same issue applies to any use of big data. Organizations, such as Penn Medicine, are using big data to perform real world tasks that really make difference. For example, it’s now possible to predict the potential for diseases well in advance of any critical fallout now—at least for some diseases such as sepsis. The ability to predict an event before it becomes critical is important for all sorts of reasons, but the most important is improving overall health. Of course, it also affects the cost of healthcare and the need to use healthcare in the first place.

However, while writing both Python for Data Science for Dummies and Machine Learning for Dummies, I’ve discovered the fallout of data errors is more critical than anyone can imagine. Ensuring correct data entry is a large part of the solution, but there are other concerns. Yes, algorithms can learn to determine which data is useful and which data isn’t, but the purer the data at the outset, the better.

While writing Security for Web Developers I reviewed many sorts of security breach, some of which involve modifying organizational data. What this means is that an outsider could potentially corrupt the big data used to make assumptions about medical conditions. Do you see where I’m going with this? Having bad data, data that is modified by an outsider and therefore not as likely to gain the attention of someone who can fix it, will cause those algorithms to make some invalid assumptions. Humans help correct the assumptions, but humans aren’t perfect and make assumptions about the behavior of the algorithm. The bottom line is that security breaches of the wrong sort could end up costing lives. It’s something to think about anyway.

The potential for error in big data analysis is just one of a whole bunch of reasons that I’m happy to read that the government is finally looking into ways to bolster the devices used to work with medical data. I’m almost positive that medical practitioners will fight tooth and nail against the new security measures, just like users of every persuasion do, but the security measures really are more important than just protecting individual patient data. As data becomes the centerpiece of all sorts of human endeavors, ensuring it remains as pristine as possible becomes ever more important. Security has to take a bigger role in data management in the future. Let me know your thoughts on securing data that could be used for medical analysis at