Machine Learning Security and Event Sourcing for Databases

In times past, an application would make an update to a database, essentially overwriting the old data with new data. There wasn’t an actual time element to the update. The data would simply change. This approach to database management worked fine as long as the database was on a local system or even a network owned by an organization. However, as technology has progressed to use functionality like machine learning to perform analysis and microservices to make applications more scalable and reliable, the need for some method of reconstructing events has become more important.

To guarantee atomicity, consistency, isolation, and durability (ACID) in database transactions, products that rely on SQL Server use a transaction log to ensure data integrity. In the event of an error or outage, it’s possible to use the transaction log to rebuild pending database operations or roll them back as needed. It’s possible to recreate the data in the database, but the final result is still a static state. Transaction logs are a good start, but not all database management systems (DBMS) support them. In addition, transaction logs focus solely on the data and its management.

In a machine learning security environment, of the type described in Machine Learning Security Principles, this isn’t enough to perform analysis of sufficient depth to locate hacker activity patterns in many cases. The transaction logs would need to be somehow combined with other logs, such as those that track RESTful interaction with the associated application. The complexity of combining the various data sources would prove daunting to most professionals because of the need to perform data translations between logs. In addition, the process would prove time consuming enough that the result of any analysis wouldn’t be available in a timely manner (in time to stop the hacker).

Event sourcing, of the type that many professionals now advocate for microservice architectures, offers a better solution that it less prone to problems when it comes to security. In this case, instead of just tracking the data changes, the logs would reflect application state. By following the progression of past events, it’s possible to derive the current application state and its associated data. As mentioned in my book, hackers tend to follow patterns in application interaction and usage that fall outside the usual user patterns because the hacker is looking for a way into the application in order to carry out various tasks that likely have nothing to do with ordinary usage.

A critical difference between event sourcing and other transaction logging solutions is the event sourcing relies on its own journal, rather than using the DBMS transaction log, making it possible to provide additional security for this data and reducing the potential for hacker changes to cover up nefarious acts. There are most definitely tradeoffs between techniques such as Change Data Capture (CDC) and event sourcing that need to be considered, but from a security perspective, event sourcing is superior. As with anything, there are pros and cons to using event sourcing, the most important of which from a security perspective is that event sourcing is both harder to implement and more complex. Many developers cite the need to maintain two transaction logs as a major reason to avoid event sourcing. These issues mean that it’s important to test the solution fully before delivering it as a production system.

If you’re looking to create a machine learning-based security monitoring solution for your application that doesn’t require combining data from multiple sources to obtain a good security picture, then event sourcing is a good solution to your problem. It allows you to obtain a complete picture of the entire domain’s activities that helps you locate and understand hacker activity. Most importantly, because the data resides in a single dedicated log that’s easy to secure, the actual analysis process is less complex and you can produce a result in a timely manner. The tradeoff is that you’ll spend more time putting such a system together. Let me know your thoughts about event sourcing as part of security solution at [email protected].