MTBF and Software

Like many people, I sometimes need a bonk on the noggin to remember some essential bit of wisdom that I shouldn’t have forgotten in the first place. Such is the case with the relationship between hardware and software. In many cases, developers have lost their connection with the hardware. Even though it seems quite obvious that software provides instructions that change the state of the hardware, developers don’t really seem to make the connection. Once you remember the hardware connection, it also begins to make sense that any aberration in that hardware’s functionality will be reflected in the software’s reliability. In short, the Mean Time Between Failures (MTBF) of the hardware also affects the software that runs on that hardware and directs it to perform specific tasks.
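To put rough numbers on that idea, here is a minimal Python sketch. It assumes the common constant-failure-rate (exponential) model, in which the chance that a component fails within t hours is 1 - e^(-t/MTBF). It also assumes a system fails when any one of its independent components fails, so the failure rates (1/MTBF) simply add. The MTBF figures and the function names are hypothetical, chosen for illustration only; they aren't specifications for any real drive.

```python
import math


def failure_probability(hours: float, mtbf_hours: float) -> float:
    """Chance a component fails within `hours`, assuming a constant
    failure rate of 1 / MTBF (exponential model)."""
    return 1.0 - math.exp(-hours / mtbf_hours)


def series_mtbf(*mtbfs: float) -> float:
    """MTBF of a system that fails when any single component fails,
    assuming independent components: the failure rates add."""
    return 1.0 / sum(1.0 / m for m in mtbfs)


# Hypothetical MTBF figures in hours, for illustration only.
YEAR = 24 * 365  # hours of continuous operation in one year
for label, mtbf in [("lower-cost drive", 300_000), ("higher-quality drive", 1_200_000)]:
    print(f"{label}: {failure_probability(YEAR, mtbf):.1%} chance of failure in a year")

# Drive plus hypothetical motherboard and power supply figures.
print(f"system MTBF: {series_mtbf(300_000, 500_000, 150_000):,.0f} hours")
```

Because the rates add, the system’s MTBF always comes out lower than that of its weakest component, which is one way to see how a single inexpensive part can undermine software that is otherwise working correctly.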

The issue that drove the point home for me was a simple hard drive. This particular hard drive came with the system, and the vendor used a lower-cost drive to keep prices low (normally I buy really high-quality hardware simply to avoid problems). As a result, the drive’s MTBF is also quite low. Unfortunately, I ran into that low MTBF late last week in the form of a glitch that made me think there was a problem with my software. The software was just fine; the hard drive was the source of the problem. I only realized this after testing the software on another system. (The drive later got worse and took some of my system configuration with it, but I maintain backups, so the loss was minimal.)

However, the partial failure of the drive made me realize yet again that software can only operate correctly when the underlying hardware also operates correctly. I can’t remember the last time I read anything that even broached the topic of hardware as a potential source of software problems. It makes me think that there are probably developers out there right now trying to find an error in their software that doesn’t exist in the software at all, but is really the result of a hardware glitch.

It’s important to realize that hardware doesn’t always fail in a predictable manner either. For example, a hairline fracture in the runs (traces) of a circuit board can cause a glitch. This sort of error makes its appearance when you start the system. When the board heats up, the metal expands and closes the break in the run, so the failure goes away. I’ve actually encountered a host of incredibly odd hardware problems over the years, many of which could appear to be an isolated software issue given the right circumstances.

The lesson relearned in this case is to always test software on multiple systems, and it’s essential that these systems use different components. Doing so helps rule out a number of non-software issues as the source of a problem. For example, using mismatched systems can help you understand when an error is due to a particular device driver. The point is to avoid shooting yourself in the foot by failing to consider all the possibilities. Complex software interacts with the hardware in complex ways, which makes it all the more likely that some seemingly insignificant hardware or firmware issue will cause you woe as a developer.

What are your experiences with odd hardware- or firmware-related behaviors? Have you ever encountered such behaviors? Let me know at John@JohnMuellerBooks.com.


Author: John

John Mueller is a freelance author and technical editor. He has writing in his blood, having produced 99 books and over 600 articles to date. The topics range from networking to artificial intelligence and from database management to heads-down programming. Some of his current books include a Web security book, discussions of how to manage big data using data science, a Windows command-line reference, and a book that shows how to build your own custom PC. His technical editing skills have helped more than 67 authors refine the content of their manuscripts. John has provided technical editing services to both Data Based Advisor and Coast Compute magazines. He has also contributed articles to magazines such as Software Quality Connection, DevSource, InformIT, SQL Server Professional, Visual C++ Developer, Hard Core Visual Basic, asp.netPRO, Software Test and Performance, and Visual Basic Developer. Be sure to read John’s blog at http://blog.johnmuellerbooks.com/. When John isn’t working at the computer, you can find him outside in the garden, cutting wood, or generally enjoying nature. John also likes making wine and knitting. When not occupied with anything else, he makes glycerin soap and candles, which come in handy for gift baskets. You can reach John on the Internet at John@JohnMuellerBooks.com. John is also setting up a website at http://www.johnmuellerbooks.com/. Feel free to take a look and make suggestions on how he can improve it.