20 January 2016

Crashiness

I discovered a new metric for software stability. A set of programs was described by a blog writer as "crashy." While "crashiness" is not well defined, it is a useful measure when comparing applications that do similar things. That is a bit of a tangent to what I actually want to discuss, though.

I have found that most computer users have a poor understanding of software stability.  I suppose this is to be expected, as most computer users do not even understand the technology at a basic level, but there are some things that people should know when they are using computers. Unfortunately, these things are not really taught anywhere.  I am going to fix that in this article.

First, stability is a measure of how well a program handles various situations. A program that has errors or crashes when given typical, expected input is very unstable. A program that has errors or crashes on unusual input is somewhat unstable. A program that has errors or crashes in only a few very rare cases is considered fairly stable. A program that never crashes is considered highly stable. In industry, programs go through phases of development, and one factor in determining when to advance to the next phase is often stability. Different companies tend to have different standards for what is stable enough to sell. This is why some companies have a reputation for very stable software while others have reputations for poor stability.

Second, instability is typically caused by programmer error. Some errors in programs cannot be avoided; however, if an error can be predicted, the program can handle it without crashing. Stable programs do not crash when they encounter an error. Instead, they either work around it or, if that is not possible, notify the user and give the user options for how to respond. There are only a few extremely rare cases where an error cannot be predicted (hardware issues, like bad memory or cosmic radiation flipping bits in memory). In short, instability exists because it was programmed in. That is not to say it was deliberate, because it is almost never intentional, but when a program crashes, it is the fault of a programmer somewhere who did not handle the error appropriately. In other words, when your applications crash, it is not your fault! I have to say this because I frequently hear people say things like, "I should have known it would do that, because it always crashes when I do this thing," as if the crash was their fault. The user should not have to memorize a list of cases that crash the program in order to avoid crashes. The programmer should have taken care of that in the first place.
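
To make that concrete, here is a minimal sketch in Python (the file name and function are hypothetical, not taken from any real product) showing the difference between letting a predictable error crash a program and handling it gracefully:

    import sys

    def load_report(path):
        """Read a report file, handling predictable errors instead of crashing."""
        try:
            with open(path, encoding="utf-8") as f:
                return f.read()
        except FileNotFoundError:
            # Predictable error: the file may simply not exist.
            print(f"Could not find '{path}'. Check the name and try again.", file=sys.stderr)
        except PermissionError:
            # Predictable error: the user may not have the right to read it.
            print(f"You do not have permission to read '{path}'.", file=sys.stderr)
        return None

    text = load_report("quarterly_report.txt")
    if text is None:
        # The program keeps running, and the user decides what to do next.
        print("No report was loaded.")

An unhandled FileNotFoundError in the same place would abort the whole program with a traceback, which is exactly the kind of crash users end up blaming on themselves.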

Now, this is a rather hard-nosed way of putting it. Something important to keep in mind is that it is nearly impossible to create a large application without any bugs. Software development went beyond the level of full human comprehension decades ago. When a group builds a large application, the work is divided into parts, because no single human can completely understand the entire application all at once. Each team works on one part and is given information on how that part should fit together with all of the other parts. Sticking to best practices minimizes any clashing between parts, but since no person can fully understand all of them, there are always holes. These holes, where unexpected interactions between parts of a program occur (called "side effects" in some styles of programming), can result in bugs that are extremely difficult to find. For large software companies, what it comes down to is, "How much money are we willing to spend on debugging before we start selling the software?" Cheaper companies try to maximize profit by reducing debugging spending. Higher-end companies try to maximize quality by putting a bit more into debugging. This affects the product's price, though, and in most cases, spending the time and money to eliminate all bugs would be so expensive that no one could afford to buy the software, and it would take so long that the software would be obsolete by the time it was ready anyway. In short, unless you are paying millions or billions of dollars for software that is years or even decades behind its time, you should not expect it to be bug free.
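
Here is a small, purely illustrative Python sketch (the settings and function names are made up) of the kind of hole described above: each piece looks reasonable on its own, and the bug only appears when the pieces interact through shared state.

    # Shared state that two separately written parts of the program both touch.
    DEFAULT_SETTINGS = {"autosave": True, "interval_sec": 60}

    def open_document(settings):
        # Part A: meant only to read the settings, but mutates the shared
        # dictionary instead of working on a copy -- an unintended side effect.
        settings["autosave"] = False  # bug: silently disables autosave globally
        return {"settings": settings, "text": ""}

    def start_autosave(settings):
        # Part B: written by another team, assumes autosave is still enabled.
        if settings["autosave"]:
            print("Autosaving every", settings["interval_sec"], "seconds")
        else:
            print("Autosave is off")  # surprising result caused by Part A

    document = open_document(DEFAULT_SETTINGS)
    start_autosave(DEFAULT_SETTINGS)  # prints "Autosave is off"

Neither part crashes on its own, and neither team's tests would necessarily catch this; bugs like it only surface when the parts are combined, which is what makes them so expensive to find.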

The question, then, is: what is reasonable to expect? The answer depends on criticality. Start with the question, "What do I lose if the software crashes?" The higher the loss, the higher the criticality. For example, if your word processor crashes, you could lose hours of work. If your word processor has a decent autosave feature, you could lose only minutes or seconds of work. The word processor without autosave has higher criticality than the one with it. If your online video game client crashes, you lose a few seconds or minutes of leisure time restarting it, which is hardly critical at all. If your game is not online and does not have autosave, it is more critical, but it is still just leisure time you are losing, so it is less critical than the word processor. Your operating system, on the other hand, is extremely critical, because everything else depends on it. Your word processor could be 100% bug free, but an unstable operating system can cause it to crash anyway, and if the operating system itself crashes, you lose everything that has not been saved, regardless of how stable your applications are. On any system, the operating system is always the most critical piece of software.
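
As a rough illustration of why autosave lowers criticality, here is a minimal Python sketch (the interval and file name are assumptions; real word processors are far more sophisticated) that bounds how much work a crash can destroy:

    import time

    AUTOSAVE_INTERVAL_SEC = 120  # assumed interval; real editors let you configure this

    def autosave_loop(get_document_text, backup_path="autosave_backup.txt"):
        """Write the current document to disk at a fixed interval, so a crash
        loses at most AUTOSAVE_INTERVAL_SEC worth of unsaved work."""
        while True:
            time.sleep(AUTOSAVE_INTERVAL_SEC)
            with open(backup_path, "w", encoding="utf-8") as backup:
                backup.write(get_document_text())

In a real editor this loop would run in a background thread, but the point is the same: the worst case drops from "everything since the last manual save" to roughly two minutes of typing.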

So, next time someone tells you that a piece of software is crashy but it still does what you need, consider the cost of the crashes. You might find that you are ultimately more productive with a less popular product that has fewer features, because you lose less work less often and spend less time waiting for the program to start back up after a crash.
