June 16, 2004

Software Reliability (2 years later)

Posted at June 16, 2004 03:13 PM in BitWise, Technology.

In a nod to my previous blogging life (back before blogs were "cool"), here's an updated version of the last post I made some two years ago. Enjoy.

Here is one of those stories you don't want to hear, from the August 2002 issue of Wired magazine:


Ed Yourdon was on a tarmac in Pittsburgh when he got a glimpse of the coming software hell. His New York shuttle had been cleared for takeoff when the pilot pulled a U-turn and headed back to the gate. The flaps were stuck. "We're going to have to power down and reboot," the pilot announced. It was the aeronautical equivalent of a Ctrl-Alt-Delete.

I have only one question: what if the plane were to need a reboot at 10,000 feet?

Bugs have become more and more evident as software evolves. Software gets quirkier, buggier, and more crash-prone with each passing year (although operating systems have fortunately been bucking this trend lately). What happens when we eventually can't get anything done because we are constantly fighting the technology?

I think back to the "good old days," when software largely worked, and software "patches" were the next major version. Just because it is possible to have users download patches 10 days after release doesn't mean that we should. Are post-release patches really any cheaper for a company? What is the cost of an unhappy user base? If I could buy a product and know that it would work, I would be loyal forever to the company that produced it.

Some point fingers of blame at the programmers, others point at programming practices, others point at management, but I point somewhere else. Only the consumer is to blame. We don't tolerate such unreliability anywhere else. Every time we use or buy bad software, we are expressly endorsing the status quo. Once consumers start voting with their dollars and demanding software that works, it will suddenly be worth it to software companies to produce reliable software.

When Jon and I first envisioned BitWise Chat over two years ago, we wanted to create an instant messenger that would be reliable. We had experienced too many quirks over the years, such as profiles failing to load on AIM, offline messages getting duplicated on ICQ (sometimes months later), connectivity problems on Trillian, and crashes with Yahoo. I was ecstatic when BitWise 0.2.1.2 was used for four months in 2003 without a single bug report. We didn't get there overnight, and while it probably did have one tiny quirk, it held up to four months of solid use by thousands of people worldwide. That reliability goal continues to drive us today, and with the upcoming BitWise 1.0.2, we're looking for a repeat: a rock-solid version to endure the next round of (r)evolutionary changes in BitWise. It is possible to release stable, reliable software, but it takes, above all else, a dedication to being receptive to feedback and a willingness to track down the "minuscule" bugs. Frequently those small bugs can cause larger, unforeseen problems.

Hats off to anyone who participates in software testing. For BitWise, at least, you help ensure that each release works as it should, as any software should. It may not work for everyone, but for us, the secret is in the public testing.

Comments

I personally believe that, in non-mission-critical apps, a few bugs are fine. I understand that the company can price the product at $99 if they get 99% of the bugs out, but would have to devote much time and energy to finding that last 1%. As long as my life, or my life's work, does not depend on the program, I'm willing to accept the risk of crashes in exchange for a lower price.

DM

Posted by David Miller at June 17, 2004 01:13 AM