Daily Archives: January 25, 2007

Uncategorized

Software reliability – defining the problem

Computerworld – Tanenbaum outlines his vision for a grandma-proof OS

Tanenbaum wants to mainstream a new metric: LFs (lifetime failures).

Though I generally agree with Tanenbaum about software reliability, and that it is important that we (as an industry) make progress in that area, this quote struck me:

When consumers go to buy an electrical appliance such as a TV or stereo they expect to bring it home, plug it in and see it work. And it is exactly what happens — for years on end. But not so with computers, even though it should, says Tanenbaum…

Uh… I run OpenBSD, Linux (well, OK, not right now, but that’s actually a deviation from my norm), and Windows XP on my various assemblies of cheap PC hardware (all built myself – no vendor support for me! Cheaper components with shorter MTBFs! Yay!) – and I’m not sure I’ve experienced a single Windows XP BSOD (OK, it’s more like the reboot-of-death now, if you have the default settings intact) since leaving Microsoft. That’s nearly 6 years without a [catastrophic] failure of the OS. I’ve had a few kernel panics in Linux and OpenBSD, but that’s because I was hacking. Neither has failed for me under normal usage scenarios.

The quote struck me because my electrical service goes out far more frequently than that.

I’m just sayin’.

Update:
Tanenbaum’s argument serves as an interesting corollary to my previous post about why we don’t make software better (probably the reverse, actually).

Making software better

I’ve now been at work for 15 hours since I last slept, most of which has been devoted to tracking down a problem that makes me want to rip my hair out and a large chunk of which I won’t get paid for.

However, that’s not a big deal :-)

Along the way, I needed to look at a lot of ‘printf’ output. Normally, our software doesn’t spew a lot of debug information. We print a certain amount by default because it allows our customers to self-diagnose to a degree – but if we print too much, the customer freaks out, and that makes more work for us even though the vast majority of our output is purely informational. Even some messages that say things like “ERROR! THE WORLD WILL EXPLODE IN 10 SECONDS” can safely be ignored, so we keep most of our debug output off.

What that means to me tonight (this morning) is that I need to turn it on. Ok, great. There’s one line in a config file somewhere, right? Or maybe just one runtime “knob” (that reminds me, I should write a post about knobs) that can be turned? Heh. Wrong.