Re: [dwm] Re: Crash-only software from David Tweed on 2009-02-05 (dwm mail list archive)

From: David Tweed <david.tweed_AT_gmail.com>
Date: Thu, 5 Feb 2009 16:53:24 +0000

On Tue, Feb 3, 2009 at 9:33 PM, Marcin Cieslak <saper_AT_system.pl> wrote:
> I don't like this approach. I have always preferred software that "fails
> fast". As soon as something is wrong - just abort with debugging information
> what went wrong.

I don't think failing fast is incompatible with crash-only software.
Rather, my point is that updating persistent storage should not be one
big monolithic operation that only gets called at regular intervals.
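
As a rough illustration of non-monolithic persistence, here is a minimal
Python sketch. It is not taken from the paper: the JSON-lines journal
format and the names append_edit and replay are assumptions made up for
this example. Each edit is written and synced on its own, so a crash at
any moment loses at most the edit in flight, and aborting early costs
nothing already saved.

```python
import json
import os


def append_edit(journal_path, edit):
    """Append one edit as a JSON line and force it to disk before returning.

    Hypothetical helper: the journal format is an assumption for this sketch.
    """
    with open(journal_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(edit) + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to stable storage


def replay(journal_path):
    """Rebuild state after a restart by replaying every complete journal line."""
    state = {}
    if not os.path.exists(journal_path):
        return state
    with open(journal_path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.endswith("\n"):
                break  # torn final write left by a crash; ignore it
            edit = json.loads(line)
            state[edit["key"]] = edit["value"]
    return state
```

With this shape there is no separate "save" step to forget or to get
interrupted halfway: startup and crash recovery are the same replay.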

> I see some issues with the approach described in the paper. It assumes that
> the state saved is okay - I think that crashes occur _because_ internal
> state is inconsistent or wrong. Sure, you can dump internal state regularly
> for recovery - but it's like with backups - you never know which one is
> really clean and okay until you try to restore.

> I think that authors unnecessarily assume that software components are
> "black boxes" that need to be kept up at all costs. This is not the right

My reading was that the paper is more an attempt to counter the usual
software development "tendency": developers really don't like to think
about things going wrong, so they spend time on code that feels
"positive", like save routines, do as little stress testing as they
can, and certainly pay no regard to the user's data when a programming
error manifests. In contrast, if you're focused on making things
robust in the case of a crash, you are actually forced to think about
what can go wrong and how to ameliorate it. I tend to see this as most
appropriate for applications dealing with transient data, e.g. editors,
user-modified-website stuff, etc., where you don't need
guaranteed pristine data back to the beginning of time but where having
the last day's modifications recoverable (possibly with some risk of
corruption) is preferable to a program essentially saying "I've
crashed. Your recent data's gone. Deal with it. Here's a core dump for
the developer though."
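
To make "recoverable, possibly with some risk of corruption" concrete,
here is a small Python sketch under my own assumptions: the record
layout (a CRC32 in hex, a space, then JSON) and the names encode_record
and salvage are hypothetical, not from the paper. Recovery keeps every
record whose checksum matches and counts the rest, so the user gets
back what survived along with an honest tally of what did not.

```python
import json
import zlib


def encode_record(edit):
    """Serialise one edit as '<crc32 hex> <json>' on a single line.

    Hypothetical record layout, chosen only for this sketch.
    """
    payload = json.dumps(edit, sort_keys=True)
    crc = zlib.crc32(payload.encode("utf-8"))
    return f"{crc:08x} {payload}\n"


def salvage(lines):
    """Return (good_edits, dropped_count), skipping records that fail their checksum."""
    good, dropped = [], 0
    for line in lines:
        crc_hex, sep, payload = line.rstrip("\n").partition(" ")
        try:
            ok = sep == " " and int(crc_hex, 16) == zlib.crc32(payload.encode("utf-8"))
        except ValueError:
            ok = False  # checksum field is not valid hex
        if ok:
            good.append(json.loads(payload))
        else:
            dropped += 1  # corrupt or truncated record; skip it, keep going
    return good, dropped
```

A checksum only catches damage to what was written; it cannot tell
whether the program's state was already wrong when it wrote the record,
which is the concern raised in the quoted text above.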

> Sweeping problems under the carpet is not going to help much...

I agree, and I don't think it's remotely appropriate for most software,
but for niche applications it seems useful to concentrate on
dealing with the dust (problems) rather than maintaining that in the
next release there will never be any more dust generated (it'll be
bug-free).

-- 
cheers, dave tweed__________________________
computer vision researcher: david.tweed_AT_gmail.com
"while having code so boring anyone can maintain it, use Python." --
attempted insult seen on slashdot

Received on Thu Feb 05 2009 - 16:53:24 UTC
