I like this idea for some applications, though a window manager is not
one of them.
I have unknowingly implemented this concept in a render farm I manage.
The farm continuously transcodes media submitted by arbitrary users on
the web, so processes crash often. They then restart and carry right
on.
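
A minimal sketch of the restart loop, in Python. This is not the farm's
actual code: `supervise`, `max_restarts` and `delay` are names made up
for illustration, and `cmd` stands in for whatever worker gets launched.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=5, delay=0.0):
    """Run cmd, restarting it whenever it exits with a non-zero status.

    Returns the number of restarts that were needed before a clean exit.
    Raises RuntimeError once max_restarts is exhausted.
    """
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return restarts  # clean exit: the job finished
        if restarts >= max_restarts:
            raise RuntimeError(f"gave up after {restarts} restarts")
        restarts += 1
        time.sleep(delay)  # optional pause before starting again


if __name__ == "__main__":
    # Hypothetical worker that always succeeds; prints 0 restarts.
    print(supervise([sys.executable, "-c", "pass"]))
```

The restart cap is there so that a worker which can never succeed does
not spin forever.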
-lee
On Tue, Feb 3, 2009 at 4:33 PM, Marcin Cieslak <saper_AT_system.pl> wrote:
> markus schnalke wrote:
>
>> This is just a thought, because I stumbled upon the concept and think
>> it's quite an interesting approach.
>>
>> See: http://en.wikipedia.org/wiki/Crash-only_software
>
> I don't like this approach. I have always preferred software that "fails
> fast". As soon as something is wrong - just abort with debugging information
> about what went wrong.
>
> I see some issues with the approach described in the paper. It assumes that
> the state saved is okay - I think that crashes occur _because_ internal
> state is inconsistent or wrong. Sure, you can dump internal state regularly
> for recovery - but it's like with backups - you never know which one is
> really clean and okay until you try to restore.
>
> Software bugs will sometimes create incorrect data. This may go unnoticed
> for quite some time.
>
> I think that the authors unnecessarily assume that software components are
> "black boxes" that need to be kept up at all costs. This is not the right
> approach to availability, I think. Most issues will occur when a component
> is upgraded and needs to use/migrate old data, or sometimes to cooperate with
> components that are not yet upgraded. If something goes wrong, the rollback
> becomes an issue too - if I have a new, badly-behaving component that dumped
> its state in a new format, how do I go back?
>
> Sweeping problems under the carpet is not going to help much...
>
> --Marcin
Received on Tue Feb 03 2009 - 22:36:41 UTC