markus schnalke wrote:
> This is just a thought, because I stumbled upon the concept and think
> it's quite an interesting approach.
>
> See: http://en.wikipedia.org/wiki/Crash-only_software

I don't like this approach. I have always preferred software that "fails
fast": as soon as something is wrong, just abort with debugging
information about what went wrong.
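
For illustration, a minimal Python sketch of failing fast. The withdraw
function and the InvariantViolation exception are hypothetical names made
up for this example; the point is only that the first sign of bad state
stops the program with a traceback instead of being papered over.

```python
import sys
import traceback


class InvariantViolation(Exception):
    """Raised when a request or internal state is found to be inconsistent."""


def withdraw(balance: int, amount: int) -> int:
    # Hypothetical operation: refuse to continue on the first sign of trouble.
    if amount <= 0:
        raise InvariantViolation(f"amount must be positive, got {amount}")
    if amount > balance:
        raise InvariantViolation(f"amount {amount} exceeds balance {balance}")
    return balance - amount


def main() -> int:
    try:
        print(withdraw(100, 30))
        print(withdraw(70, 500))  # inconsistent request: abort here
    except InvariantViolation:
        # Dump what went wrong and report failure instead of limping on.
        traceback.print_exc(file=sys.stderr)
        return 1
    return 0
```

Calling main() prints 70, writes a traceback for the second call to
stderr, and returns 1, which a caller could pass to sys.exit().
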

I see some issues with the approach described in the paper. It assumes
that the saved state is okay, but I think crashes occur _because_
internal state is inconsistent or wrong. Sure, you can dump internal
state regularly for recovery, but it's like with backups: you never
know which one is really clean until you try to restore. Software bugs
will sometimes create incorrect data, and this may go unnoticed for a
long time.
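
A small Python sketch of that problem, under made-up assumptions: the
snapshot layout, the function names, and the "total equals sum of items"
invariant are all hypothetical. It shows a snapshot that passes an
integrity check (the bytes are exactly what was written) while the data
inside is still wrong, so only a check at restore time catches it.

```python
import hashlib
import json


def dump_state(state: dict) -> str:
    # Serialize the state and attach a checksum of the payload.
    payload = json.dumps(state, sort_keys=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return json.dumps({"sha256": digest, "payload": payload})


def checksum_ok(blob: str) -> bool:
    # True if the stored payload still matches its stored checksum.
    record = json.loads(blob)
    actual = hashlib.sha256(record["payload"].encode("utf-8")).hexdigest()
    return actual == record["sha256"]


def invariants_ok(state: dict) -> bool:
    # Hypothetical invariant: the cached total must equal the sum of the items.
    return state["total"] == sum(state["items"])


def restore_state(blob: str) -> dict:
    if not checksum_ok(blob):
        raise ValueError("snapshot corrupted on disk")
    state = json.loads(json.loads(blob)["payload"])
    if not invariants_ok(state):
        raise ValueError("snapshot is intact but logically inconsistent")
    return state


# A bug left "total" out of date before the snapshot was taken.
bad_snapshot = dump_state({"items": [1, 2, 3], "total": 5})
```

Here checksum_ok(bad_snapshot) is true, yet restore_state(bad_snapshot)
raises ValueError. An invariant check like this only catches the
inconsistencies someone thought to test for.
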

I think the authors unnecessarily assume that software components are
"black boxes" that need to be kept up at all costs. I don't think this
is the right approach to availability. Most issues occur when a
component is upgraded and needs to use or migrate old data, or sometimes
to cooperate with components that have not been upgraded yet. If
something goes wrong, the rollback also becomes an issue: if I have a
new, badly behaving component that dumped its state in a new format,
how do I go back?
Sweeping problems under the carpet is not going to help much...
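
The rollback concern can be made concrete with a short Python sketch.
Everything in it is an assumption for illustration (the version field,
SUPPORTED_VERSION, and the function names are not from the paper): an
old build that is restarted after a rollback finds state written by the
newer build and can do little except refuse it.

```python
import json

SUPPORTED_VERSION = 1  # hypothetical: the format the old release understands


class UnsupportedStateVersion(Exception):
    """Raised when saved state was written in a format this build cannot read."""


def save_state(state: dict, version: int) -> str:
    # Tag every dump with the format version that produced it.
    return json.dumps({"version": version, "state": state})


def load_state(blob: str, supported: int = SUPPORTED_VERSION) -> dict:
    record = json.loads(blob)
    version = record.get("version")
    if version != supported:
        # Refuse loudly rather than guess at an unknown layout.
        raise UnsupportedStateVersion(
            f"state is version {version!r}, this build reads version {supported}"
        )
    return record["state"]
```

With these definitions, load_state(save_state({"a": 1}, 2)) raises
UnsupportedStateVersion. The version tag makes the failure visible, but
it does not answer how to get the old data back.
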

--Marcin
Received on Tue Feb 03 2009 - 21:33:31 UTC