Re: [dev] JFS filesystem

From: Martin Tournoij <martin_AT_arp242.net>
Date: Mon, 22 Apr 2019 02:03:30 +1200

On Sun, 21 Apr 2019 14:21:27 +0100 Joseph Graham <joseph_AT_xylon.me.uk> wrote:
> > In fact, in many filesystems there are very weak – or no! – guarantees that
> > the data you're reading is actually correct. Systems like ext4 simply assume
> > that the data written to the disk will never change. AFAIK, it has
> > essentially no mechanism at all to deal with silent data corruption.
>
> It's not fair to say there's "no mechanism at all to deal with silent
> data corruption". The hard-disk/ssd does checksum every block. If a block
> fails a checksum the disk keeps trying until it reads a block that
> matches the checksum, else gives up with a read-error.
>
> So really it's a matter of whether you trust your drives to do their
> job correctly.

Unfortunately it's not that simple; from [1]:

> Finding (1): In addition to disk failures (20-55%), physical interconnect
> failures make up a significant part (27-68%) of storage subsystem
> failures. Protocol failures and performance failures both make up
> noticeable fractions.
>
> Implications: Disk failures are not always a dominant factor of storage
> subsystem failures, and a reliability study for storage subsystems cannot
> only focus on disk failures. Resilient mechanisms should target all failure
> types.

The annualized failure rate for these kinds of silent errors is about 3-4%.
That's pretty high!

Also from the ZFS authors[2]:

> - Wrote a simple application to write/verify 1GB file
> - Write 1MB, sleep 1 second, etc. until 1GB has been written
> - Read 1MB, verify, sleep 1 second, etc.
> - Ran on 3000 rack servers with HW RAID card
> - After 3 weeks, found 152 instances of silent data corruption
> - Previously thought “everything was fine”
> - HW RAID only detected “noisy” data errors
> - Need end-to-end verification to catch silent data corruption
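
A rough sketch of what such a write/verify test could look like, in Python.
This is not the tool from the slides: the function names, the per-chunk
pattern, and the defaults are all assumptions made for illustration.

```python
import hashlib
import os
import time

CHUNK_SIZE = 1024 * 1024  # 1 MB per chunk, as in the slides


def chunk_pattern(index, size=CHUNK_SIZE):
    """Deterministic bytes that depend on the chunk index (hypothetical scheme)."""
    seed = hashlib.sha256(str(index).encode()).digest()  # 32 bytes
    reps = size // len(seed) + 1
    return (seed * reps)[:size]


def write_file(path, chunks, size=CHUNK_SIZE, delay=0.0):
    """Write `chunks` chunks, syncing each one and pausing `delay` seconds."""
    with open(path, "wb") as f:
        for i in range(chunks):
            f.write(chunk_pattern(i, size))
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
            time.sleep(delay)


def verify_file(path, chunks, size=CHUNK_SIZE, delay=0.0):
    """Return the indices of chunks that differ from what was written."""
    bad = []
    with open(path, "rb") as f:
        for i in range(chunks):
            if f.read(size) != chunk_pattern(i, size):
                bad.append(i)
            time.sleep(delay)
    return bad
```

Calling write_file(path, 1024, delay=1.0) followed by
verify_file(path, 1024, delay=1.0) would mirror the 1GB run described above.
Note that a read shortly after a write is usually served from the page cache,
so a real test would need to bypass or drop the cache before verifying;
this sketch doesn't do that.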

There's much more research on this; these are just the first two sources I found.

This is the reason that pretty much all these newer filesystems have
checksums.
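
To illustrate the idea (not how any particular filesystem lays things out on
disk), here is a minimal Python sketch of a block store that keeps a checksum
apart from the data and checks it on every read. The BlockStore and
ChecksumError names are hypothetical, and SHA-256 is an arbitrary choice.

```python
import hashlib


class ChecksumError(Exception):
    """Raised when a block no longer matches its stored checksum."""


class BlockStore:
    def __init__(self):
        self.blocks = {}     # block number -> bytes (stands in for the disk)
        self.checksums = {}  # block number -> digest, kept apart from the data

    def write(self, n, data):
        self.blocks[n] = bytes(data)
        self.checksums[n] = hashlib.sha256(data).digest()

    def read(self, n):
        data = self.blocks[n]
        # Recompute on every read: corruption anywhere between write and
        # read shows up as a mismatch instead of being returned silently.
        if hashlib.sha256(data).digest() != self.checksums[n]:
            raise ChecksumError(f"block {n}: checksum mismatch")
        return data
```

After store.write(0, b"hello"), overwriting store.blocks[0] behind the
store's back makes store.read(0) raise ChecksumError rather than hand back
bad data. With a redundant copy available, the reader could then fetch that
copy instead of failing.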


[1]: https://www.usenix.org/legacy/event/fast08/tech/full_papers/jiang/jiang.pdf
[2]: https://www.snia.org/sites/default/orig/sdc_archives/2008_presentations/monday/JeffBonwick-BillMoore_ZFS.pdf

Received on Sun Apr 21 2019 - 16:03:30 CEST
