Subject: Re: ANNOUNCE: Linux/PowerPC Kernel
From: torvalds@cc.Helsinki.FI (Linus Torvalds)
Date: Aug 03 1995
Newsgroups: comp.sys.powerpc,comp.os.linux.development.system

In article <3vnorm$>,
Greg Hudson <> wrote:
>Larry McVoy (lm@neteng) wrote:
>: 	2.  Do this.  Turn off the sync meta update in FFS.  Untar a
>: 	big directory _into_ the file system and power off the machine
>: 	in the middle.  Now do the same with Linux.  Please run fsck
>: 	under script and post the outputs.  That's what convinced me that
>: 	Linux was better.  Go do it and report back to us.
>I'm willing to believe that the FFS filesystem comes out worse than
>the Linux filesystem, but what does that prove?  You shouldn't be
>turning off synchronous meta-data updates in your filesystem.  (It
>might be enough of a performance boost in a news spool that it will
>save some administrators some money, but this is explicitly a mode
>where reliability is NOT a design goal.)  Last I checked, under normal
>conditions Linux ext2 is not as careful as FFS about keeping the
>filesystem consistent during writes, so a spontaneous reboot is more
>likely to damage a Linux filesystem than a NetBSD filesystem.  This is
>certainly my experience in practice.

I've said this before, and I guess I'll say this again.

	BSD "synchronous" filesystem updates are braindamaged.

	BSD people touting it as a feature are WRONG. It's a bug.

Synchronous meta-data updates are STUPID:

 (a) it's bad for performance
 (b) it's bad for filesystem stability

(a) is obvious, and even BSD people will agree to that.  But (b) is not
as obvious, and BSD people mostly say "Huh?"

In short, updating meta-data synchronously almost guarantees that the
filesystem structure will be up-to-date after a crash, but it will _not_
guarantee that the actual file data will be up-to-date.  In fact, it
will often result in a filesystem that "fsck" thinks is perfectly ok,
_despite_ the fact that you have corruption. 

In fact, the way to get a stable filesystem is to do the updates exactly
reverse to the way BSD does it: write out the data blocks first, _then_
write out the meta-data.  The problem with this approach is you end up
with a partial ordering in which to write the data, and ordering it
isn't trivial. 
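
A toy illustration of the point above, not code from any real filesystem.
The "disk" is a Python dict, `block7`, `crash_after`, `fsck_clean` and
`file_contents` are made-up names, and the structural check is far weaker
than a real fsck (there is no free-block bitmap, for instance).  It only
shows why a crash between the two writes is harmless in one order and
silently corrupting in the other:

```python
def crash_after(writes, n):
    """Apply only the first n writes to a fresh toy disk, as if power failed."""
    disk = {"inode": None, "block7": "stale data from a deleted file"}
    for target, value in writes[:n]:
        disk[target] = value
    return disk


def fsck_clean(disk):
    # Structural check only: the inode is unused or points at an existing block.
    return disk["inode"] is None or disk["inode"] in disk


def file_contents(disk):
    # What a reader of the new file would see after the crash.
    return None if disk["inode"] is None else disk[disk["inode"]]


# Creating one file takes two writes; the only difference is the order.
meta_first = [("inode", "block7"), ("block7", "new file data")]
data_first = [("block7", "new file data"), ("inode", "block7")]

a = crash_after(meta_first, 1)  # meta-data reached the disk, data did not
b = crash_after(data_first, 1)  # data reached the disk, meta-data did not

print(fsck_clean(a), file_contents(a))
print(fsck_clean(b), file_contents(b))
```

In the meta-data-first case the structural check passes while the file
points at somebody else's old data.  In the data-first case the file simply
does not exist yet, which is the honest answer after a crash.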

Doing synchronous meta-data updates is a kludge to make fsck not
complain as much about corrupted filesystems.  It doesn't fix the
problem, it only fixes some of the symptoms.  Touting that as a Good
Thing (tm) is idiocy, IMNSHO (you'll feel safe because fsck doesn't tell
you anything is wrong). 

What makes the BSD approach even more stupid is the fact that the
meta-data inconsistencies are the one thing fsck _can_ fix, so trying to
keep meta-data up-to-date is in some respect a complete waste of time. 

It's much better to instead concentrate on making a better fsck, as fsck
is run only once at bootup (and often not even then as most bootups will
be from a clean filesystem) than to take the performance hit at
run-time.  That's the approach the linux filesystems take (well, at
least the ext2fs filesystem: most other filesystems have a rather stupid
version of fsck). 

Of course, if filesystem integrity is important for you, you don't want
to use the linux ext2fs.  That isn't what I'm trying to claim.  What I'm
saying is that ffs isn't really better in this regard.  If you want
filesystem consistency, you have to use some kind of journalling.

Alternately you can make a unix-type filesystem and do the disk updates
the _right_ way: data blocks first, then indirect blocks (starting from
the lowest level indirected blocks), then the inode, and finally the
directory entry (and going in the opposite direction when you're
deleting a file).  Note that you don't need to do any of these updates
synchronously: you only have to make them in the right order. 
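
The ordering described above can be sketched as a sort key.  This is a
minimal sketch under assumptions: `Update`, `RANK` and `write_order` are
hypothetical names, and "lowest level" is read here as the indirect block
closest to the data (level 1 = single indirect, level 2 = double indirect).
Nothing is written synchronously; the function only decides the order in
which pending updates may be issued:

```python
from typing import NamedTuple


class Update(NamedTuple):
    kind: str   # "data", "indirect", "inode" or "dirent"
    level: int  # indirection depth for "indirect" blocks, 0 otherwise
    label: str


# Creation order: data, then indirect blocks, then inode, then directory entry.
RANK = {"data": 0, "indirect": 1, "inode": 2, "dirent": 3}


def write_order(updates, deleting=False):
    """Return the pending updates in an order that is safe to write out."""
    ordered = sorted(updates, key=lambda u: (RANK[u.kind], u.level))
    # Deleting a file walks the same chain in the opposite direction.
    return ordered[::-1] if deleting else ordered


pending = [
    Update("dirent", 0, "directory entry"),
    Update("indirect", 2, "double indirect"),
    Update("inode", 0, "inode"),
    Update("data", 0, "data block"),
    Update("indirect", 1, "single indirect"),
]

print([u.label for u in write_order(pending)])
print([u.label for u in write_order(pending, deleting=True)])
```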

The sad thing is that the FFS approach is not just _wrong_, it's also
slower than the right way (the partial ordering will still allow quite a
lot of re-ordering among non-related updates, so you probably can get
reasonably close to the completely asynchronous case).  Linux doesn't do
it right either, but at least linux doesn't take the performance hit for
no gain. 
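
To make the "partial ordering" remark concrete, here is a small sketch
using the standard `graphlib` module (Python 3.9 or later).  The file names
`a` and `b`, the `deps` table and the `batches` helper are invented for the
example, and indirect blocks are left out for brevity.  Each update lists
what must reach the disk before it; everything that becomes ready at the
same time is unrelated and can be reordered freely by the disk scheduler:

```python
from graphlib import TopologicalSorter

# update -> set of updates that must be on disk first
deps = {
    "a.inode": {"a.data"},
    "a.dirent": {"a.inode"},
    "b.inode": {"b.data"},
    "b.dirent": {"b.inode"},
}


def batches(deps):
    """Group updates into rounds; order within a round does not matter."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    rounds = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # sorted only to make output stable
        rounds.append(ready)
        ts.done(*ready)
    return rounds


for r in batches(deps):
    print(r)
```

Two files give three rounds of two writes each, instead of six writes
forced into one fixed sequence.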

