Date: 	Thu, 6 Jul 2000 08:00:46 -0400
From: "Theodore Y. Ts'o" <tytso@MIT.EDU>
Subject: Re: ext3-0.0.2e released
Newsgroups: fa.linux.kernel

   Date: 	Thu, 6 Jul 2000 12:37:34 +0200
   From: Ookhoi <ookhoi@dds.nl>

   > In article <cistron.20000706093557.F13402@ookhoi.dds.nl>,
   > Ookhoi  <ookhoi@dds.nl> wrote:
   > >Why not use Reiserfs? I think journaling is a must with disks that
   > >large, and reiserfs is faster than ext2 if you have a lot of files/dirs
   > >in a directory. 
   > 
   > Ehrm. Look at the subject of this thread. What do you think "ext3" is ;)

   Yeah, I know, but then again, why do they all want to fsck? We have so
   many nice journaling file systems now.

1)  Ext3 is a journaling filesystem.

2)  Even with journaling filesystems, there will be cases where you will
	need to run some kind of filesystem consistency checker.

	a)  In case of disk drive problems.
	b)  In case of memory problems (particularly cache memory)
	c)  In case of kernel bugs (many times what people think of as
		"bugs" in filesystem code are really bugs in the VM or
		buffer cache parts of the kernel.)

    This is true for all journaling filesystems; they aren't magic.
    What journaling filesystems do protect you against is the need to
    run fsck in case of a power failure or a kernel crash, or some
    other kind of unclean shutdown (so long as that unclean shutdown
    doesn't cause any other forms of on-disk corruption.)

3)  For example, I have seen reports that sound like there are
    problematic hard disk subsystems out there that will write garbage
    out to disk during an unclean shutdown.  (i.e., as the voltage on
    the power rail starts going down, the memory is the most sensitive
    to voltage drops, and fails first by writing garbage out to the
    memory bus.  Meanwhile, the hard drive controller continues
    happily DMA'ing said garbage to the disk.)

    The cases where I've seen this are reports of major parts of the
    inode table getting trashed in ext2 when all that happened was an
    unclean shutdown.  I've gotten enough reports of this out there that
    I'm pretty sure it's not a fluke, and there are systems out there
    that have this unfortunate characteristic if the disk is being
    written when someone unkindly hits the Big Red Button.

    If your hardware is doing nasty things like this, no amount of
    journaling filesystem is going to save you.
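
As a rough illustration of points 2 and 3, here is a toy model of
journal replay.  It is only a sketch: the dictionaries, the
"committed" flag, and the replay() function are all hypothetical and
say nothing about ext3's real on-disk journal format.  It shows that
replay can finish a committed transaction that a crash interrupted,
but has nothing to offer for a block that was overwritten with garbage
outside of any journaled transaction.

```python
# Toy model of a write-ahead journal.  Every name here is hypothetical;
# this is not how ext3 actually lays out or replays its journal.

def replay(disk, journal):
    """Return a copy of disk with all committed transactions applied."""
    disk = dict(disk)
    for txn in journal:
        if txn["committed"]:
            for block_no, data in txn["writes"].items():
                disk[block_no] = data
    return disk


journal = [
    # Fully logged before the crash, so it is safe to redo.
    {"committed": True, "writes": {1: "inode-v2", 2: "bitmap-v2"}},
    # The crash hit before the commit record, so it is ignored.
    {"committed": False, "writes": {3: "dir-v2"}},
]

# Unclean shutdown: block 1 reached the disk, block 2 did not.
crashed = {1: "inode-v2", 2: "bitmap-v1", 3: "dir-v1"}
recovered = replay(crashed, journal)
print(recovered)  # {1: 'inode-v2', 2: 'bitmap-v2', 3: 'dir-v1'}

# Hardware scribbles on block 3, which no committed transaction covers.
scribbled = dict(recovered)
scribbled[3] = "garbage"
still_bad = replay(scribbled, journal)
print(still_bad[3])  # garbage
```

In this model the second replay succeeds without complaint, which is
the point: the journal only knows about the writes it logged, so
finding the damaged block is still a job for a consistency checker.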

Don't get me wrong; journaling filesystems are good stuff, and they're
definitely something that we should have.  It's just that there are a
lot of people who seem to be ascribing almost mythic powers to
journaling filesystems, and that's a bad thing.  Journaling is a
technology, nothing more and nothing less.

						- Ted


Date: 	Thu, 6 Jul 2000 14:32:35 -0400
From: "Theodore Y. Ts'o" <tytso@MIT.EDU>
Subject: Re: ext3-0.0.2e released
Newsgroups: fa.linux.kernel

   Date: Thu, 06 Jul 2000 22:56:31 +1000
   From: Andrew Morton <andrewm@uow.edu.au>

   > 
   > 2)  Even with journaling filesystems, there will be cases you will need
   >         to run some kind of filesystem consistency checker.

   Can this be done while the fs is online?

The Multics operating system was able to run its filesystem recovery tool
while the filesystem was online.  Then again, Multics was also designed
so that if a circuit breaker snapped off and one of its three memory
cabinets got uncleanly shut down, only processes that had memory pages on
the downed memory subsystem would get killed.  The thinking was: just
because you lost 1/3 of your memory and you have to kill off 22 users'
processes, why should you have to ruin the other 45 users' day?
:-)

In practice, though, I'm not aware of any filesystem consistency checker
since the days of Multics that could do this.  It's possible, but you
have to put all sorts of very careful interlocking between the checking
code and the filesystem code, and this adds a *lot* of complexity.  In
the case of Multics, the filesystem consistency checker was actually
part of the kernel (it ran in Ring 0), and this tends to go against the
general Unix and Linux design principles of keeping as much as possible
in userspace.

						- Ted


Date: 	Thu, 6 Jul 2000 16:04:10 -0400
From: "Theodore Y. Ts'o" <tytso@MIT.EDU>
Subject: Re: ext3-0.0.2e released
Newsgroups: fa.linux.kernel

   Date: Thu, 6 Jul 2000 15:12:22 -0400
   From: Nick Cabatoff <ncc@cs.mcgill.ca>

   There's one for UFS/FFS now on the way in FreeBSD 5.0:  Kirk McKusick
   just released alpha code to do what he calls snapshots, which I'm told
   will enable background fscking, among other things.  See
   http://people.freebsd.org/~mckusick/snap.tgz (or the freebsd-arch
   archives) if you're curious.

That's not a full filesystem consistency checker, though.  He's running
fsck on a consistent snapshot of the filesystem in order to detect
orphaned blocks which can then be freed in the live filesystem.  (The
BSD soft update code can leak blocks from inodes which are open at the
time of a system crash, which is why this is necessary.)

This technique can't be used to deal with arbitrary filesystem
corruption, however.  It only addresses a very specific case which can't
be handled any other way given the BSD Soft Updates approach.
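
The orphaned-block check described above can be sketched roughly as
follows.  This is a toy model, not the FreeBSD code: the function
find_leaked_blocks(), the set of allocated block numbers, and the
inode-to-blocks mapping are all hypothetical stand-ins for what a
checker would read from the snapshot.  It also assumes the live
filesystem will not hand out a block that its bitmap already marks as
in use, which is what would make freeing these blocks later safe.

```python
# Toy model of leak detection on a frozen snapshot.  All names and
# data structures are hypothetical, not taken from the FreeBSD code.

def find_leaked_blocks(allocated, inodes):
    """Return blocks marked allocated that no inode references."""
    referenced = set()
    for blocks in inodes.values():
        referenced.update(blocks)
    return sorted(set(allocated) - referenced)


# Snapshot state: the bitmap says five blocks are in use...
snapshot_allocated = {10, 11, 12, 13, 14}
# ...but the surviving inodes only account for three of them.
snapshot_inodes = {2: [10, 11], 7: [12]}

leaked = find_leaked_blocks(snapshot_allocated, snapshot_inodes)
print(leaked)  # [13, 14]
```

Note how narrow the check is: it only reports blocks that are
allocated but unreferenced.  A trashed inode table or a block claimed
by two inodes would need the kind of full checking discussed earlier
in the thread.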

						- Ted


