Block device error handling (Theodore Ts'o)

From: Theodore Ts'o <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: [Ext2-devel] Re: [RFD] FS behavior (I/O failure) in kernel summit
Date: Wed, 15 Jun 2005 14:03:44 UTC
Message-ID: <fa.e9cbbh1.lms1q5@ifi.uio.no>
Original-Message-ID: <20050615140105.GE4228@thunk.org>

On Tue, Jun 14, 2005 at 11:46:36AM +0900, Kenichi Okuyama wrote:
> I agree that kernel can not directly influence user.
> But, application may have better chance.
>
> Think about case of editor (vi, emacs, almost any text editors are ok ).
>
> If you try to save file, and receive no error, user will believe they
> have been written on disk they believe to be existing.
> Even log yells for error, user will not notice.
>
> If editor receive error, then user can know something is wrong. Though
> he is still wondering, if he receive the message
> like "Input Output Error: may be HW error?", he definitely will start
> from looking at cable.

Kenichi-San,

Part of the problem is that we are limited by the constraints of the
POSIX specification for error handling.  For example, we don't have a
way of telling the application, "the reason why the filesystem was
remounted read-only was in reaction to an I/O error that appears to be
caused by the multiple CRC checksum errors reported by the SCSI
controller".  We can only return EIO or EROFS.  And while the write()
which causes an I/O error that remounts the filesystem read/only can
(and probably does) return EIO, any subsequent writes will return
EROFS, and changing this would be hard, hackish, and probably wouldn't
be accepted.
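
To make the limitation concrete, here is a minimal sketch in Python (not
from the original thread) of what an editor-like application can do with
the two error codes.  The function names and message strings are
illustrative assumptions; only errno.EIO and errno.EROFS come from the
discussion above.

```python
import errno
import os


def describe_write_error(exc: OSError) -> str:
    """Turn the errno of a failed save into a message for the user."""
    if exc.errno == errno.EIO:
        return "I/O error: the device may be failing; check hardware and kernel log"
    if exc.errno == errno.EROFS:
        return "filesystem is read-only (possibly remounted after an earlier error)"
    return f"write failed: {exc.strerror or 'unknown error'}"


def save(path: str, data: bytes) -> str:
    """Write data and push it to the device, returning a status message."""
    try:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            # Errors from delayed writeback may only be reported at fsync time.
            os.fsync(f.fileno())
    except OSError as exc:
        return describe_write_error(exc)
    return "saved"
```

Note that the errno value is all the application gets: nothing in it
says which device failed or why, which is the lack of context described
above.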

Also, there is not necessarily one right answer to how to respond to an
underlying I/O error in the filesystem.  So for ext2/3 filesystems, it
is configurable.  In case of an underlying error detected in the
filesystem metadata, the filesystem can be set to either (a) panic and
force a reboot, so that hopefully fsck can resolve the issue, (b)
remount the filesystem read/only, to prevent further damage, or (c)
continue and do nothing (the don't worry, be happy approach).
Different users will want different approaches, and so trying to
standardize what applications will see at the user level doesn't seem
like the right approach, since we want to allow system administrators
some flexibility about how they wish to configure their systems.

(For example, on an embedded system or a system where there are higher
levels of redundancy, the right answer might be to panic and either
reboot or halt --- continuing and possibly returning wrong answers
might be completely unacceptable, and it may be that once the
system goes down hard, the adjacent backup blade can pick up
operations.)
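
On ext2/3 the three behaviors (a), (b) and (c) above are selected with
the errors= mount option, whose values are panic, remount-ro and
continue.  As a rough illustration (not from the original thread), the
Python sketch below pulls that setting out of a comma-separated mount
option string; the class and function names are made up for the example.

```python
from enum import Enum
from typing import Optional


class ErrorsBehavior(Enum):
    CONTINUE = "continue"      # (c) carry on and do nothing
    REMOUNT_RO = "remount-ro"  # (b) remount the filesystem read-only
    PANIC = "panic"            # (a) panic and force a reboot


def parse_errors_option(mount_options: str) -> Optional[ErrorsBehavior]:
    """Return the behavior named by the first errors=... option, or None."""
    for opt in mount_options.split(","):
        key, _, value = opt.strip().partition("=")
        if key == "errors":
            # Raises ValueError if the value is not one of the three above.
            return ErrorsBehavior(value)
    return None
```

For example, parse_errors_option("rw,errors=remount-ro,noatime") yields
ErrorsBehavior.REMOUNT_RO, and a string with no errors= option yields
None.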

So instead of trying to standardize the existing error returns, which
are the way they are, and which would probably not be worth the effort
to standardize since they don't return enough context to the
application anyway --- I would suggest the better thing to do is to
design a new mechanism for reporting block device errors and problems
at the filesystem level via some kind of notification mechanism (pick
your choice of hotplug, dbus, or netlink --- dbus may make the most
sense, since multiple applications may want to subscribe to such
notifications), so that applications can take corrective action as
necessary.

This is a better approach, since it is far more flexible and returns much
more information to the user.  For example, in a desktop environment,
the desktop can pop up a warning dialog to the user of a failure of a
block device or filesystem corruption, without having to modify every
single application.  In the case of an embedded system, the
notification can trigger an appropriate failover or recovery process.
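
The shape of such an out-of-band scheme can be sketched as a small
publish/subscribe model.  This is an in-process Python toy, not the dbus,
hotplug or netlink API; StorageEvent, EventBus, the event kind "io-error"
and the device name are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class StorageEvent:
    device: str   # hypothetical device name, e.g. "sdb1"
    kind: str     # hypothetical event kind, e.g. "io-error"
    detail: str   # free-form context that an errno cannot carry


Handler = Callable[[StorageEvent], None]


class EventBus:
    """Delivers each published event to every subscriber of its kind."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, kind: str, handler: Handler) -> None:
        self._handlers[kind].append(handler)

    def publish(self, event: StorageEvent) -> int:
        handlers = list(self._handlers.get(event.kind, []))
        for handler in handlers:
            handler(event)
        return len(handlers)  # number of subscribers notified


bus = EventBus()
dialogs: List[str] = []    # stands in for a desktop warning dialog
failovers: List[str] = []  # stands in for an embedded failover process
bus.subscribe("io-error", lambda e: dialogs.append(f"{e.device}: {e.detail}"))
bus.subscribe("io-error", lambda e: failovers.append(e.device))
delivered = bus.publish(StorageEvent("sdb1", "io-error", "CRC checksum errors"))
```

Both subscribers see the same event with its full detail string, and
neither the editor nor any other application doing the write() had to be
modified.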

Regards,

						- Ted

From: Theodore Ts'o <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: [Ext2-devel] Re: [RFD] FS behavior (I/O failure) in kernel summit
Date: Wed, 15 Jun 2005 20:45:17 UTC
Message-ID: <fa.e8bvdh7.gmo0a3@ifi.uio.no>
Original-Message-ID: <20050615203750.GC7722@thunk.org>

On Thu, Jun 16, 2005 at 04:40:45AM +0900, Kenichi Okuyama wrote:
> Ted> And while the write()
> Ted> which causes an I/O error that remounts the filesystem read/only can
> Ted> (and probably does) return EIO
>
> No. they return EROFS from beginning.
>

No, trust me, the *first* read/write to a device which is returning
errors is returning EIO.  But the caller that sees it might not be the
application which you are testing.  It might be an attempt to update
the inode last access time that fails, so it might not even be
returned to user space at all.

But once the filesystem is remounted read-only, the reason why EROFS is
being returned is not because of an I/O error, but because the
filesystem is now read-only.  It makes perfect sense, if you think
like a computer....

> The point is pretty easy ( I think ).
>
> Q1.  Why does file system succeed in re-mounting as r/o, when device
>      is totally lost?

That's because right now there is no way for block devices to inform
the filesystem that the device is totally gone.

> But in case of Mr. Qu's test, device is lost. USB cable is
> unplugged. They are unreachable. How could such device be *MOUNTED*?
> # In other word, why can't I mount device which does not exist,
> # while I can re-mount them?

Because remounting a filesystem means toggling the in-core data
structures to indicate that writes are no longer being tolerated.  It
doesn't require reading from the device, which a fresh mount requires.
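
A toy model may help here.  The Python sketch below is not kernel code
and not from the original thread; ToyDevice and ToyFilesystem are
invented names, and the remount-on-error policy is assumed.  It shows the
failing write reporting EIO, later writes reporting EROFS, and a remount
succeeding on a vanished device while a fresh mount fails.

```python
import errno


class ToyDevice:
    """A block device that fails every request once it is unplugged."""

    def __init__(self) -> None:
        self.present = True

    def read_block(self, n: int) -> bytes:
        if not self.present:
            raise OSError(errno.EIO, "device read failed")
        return b"\0" * 512

    def write_block(self, n: int, data: bytes) -> None:
        if not self.present:
            raise OSError(errno.EIO, "device write failed")


class ToyFilesystem:
    def __init__(self, device: ToyDevice) -> None:
        device.read_block(0)     # a fresh mount has to read the superblock
        self.device = device
        self.read_only = False   # in-core state only

    def remount_read_only(self) -> None:
        self.read_only = True    # flips a flag; never touches the device

    def write(self, block: int, data: bytes) -> None:
        if self.read_only:
            raise OSError(errno.EROFS, "read-only filesystem")
        try:
            self.device.write_block(block, data)
        except OSError:
            self.remount_read_only()  # assumed remount-on-error policy
            raise                     # this caller still sees EIO
```

The filesystem in this model only learns that a block request failed,
not that the device is gone, so remounting read-only is all it can do.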

> 1) devices and file systems are still under control of kernel.
> 2) devices or file systems are not under control of kernel anymore.
>
> I do agree that, for devices, it is device driver's responsibility
> to identify which type of error have arised. But when file system
> received type 2 error, he should not change it to type 1 error
> ( unless fs could really guarantee that ).
>
> And, therefore, for type 2, I believe they can be standardize, and I
> think we should.

The problem is that the filesystem right now can't tell the difference
between type 1 and type 2 errors.  All we know is that an attempt to
read from or write to a block has failed.  We don't know why it failed.

I agree that *if* the filesystem could be told that a block device has
disappeared, then we should do the equivalent of umount -l on the
filesystem, and revoke all open file descriptors, much like the BSD
revoke(2) system call.

But this isn't a matter of "standardizing" error returns, but rather a
feature/enhancement request.

						- Ted

From: Theodore Ts'o <tytso@mit.edu>
Newsgroups: fa.linux.kernel
Subject: Re: [Ext2-devel] Re: [RFD] FS behavior (I/O failure) in kernel summit
Date: Wed, 15 Jun 2005 23:02:50 UTC
Message-ID: <fa.e7c9dh9.gm61ad@ifi.uio.no>
Original-Message-ID: <20050615225322.GB8584@thunk.org>

On Wed, Jun 15, 2005 at 01:38:59PM -0700, Hans Reiser wrote:
> Ted, if I understand you correctly, I agree with you.  ;-)
>
> What users need is for a window to pop up saying "the usb drive is
> turned off" or "we are getting checksum errors from XXX, this may
> indicate hardware problems that require your attention".

Yes, and as I suggested, this is best done via an out-of-band
notification system, such as hotplug or dbus.

> Now that GUIs exist, and now that more errors are possible because the
> kernel is more complex, perhaps kernel error handling should be
> reconsidered.  I don't have the feeling that anyone has felt themselves
> authorized to take a deep look at how this ought to be designed.  I mean
> sure, there are sometimes console windows that things get printed into,
> but unsophisticated users basically want to be prompted if something is
> wrong that needs their attention and to not have their experience
> cluttered by a console window otherwise.  Also, it has long been
> irritating having to make error codes conform to one of the existing
> error codes when there is often no good connection between the name of
> an existing error code and the new error condition one has just coded,
> and there is no space left for new error codes.

We could try to add some complicated exception system into system
calls, but it's not productive in my opinion.  First of all, backwards
compatibility is an absolute and unconditional requirement (we can't
break POSIX compatibility, and more importantly, we don't want to
change the number of applications that Linux can run from being
Linux-like to being BeOS-like).  This adds enough of a constraint that
I doubt trying to add changes to the system call error handling
mechanism is likely to work well.

Secondly, if the goal is to have a pop-up show when there is some
major hardware problem, changing the system call error handling
doesn't really help us unless we want to require every single
application in existence to be modified to use this new exception
handling system.  Having seen how well this BeOS-like approach has
worked for BeOS, I believe this is a Really Bad Idea.  It's better to
have a separate, out-of-band notification scheme --- it's what dbus is
really designed for.

> >Also, there is not necessarily one right answer to how to respond to an
> >underlying I/O error in the filesystem.  So for ext2/3 filesystems, it
> >is configurable.
> >
> >
> Perhaps these policy choices should be mount options, what do you think?

We store these policy options in the superblock, but there
are some advantages in being able to override them at mount-time with
mount options.  For example, one such advantage is that we can
standardize them across different filesystems.

However, even if we do have standardized mount options, it is a real
pain to have to type a very long mount option when doing manual
mounts.  So having defaults that can be stored in the superblock seems
to be a good idea, in my opinion.
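
The precedence described here (a default stored in the superblock,
overridden by a mount option when one is given) can be sketched in a few
lines of Python.  The function name is made up for illustration; on
ext2/3 the stored default can be set with tune2fs -e and the override
given as errors= at mount time.

```python
def effective_errors_behavior(superblock_default: str, mount_options: str = "") -> str:
    """Return the error policy in force: a mount option beats the stored default."""
    behavior = superblock_default
    for opt in mount_options.split(","):
        key, _, value = opt.strip().partition("=")
        if key == "errors" and value:
            behavior = value  # an explicit option overrides the default
    return behavior
```

A plain manual mount with no options keeps the stored default, so
effective_errors_behavior("remount-ro") gives "remount-ro", while
effective_errors_behavior("remount-ro", "noatime,errors=panic") gives
"panic".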

						- Ted
