I/O space accesses (Linus Torvalds)

Index Home About Blog

Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Being more anal about iospace accesses..
Original-Message-ID: <Pine.LNX.4.58.0409150859100.2333@ppc970.osdl.org>
Date: Wed, 15 Sep 2004 16:37:52 GMT
Message-ID: <fa.gse7v6d.a2irhp@ifi.uio.no>

This is a background mail mainly for driver writers and/or architecture
people. Or others that are just interested in really low-level hw access
details. Others - please feel free to ignore.

[ This has been discussed to some degree already on the architecture
  mailing lists and obviously among the people who actually worked on it,
  but I thought I'd bounce it off linux-kernel too, in order to make
  people more aware of what the new type-checking does. Most people may
  have seen it as only generating a ton of new warnings for some crufty
  device drivers. ]

The background for this iospace type-checking change is that we've long
had some serious confusion about how to access PCI memory mapped IO
(MMIO), mainly because on a PC (and some non-PC's too) that IO really does
look like regular memory, so you can have a driver that just accesses a
pointer directly, and it will actually work on most machines.

At the same time, we've had the proper "accessor" functions (read[bwl](),
write[bwl]() and friends) that on purpose dropped all type information
from the MMIO pointer, mostly just because of historical reasons, and as a
result some drivers didn't use a pointer at all, but some kind of integer.
Sometimes even one that couldn't _fit_ a MMIO address in it on a 64-bit
machine.

In short, the PCI MMIO access case was largely the same as the user
pointer case, except the access functions were different (readb vs
get_user) and they were even less lax about checking for sanity. At least
the user access code required a pointer with the right size.

We've been very successful in annotating user pointers, and that found a
couple of bugs, and more importantly it made the kernel code much more
"aware" of what kind of pointer was passed around. In general, a big
success, I think. And an obvious example for what MMIO pointers should do.

So lately, the kernel infrastructure for MMIO accesses has become a _lot_
more strict about what it accepts. Not only do the MMIO access functions
want a real pointer (which is already more type-checking than we did
before, and causes gcc to spew out lots of warnings for some drivers), but
as with user pointers, sparse annotations mark them as being in a
different address space, and building the kernel with checking on will
warn about mixing up address spaces. So far so good.

So right now the current snapshots (and 2.6.9-rc2) have this enabled, and
some drivers will be _very_ noisy when compiled. Most of the regular ones
are fine, so maybe people haven't even noticed it that much, but some of
them were using things like "u32" to store MMIO pointers, and are
generally extremely broken on anything but an x86.  We'll hopefully get
around to fixing them up eventually, but in the meantime this should at
least explain the background for some of the new noise people may see.

Perhaps even more interesting is _another_ case of driver, though: one
that started warning not because it was ugly and broken, but because it
did something fairly rare but something that does happen occasionally: it
mixed PIO and MMIO accesses on purpose, because it drove hardware that
literally uses one or the other.

Sometimes such a "mixed interface" driver does it based on a compile
option that just #defines 'writel()' to 'inl()', sometimes it's a runtime
decision depending on the hardware or configuration.

The anal typechecking obviously ended up being very unhappy about this,
since it wants "void __iomem *" for MMIO pointers, and a normal "unsigned
long" for PIO accesses. The compile-time option could have been easily
fixed up by adding the proper cast when re-defining the IO accessor, but
that doesn't work for the dynamic case.

Also, the compile-time switchers often really _wanted_ to be dynamic, but
it was just too painful with the regular Linux IO interfaces to duplicate
the code and do things conditionally one way or the other.

To make a long story even longer: rather than scrapping the typechecking,
or requiring drivers to do strange and nasty casts all over the place,
there's now a new interface in town. It's called "iomap", because it
extends the old "ioremap()" interface to work on the PIO accesses too.

That way, the drivers that really want to mix both PIO and MMIO accesses
can very naturally do it: they just need to remap the PIO space too, the
same way that we've required people to remap the MMIO space for a long
long time.

For example, if you don't know (or, more importantly - don't care) what
kind of IO interface you use, you can now do something like

	void __iomem * map = pci_iomap(dev, bar, maxbytes);
	...
	status = ioread32(map + DRIVER_STATUS_OFFSET);

and it will do the proper IO mapping for the named PCI BAR for that
device. Regardless of whether the BAR was an IO or MEM mapping. Very
convenient for cases where the hardware migt expose its IO window in
either (or sometimes both).

Nothing in the current tree actually uses this new interface, although
Jeff has patches for SATA for testing (and they clean up the code quite
noticeably, never mind getting rid of the warnings).  The interface has
been implemented by yours truly for x86 and ppc64, and David did a
first-pass version for sparc64 too (missing the "xxxx_rep()" functions
that were added a bit later, I believe).

So far experience seems to show that it's a very natural interface for
most non-x86 hardware - they all tend to map in both PIO and MMIO into one
address space _anyway_, so the two aren't really any different. It's
mainly just x86 and it's ilk that actually have two different interfaces
for the two kinds of PCI accesses, and at least in that case it's trivial
to encode the difference in the virtual ioremap pointer.

The best way to explain the interface is to just point you guys at the
<asm-generic/iomap.h> file, which isn't very big, has about as much
comments than code, and contains nothing but the necessary function
declarations. The actual meaning of the functions should be pretty
obvious even without the comments.

Feel free to flame or discuss rationally,

		Linus

Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: Being more anal about iospace accesses..
Original-Message-ID: <Pine.LNX.4.58.0409151045530.2333@ppc970.osdl.org>
Date: Wed, 15 Sep 2004 18:05:25 GMT
Message-ID: <fa.guttuu9.8i4rpl@ifi.uio.no>

On Wed, 15 Sep 2004, Jörn Engel wrote:
>
> But it still leaves me confused.  Before I had this code:
>
> 	struct regs {
> 		uint32_t status;
> 		...
> 	}
>
> 	...
>
> 	struct regs *regs = ioremap(...);
> 	uint32_t status = regs->status;
> 	...
>
> So now I should do it like this:
>
> 	#define REG_STATUS 0
>
> 	...
>
> 	void __iomem *regs = ioremap(...);
> 	uint32_t status = readl(regs + REG_STATUS);

No, you can certainly continue to use non-void pointers. The "void __iomem
*" case is just the typeless one, exactly equivalent to regular void
pointer usage.

So let me clarify my original post with two points:

 - if your device only supports MMIO, you might as well just use the old
   interfaces. The new interface will _also_ work, but there is no real
   advantage, unless you count the "pci_iomap()" as a simpler interface..

   The new interface is really only meaningful for things that want to
   support _both_ PIO and MMIO. It's also, perhaps, a bit syntactically
   easier to work with, so some people might prefer that for that
   reason. See my comments further down on the auto-sizing. BUT it doesn't
   make the old interfaces go away by any means, and I'm not even
   suggesting that people should re-write drivers just for the hell of it.

   In short: if you don't go "ooh, that will simplify XXX", then you
   should just ignore the new interfaces.

 - you can _absolutely_ use other pointers than "void *". You should
   annotate them with "__iomem" if you want to be sparse-clean (and it
   often helps visually to clarify the issue), but gcc won't care, the
   "__iomem" annotation is purely a extended check.

So you can absolutely still continue with

	struct mydev_iolayout {
		__u32 status;
		__u32 irqmask;
		...

	struct mydev_iolayout __iomem *regs = pci_map(...);
	status = ioread32(&regs.status);

which is often a lot more readable, and thus in fact _preferred_. It also
adds another level of type-checking, so I applaud drivers that do this.

Now, I'm _contemplating_ also allowing the "get_user()" kind of "unsized"
access function for the new interface. Right now all the old (and the new)
access functions are all explicitly sized. But for the "struct layout"
case, it's actually often nice to just say

	status = ioread(&regs.status);

and the compiler can certainly figure out the size of the register on its
own. This was impossible with the old interface, because the old
interfaces didn't even take a _pointer_, much less one that could be sized
up automatically.

(The auto-sizing is something that "get_user()" and "put_user()" have
always done, and it makes them very easy to use. It involved a few pretty
ugly macros, but hey, that's all hidden away, and is actually pretty
simple in the end).

		Linus

Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: Being more anal about iospace accesses..
Original-Message-ID: <Pine.LNX.4.58.0409151059010.2333@ppc970.osdl.org>
Date: Wed, 15 Sep 2004 18:14:20 GMT
Message-ID: <fa.gsdpv6d.a20rhp@ifi.uio.no>

On Wed, 15 Sep 2004, Linus Torvalds wrote:
>
>    In short: if you don't go "ooh, that will simplify XXX", then you
>    should just ignore the new interfaces.

Btw, to get an example of what _is_ simplified, look at
drivers/scsi/libata-core.c:

	void ata_tf_load(struct ata_port *ap, struct ata_taskfile *tf)
	{
	        if (ap->flags & ATA_FLAG_MMIO)
	                ata_tf_load_mmio(ap, tf);
	        else
	                ata_tf_load_pio(ap, tf);
	}

and realize that "ata_tf_load_mmio()" and "ata_tf_load_pio()" are exactly
the SAME FUNCTION. Except one uses MMIO, the other uses PIO. With the new
setup, it literally collapses into one function, and code size goes down
pretty dramatically. Not to mention making the code more readable.

For another example of this (of the static kind), look at something like
drivers/net/8139too.c:

	#ifdef USE_IO_OPS

	#define RTL_R8(reg)             inb (((unsigned long)ioaddr) + (reg))
	#define RTL_R16(reg)            inw (((unsigned long)ioaddr) + (reg))
	#define RTL_R32(reg)            ((unsigned long) inl (((unsigned long)ioaddr) + (reg)))
	...

	#else

	...
	/* read MMIO register */
	#define RTL_R8(reg)             readb (ioaddr + (reg))
	#define RTL_R16(reg)            readw (ioaddr + (reg))
	#define RTL_R32(reg)            ((unsigned long) readl (ioaddr + (reg)))

see? In this case, USE_IO_OPS depends on a static configuration variable,
namely CONFIG_8139TOO_PIO. So the user at _compile_ time has to decide
whether he wants to use MMIO or PIO. See the Kconfig help file:

          This instructs the driver to use programmed I/O ports (PIO) instead
          of PCI shared memory (MMIO).  This can possibly solve some problems
          in case your mainboard has memory consistency issues.  If unsure,
          say N.

In other words, the new interface is not designed to replace the old ones,
it's designed to help drivers like these, that either go to a lot of extra
pain in order to support both methods, or then have a _static_ config
option that makes it really hard for system vendors to just release one
driver that knows when it needs to use PIO and when it needs MMIO.

		Linus

Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: Being more anal about iospace accesses..
Original-Message-ID: <Pine.LNX.4.58.0409151251060.2333@ppc970.osdl.org>
Date: Wed, 15 Sep 2004 19:57:08 GMT
Message-ID: <fa.gse7v65.a2arhh@ifi.uio.no>

On Wed, 15 Sep 2004, Greg KH wrote:
>
> Currently a few drivers do:
> 	status = readl(&regs.status);

I assume that's "&regs->status", since regs had better be a pointer..

> which causes sparse warnings.
>
> How should that code be changed to prevent this?  Convert them all to
> ioread32()?  Or figure out a way to supress the warning for readl()?

Just make sure that you annotate "regs" as a pointer to IO space.

Ie if you have

	struct xxxx __iomem *regs;

then everything will be fine - sparse knows that "regs" is a IO pointer,
and that obviously makes "&regs->member" _also_ an IO pointer.

You'll need that __iomem annotation anyway, since otherwise sparse would
complain when you do the assignment from the "ioremap()" call (and the
thing had better come from ioremap() some way, or it's not valid in the
first place to do "readl()" on it).

		Linus

Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@osdl.org>
Subject: Re: Being more anal about iospace accesses..
Original-Message-ID: <Pine.LNX.4.58.0409151024510.2333@ppc970.osdl.org>
Date: Wed, 15 Sep 2004 17:46:59 GMT
Message-ID: <fa.gutpue8.8i0q9k@ifi.uio.no>

On Wed, 15 Sep 2004, Roland Dreier wrote:
>
> However, I somewhat agree -- it's ugly for drivers rely on this and do
> arithmetic on void *.  It should be OK for a driver to use
> char __iomem * for its IO base if it needs to add in offsets, right?

"char __iomem *" will certainly work - all the normal pointer conversions
are ok. Some people in fact use pointers to structures in MMIO space, and
this is quite reasonable when working with a chip that uses "mailboxes"
for commands.

However, I disagree with "void *" arithmetic being ugly. It's a very nice
feature to have a pointer that can be validly cast to any other type, and
that is the whole _point_ of "void *". The fact that C++ got that wrong is
arguably the worst failing of the language, causing tons of unnecessary
casts that can silently hide real bugs (maybe the thing you cast wasn't a
"void *" in the first place, but you'll never know - the compiler will do
the cast for you).

For example, to go back to the mailbox example, let's say that your
hardware has an IO area that is 8kB in size, with the last 4kB being
mailboxes.

The _sane_ way to do that is to do

	void __iomem *base_io = ioremap(...);
	struct mailbox __iomem *mbox = base_io + MAILBOX_OFFSET;

and then just work on that.

In contrast, having to cast to a "char *" in order to do arithmetic, and
then casting back to the resultant structure type pointer is not only
ugly and unreadable, it's a lot more prone to errors as a result.

In other words, think of "void *" as a pointer to storage. Not "char"
(which is the C name for a signed byte), but really, it's the pointer to
whatever underlying memory there is. And a _fundamental_ part of such
memory is the fact that it is addressable. Thus "pointer to storage
arithmetic" really does make sense on a very fundamental level. It has
nothing to do with C types, and that also explains why "void *" silently
converts to anything else. It's a very internally consistent world-view.

Now, I disagree with gcc when it comes to actually taking the "size" of
void. Gcc will silently accept

	void *x;
	x = malloc(sizeof(*x));

which I consider to be an abomination (and the above _can_ happen, quite
easily, as part of macros for doing allocation etc - nobody would write
it in that form, but if you have an "MEMALLOC(x)" macro that does the
sizeof, you could end up trying to feed the compiler bogus code).

The fact that you can do arithmetic on typeless storage does _not_ imply
that typeless storage would have a "size" in my book.

So sparse will say:

	warning: cannot size expression

and refuse to look at broken code like the above. But hey, the fact that I
have better taste than anybody else in the universe is just something I
have to live with. It's not easy being me.

		Linus

Index Home About Blog