HZ (Linus Torvalds)

Newsgroups: fa.linux.kernel
From: torvalds@transmeta.com (Linus Torvalds)
Subject: Re: Why HZ on i386 is 100 ?
Original-Message-ID: <a9hjd0$16s$1@penguin.transmeta.com>
Date: Tue, 16 Apr 2002 16:30:13 GMT
Message-ID: <fa.j2es1hv.172c4i2@ifi.uio.no>

In article <20020416100148.GA17560@venus.local.navi.pl>,
Olaf Fraczyk  <olaf@navi.pl> wrote:
>On 2002.04.16 12:29 Liam Girdwood wrote:
>>
>> I remember reading that a higher HZ value will make your machine more
>> responsive, but will also mean that each running process will have a
>> smaller CPU time slice and that the kernel will spend more CPU time
>> scheduling at the expense of processes.
>>
>Has anyone measured this?
>This shouldn't be a big problem, because some architectures use value
>1024, eg. Alpha, ia-64.

On the ia-64, they do indeed use a HZ value of 1024 by default.

And I've had some Intel people grumble about it, because it apparently
means that the timer tick takes anything from 2% to an extreme of 10%
(!!) of the CPU time under certain loads.

Apparently the 10% is due to cache/tlb intensive loads, and as a result
the interrupt handler just missing in the caches a lot, but still:
that's exactly the kind of load that you want to buy an ia64 for.

There's no point in saying that "the timer interrupt takes only 0.5% of
an idle CPU", if it takes a much larger chunk out of a busy one.

So the argument that a kHz timer takes a noticeable amount of CPU power
seems to be still true today - even with the "architecture of tomorrow".

Yeah, I wouldn't have believed it myself, but there it is..  You only
get the gigaHz speeds if you hit in the cache - when you miss, you start
crawling (everything is relative, of course: the crawl of today is a
rather rapid one by 6502 standards ;)

		Linus

Newsgroups: fa.linux.kernel
From: torvalds@transmeta.com (Linus Torvalds)
Subject: Re: HZ, preferably as small as possible
Original-Message-ID: <agtlq6$iht$1@penguin.transmeta.com>
Date: Mon, 15 Jul 2002 05:20:06 GMT
Message-ID: <fa.k2264jv.2ni3gs@ifi.uio.no>

In article <Pine.LNX.3.96.1020711162333.5732C-100000@gatekeeper.tmr.com>,
Bill Davidsen  <davidsen@tmr.com> wrote:
>On Thu, 11 Jul 2002, Martin Dalecki wrote:
>
>> vmstat.c:
>>
>> hz = sysconf(_SC_CLK_TCK);	/* get ticks/s from system */
>>
>> And yes I know the libproc is *evil* in this area.
>> The rest should be an implementation detail of sysconf().
>
>Yes, any of the changes need to make the dynamic value available to
>programs.

No they don't.

Have people looked at the 2.5.x patches?

CLK_TCK is 100 on x86. As it has always been. User land should never
care about whatever random value the kernel happens to use for the
actual timer tick at that particular moment. Especially since the kernel
internal timer tick may well be variable some day.

The fact that libproc believes that HZ can change is _their_ problem.
I've told people over and over that user-level HZ is a constant (and, on
x86, that constant is 100), and that won't change.

So in current 2.5.x times() still counts at 100Hz, and /proc files that
export clock_t still show the same 100Hz rate.

The fact that the kernel internally counts at some different rate should
be _totally_ invisible to user programs (except they get better latency
for stuff like select() and other timeouts).

		Linus
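
A small user-space sketch of the point above, assuming a POSIX system: a program converts clock_t values from times() using sysconf(_SC_CLK_TCK) and never needs to know the kernel's internal tick rate. The helper names (user_clk_tck, clock_ticks_to_seconds, print_cpu_time) are made up for this illustration.

```c
#include <stdio.h>
#include <sys/times.h>
#include <unistd.h>

/* User-visible tick rate for clock_t values (100 on x86 per the post). */
long user_clk_tck(void)
{
	return sysconf(_SC_CLK_TCK);
}

/* Convert a clock_t count, as returned by times(), into seconds. */
double clock_ticks_to_seconds(clock_t ticks)
{
	return (double)ticks / (double)user_clk_tck();
}

/* Print the CPU time this process has used so far. */
void print_cpu_time(void)
{
	struct tms t;

	if (times(&t) == (clock_t)-1) {
		perror("times");
		return;
	}
	printf("user %.2fs, system %.2fs\n",
	       clock_ticks_to_seconds(t.tms_utime),
	       clock_ticks_to_seconds(t.tms_stime));
}
```

Nothing here depends on how often the timer interrupt really fires, which is the separation the post argues for.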

Newsgroups: fa.linux.kernel
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: HZ, preferably as small as possible
Original-Message-ID: <Pine.LNX.4.33.0207151148080.19586-100000@penguin.transmeta.com>
Date: Mon, 15 Jul 2002 18:56:04 GMT
Message-ID: <fa.n96176v.c7ih8f@ifi.uio.no>

On Mon, 15 Jul 2002, Albert D. Cahalan wrote:
>
> It's not a different value in libproc. There's autodetection.
> I can't just support "the majority of ARM", and people keep
> giving me shit about HZ supposedly being a per-arch constant.
> (not that there's a sane way to get a per-arch constant from
> user code anyway)

But that's just _wrong_.

There _is_ a sane way to get the per-arch constant, and there has been for
a long long time.

The kernel exports it with the AT_CLKTCK ELF auxiliary note to every ELF
binary ever loaded, and I think glibc in turn exports that value through
the regular sysconf(_SC_CLK_TCK) thing. (Yeah, I disagree with some of the
glibc sysconf implementation, but it sure should be there, and it's
documented).

If that doesn't work, then it's a glibc bug (well, in theory there could
be a kernel bug too, but since it's a one-liner in the kernel I really
doubt it).

		Linus
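
For illustration only, here is one way to look at both sources from user space on Linux. It assumes a libc that provides getauxval() in <sys/auxv.h> (a later glibc addition that did not exist when this was posted); the function names auxv_clk_tck and clk_tck_sources_agree are hypothetical.

```c
#include <elf.h>
#include <sys/auxv.h>
#include <unistd.h>

/* Read the tick rate straight from the ELF auxiliary vector.
 * getauxval() returns 0 if the entry is missing. */
unsigned long auxv_clk_tck(void)
{
	return getauxval(AT_CLKTCK);
}

/* Returns 1 if libc's sysconf() agrees with the kernel-supplied value. */
int clk_tck_sources_agree(void)
{
	unsigned long from_kernel = auxv_clk_tck();
	long from_libc = sysconf(_SC_CLK_TCK);

	return from_kernel != 0 && from_libc > 0 &&
	       from_kernel == (unsigned long)from_libc;
}
```

Portable programs would normally stick to sysconf(_SC_CLK_TCK); the auxiliary vector lookup is Linux-specific and is shown only to make the mechanism visible.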

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Date: Fri, 08 Jul 2005 22:15:19 UTC
Message-ID: <fa.hbkb5sv.lnu3of@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0507081505230.17536@g5.osdl.org>

On Fri, 8 Jul 2005, Andrew Morton wrote:
> >
> > The previous value here for i386 is 1000 --- so why is the default 250?
>
> Because 1000 is too high.

Yes. I chose 1000 originally partly as a way to make sure that people that
assumed HZ was 100 would get a swift kick in the pants. That meant making
a _big_ change, not a small subtle one. For example, people tend to react
if "uptime" suddenly says the machine has been up for a hundred days (even
if it's really only been up for ten), but if it is off by just a factor of
two, it might be overlooked.

So 1kHz was a bit of an overkill, but it worked well enough that we never
really got around to changing it.

			Linus
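
The "uptime" arithmetic can be checked with a few lines of C. This is only a sketch of the failure mode, with made-up function names: a program that hard-codes HZ as 100 divides a tick count that was really accumulated at some other rate.

```c
#include <stdint.h>

#define SECONDS_PER_DAY 86400ULL

/* Ticks the kernel has counted after `days` of uptime at `real_hz`. */
uint64_t ticks_after_days(uint64_t days, uint64_t real_hz)
{
	return days * SECONDS_PER_DAY * real_hz;
}

/* Uptime in days as computed by a program that hard-codes `assumed_hz`. */
uint64_t reported_days(uint64_t ticks, uint64_t assumed_hz)
{
	return ticks / assumed_hz / SECONDS_PER_DAY;
}
```

With a real rate of 1000 and an assumed rate of 100, ten days are reported as a hundred; with a real rate of 200 they would be reported as twenty, which is the kind of error that might go unnoticed.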

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Date: Wed, 13 Jul 2005 19:17:35 UTC
Message-ID: <fa.hc3v64o.l7m3g2@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0507131203300.17536@g5.osdl.org>

On Wed, 13 Jul 2005, Vojtech Pavlik wrote:
>
> No, but 1/1000Hz = 1000000ns, while 1/864Hz = 1157407.407ns. If you have
> a counter that counts the ticks in nanoseconds (xtime ...), the first
> will be exact, the second will be accumulating an error.

It's not even that we have a counter like that, it's the simple fact that
we have a standard interface to user space that is based on milli-, micro-
and nanoseconds.

(For "poll()", "struct timeval" and "struct timespec" respectively).

It's totally pointless saying that we can do 864 Hz "exactly", when the
fact is that all the timeouts we ever get from user space aren't in that
format. So the only thing that matters is how close to a millisecond we
can get, not how close to some random number.

So we do a lot of conversions from "struct timeval" to "jiffies", and if
you don't take the error in that conversion into account, then you're
ignoring what is likely a _bigger_ error.

Long-term time drift is a known issue, and is unavoidable since you don't
even know the exact frequency of the crystal, since that is not only not
that exact in the first place, it depends on temperature etc. So long-term
time drift is something that we inevitably have to use things like NTP to
handle, if you want an exact clock.

And in short-term things, the timeval/jiffie conversion is likely to be a
_bigger_ issue than the crystal frequency conversion.

So we should aim for a HZ value that makes it easy to convert to and from
the standard user-space interface formats. 100Hz, 250Hz and 1000Hz are all
good values for that reason. 864 is not.

		Linus
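
A rough sketch of the two kinds of error discussed above, written as plain user-space C. The helper names are invented for this example, and rounding timeouts up is an assumption about how a conversion would typically be done, not a copy of the kernel's code.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Nanoseconds left over per second when the tick period is stored as a
 * whole number of nanoseconds: zero means the period is exact. */
uint64_t tick_period_remainder_ns(uint64_t hz)
{
	return NSEC_PER_SEC % hz;
}

/* Round a millisecond timeout up to whole ticks, so that a timeout
 * never fires early. */
uint64_t ms_to_ticks_round_up(uint64_t ms, uint64_t hz)
{
	return (ms * hz + 999) / 1000;
}

/* Extra delay, in whole microseconds, introduced by that rounding. */
uint64_t rounding_error_us(uint64_t ms, uint64_t hz)
{
	uint64_t ticks = ms_to_ticks_round_up(ms, hz);

	return ticks * 1000000ULL / hz - ms * 1000ULL;
}
```

For a 20 ms timeout this gives no error at 250 Hz and roughly 833 microseconds at 864 Hz, which is far larger than the sub-nanosecond remainder per tick that the 864 Hz argument was about.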

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: Linux v2.6.13-rc3
Date: Wed, 13 Jul 2005 22:51:23 UTC
Message-ID: <fa.hd4174q.m7k2g4@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0507131045530.17536@g5.osdl.org>

On Wed, 13 Jul 2005, Lee Revell wrote:

> On Tue, 2005-07-12 at 22:05 -0700, Linus Torvalds wrote:
> > I think the shortlog speaks for itself.
>
> HZ still defaults to 250.  As was explained in another thread, this will
> break apps like MIDI sequencers and won't really save much battery
> power.

Stop bothering with this, I've seen the thread, and no, I disagree totally
with "as explained in another thread". That's simply not true.

The only thing that is true is that 100Hz is too low for some uses, and
1000Hz is too high for some uses. NOBODY has shown that 250Hz isn't good
enough, there's only been people whining and complaining and saying it
might not be.

The fact is, engineering is about finding something that works "well
enough". If _you_ think that 1000Hz is the right answer, then _you_ select
that. But if you cannot accept the fact that other people are of a
different opinion, then why would anybody want to discuss the issue with
you?

This is a fundamental fact of engineering (and, in fact, pretty much any
other area in life):

	If you cannot accept that other people have other aims and
	needs than you, then why are you talking to other people in
	the first place?

So get on with your lives. Realize that there is no "perfect" value for
HZ. 250 right now is somewhere reasonable, and for the extreme ends you
can always choose your own. Don't try to force your ideas on others.

And btw, the next time somebody complains about HZ, I want HARD DATA. I
don't want whining. Stop cc'ing me if you don't have a real datapoint, and
if you cannot accept that other people have _other_ real datapoints.

		Linus

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: High irq load (Re: [PATCH] i386: Selectable Frequency of the Timer
Date: Wed, 13 Jul 2005 22:55:53 UTC
Message-ID: <fa.hb3t6kt.k7g309@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0507131128100.17536@g5.osdl.org>

On Wed, 13 Jul 2005, Jan Engelhardt wrote:
>
> No, some kernel code causes a triple-fault-and-reboot when the HZ is >=
> 10KHz. Maybe the highest possible value is 8192 Hz, not sure.

Can you post the triple-fault message? It really shouldn't triple-fault,
although it _will_ obviously spend all time just doing timer interrupts,
so it shouldn't get much (if any) real work done either.

A triple fault implies that there's something strange going on, like the
timer interrupt allowing itself to interrupt itself (ie we might get a
timer interrupt that takes so long that another timer interrupt happens
while the first one is still running).

The irq code should protect against that kind of nested irq's, though,
which is why I'd like to hear more about this. It might be somebody
touching the timer chip at an inopportune time without holding the proper
locks, or something nasty - a real bug that you just don't see normally
and that a high timer frequency just happens to make obvious.

There should be no conceptual "highest possible HZ", although there are
certainly obvious practical limits to it (both on the timer hw itself, and
just the fact that at some point we'll spend all time on the timer
interrupt and won't get anything done..)

			Linus

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Date: Thu, 14 Jul 2005 01:58:56 UTC
Message-ID: <fa.hakb74s.kn22g6@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0507131847000.17536@g5.osdl.org>

On Wed, 13 Jul 2005, Lee Revell wrote:
>
> Interesting.  First they say it's impractical to reprogram the PIT, then
> they later imply that's exactly what Windows does, though for some
> reason they don't come out and say it.

I suspect that it is impractical to reprogram the PIT on a very fine
granularity.

Btw, if somebody really gets excited about all this, let me say (once
again) what I think might be an acceptable situation.

First off, I'm _not_ a believer in "sub-HZ ticks". Quite the reverse. I
think we should have HZ be some high value, but we would _slow_down_ the
tick when not needed, and count by 2's, 3's or even 10's when there's not
a lot going on.

In other words, I don't think we want a _high_frequency_ timer, I want a
_lower_ frequency mode.

So let's say that we raise HZ to 2000, or something that we decide is the
upper limit of sanity. We then have some timer logic entity that notices
that nothing is going to care for the next 100 ticks, so we go into "slow
mode", and reprogram the timer to tick at a frequency of 100Hz, but when
it does tick, we just count it as 20.

IOW, nothing ever sees any "variable frequency", and there's never any
question about what the timer tick is: the timer tick is 2kHz as far as
everybody is concerned. It's just that the ticks sometimes come in
"bunches of 20".

This also means that there is never any issue of the timer running wild.
The _most_ it will ever run at is limited quite naturally, and some crazy
user asking for a 1ns itimer won't make any difference at all to the
system.

		Linus
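
The "bunches of 20" idea can be simulated in a few lines. This is a sketch under the numbers used in the post (HZ of 2000, slow mode at 100Hz, a 100-tick idle horizon); the names ticks_per_interrupt and run_until are hypothetical and no real timer hardware is touched.

```c
#include <stdint.h>

#define HZ		2000	/* nominal tick rate (value from the post) */
#define SLOW_HW_RATE	100	/* hardware rate in slow mode */
#define SLOW_BATCH	(HZ / SLOW_HW_RATE)	/* 20 ticks per interrupt */
#define IDLE_THRESHOLD	100	/* "nothing cares for the next 100 ticks" */

/* Decide how many nominal ticks the next hardware interrupt stands
 * for, given the current tick and the tick of the earliest pending
 * timeout. */
unsigned int ticks_per_interrupt(uint64_t now, uint64_t next_event)
{
	uint64_t idle = next_event > now ? next_event - now : 0;

	return idle >= IDLE_THRESHOLD ? SLOW_BATCH : 1;
}

/* Advance the counter until `next_event` is reached and return how
 * many hardware interrupts that took. */
unsigned int run_until(uint64_t *jiffies, uint64_t next_event)
{
	unsigned int interrupts = 0;

	while (*jiffies < next_event) {
		*jiffies += ticks_per_interrupt(*jiffies, next_event);
		interrupts++;
	}
	return interrupts;
}
```

Waiting for an event 2000 ticks away takes 176 simulated interrupts instead of 2000, and the counter still lands exactly on tick 2000, so readers of the counter never see anything but a 2kHz clock.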

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Date: Thu, 14 Jul 2005 16:41:36 UTC
Message-ID: <fa.h94r5su.m7m28e@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0507140933150.19183@g5.osdl.org>

On Thu, 14 Jul 2005, Vojtech Pavlik wrote:
>
> A note on the relative timer API: There needs to be a way to say
> "x milliseconds from the time this timer should have triggered" instead
> of "x milliseconds from now", to avoid skew in timers that try to be
> strictly periodic.

I disagree.

There should be an _absolute_ interface, and a driver that wants that
should just have calculated when in time the timeout finishes - and then
keep on using the absolute value.

Btw, this is exactly why the jiffy-based thing is _good_. The kernel
timers _are_ absolute, and you make them relative by adding "jiffies".

The fact is, the current timers are better than people give them credit
for, and converting them away from a jiffies-based interface (to a
usleep-like one) is STUPID.

There's absolutely nothing wrong with "jiffies", and anybody who thinks
that

	msleep(20);

is fundamentally better than

	timeout = jiffies + HZ/50;

just doesn't realize that the latter is a bit more complicated exactly
because the latter is a hell of a lot more POWERFUL. Trying to get rid of
jiffies for some religious reason is _stupid_.

I have to say, this whole thread has been pretty damn worthless in general
in my not-so-humble opinion.

		Linus
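
To make the absolute-versus-relative distinction concrete, here is a small simulation. It assumes a 32-bit tick counter and a wakeup that is always a fixed number of ticks late; all function names are invented for the sketch.

```c
#include <stdint.h>

#define HZ 1000

/* Relative rescheduling: every late wakeup pushes all later ones back. */
uint32_t next_relative(uint32_t now, uint32_t period)
{
	return now + period;
}

/* Absolute rescheduling: the schedule is fixed up front, so a late
 * wakeup does not move the following deadlines. */
uint32_t next_absolute(uint32_t prev_deadline, uint32_t period)
{
	return prev_deadline + period;
}

/* Run `rounds` periods where each wakeup is `late` ticks late and
 * return how far the final deadline is from the ideal schedule. */
uint32_t drift_after(unsigned int rounds, uint32_t period, uint32_t late,
		     int absolute)
{
	uint32_t start = 0, deadline = start + period;

	for (unsigned int i = 0; i < rounds; i++) {
		uint32_t woke_at = deadline + late;

		deadline = absolute ? next_absolute(deadline, period)
				    : next_relative(woke_at, period);
	}
	return deadline - (start + (rounds + 1) * period);
}
```

With a period of HZ/50 and one tick of lateness per wakeup, fifty rounds of "now plus period" end up fifty ticks off schedule, while carrying the absolute deadline forward ends up zero ticks off.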

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Date: Thu, 14 Jul 2005 17:30:10 UTC
Message-ID: <fa.h94d5kt.m7020f@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0507141022070.19183@g5.osdl.org>

On Thu, 14 Jul 2005, Chris Friesen wrote:
>
> But if all I really want is to sleep for 20ms, what does the additional
> power actually buy me?

If you _only_ want to sleep for 20ms, it doesn't buy you anything.

But the sleep is often part of a bigger picture, where the 20ms might be
part of a bigger loop that wants to run for at most a second, after which
it will error out.

At which point it's not just about sleeping 20ms any more.

I'm not saying that we should get rid of msleep(). I'm saying that anybody
who thinks the jiffies-based stuff should always be rewritten as msleep()
simply doesn't know what the hell he is talking about. At ALL.

Jiffies are here to stay, and they are here to stay for some very very
fundamental reasons. If you hear somebody arguing for removing jiffies,
you should piss in their general direction, and realize that they don't
know what they are talking about.

		Linus
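
The "bigger loop" case might look something like the following user-space simulation. The jiffies variable, fake_sleep() and never_ready() are stand-ins invented for the example; in real kernel code the counter and the sleep would come from the kernel itself.

```c
#include <stdint.h>

#define HZ 1000

uint32_t jiffies;	/* simulated tick counter */

/* Wraparound-safe "a is later than b" for a 32-bit tick counter. */
int ticks_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Stand-in for a sleep: it may oversleep by `extra` ticks. */
void fake_sleep(uint32_t ticks, uint32_t extra)
{
	jiffies += ticks + extra;
}

/* Example predicate: a device that never becomes ready. */
int never_ready(void)
{
	return 0;
}

/* Poll `ready()` every 20 ms, but give up about one second after
 * entry no matter how long the individual sleeps really took.
 * Returns 0 on success, -1 on timeout. */
int wait_for_ready(int (*ready)(void), uint32_t oversleep)
{
	uint32_t timeout = jiffies + HZ;	/* absolute deadline */

	while (!ready()) {
		if (ticks_after(jiffies, timeout))
			return -1;
		fake_sleep(HZ / 50, oversleep);
	}
	return 0;
}
```

If every 20 ms sleep really takes 50 ms, a loop that simply counted fifty sleeps would run for 2.5 seconds, whereas this one gives up within one sleep of the one-second deadline.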

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Date: Thu, 14 Jul 2005 20:15:58 UTC
Message-ID: <fa.ha4754t.n7a2gf@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0507141302210.19183@g5.osdl.org>

On Thu, 14 Jul 2005, Russell King wrote:
>
> Umm.  Except, according to your description of what it's supposed to
> do, the above code can have an accumulating error.

No. It can have a local drift, but the point is, the error never gets
worse - it _stays_ local.

There's no point in polling twice in immediate succession just because a
sleep overslept. That's like a security guard testing each door twice for
being locked, just because he overslept one round. Pointless.

But what matters is that you don't let your local errors accumulate into
the big picture.

Now, if somebody wants to make nicer helper functions so that you can say

	timeout = ms_from_now(500);

or something instead of saying "timeout = jiffies + HZ/2", then hey, go
wild. At that point it's just syntactic sugar, and maybe it's worth it.

		Linus
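
A helper along those lines could be sketched as below. ms_from_now() is the hypothetical name from the post, ms_to_ticks() and the simulated jiffies variable are made up here, and rounding up is an assumption; the kernel has its own msecs_to_jiffies() conversion that real code would use for the first step.

```c
#include <stdint.h>

#define HZ 250	/* compile-time tick rate; 250 is just an example */

uint32_t jiffies;	/* simulated tick counter */

/* Convert milliseconds to ticks, rounding up so a timeout is never
 * shorter than requested. */
uint32_t ms_to_ticks(uint32_t ms)
{
	return (uint32_t)(((uint64_t)ms * HZ + 999) / 1000);
}

/* Hypothetical helper from the post: an absolute timeout `ms` from now. */
uint32_t ms_from_now(uint32_t ms)
{
	return jiffies + ms_to_ticks(ms);
}
```

The result is still an absolute tick value, so it can be compared against the counter later in exactly the same way as "jiffies + HZ/2".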

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Date: Thu, 14 Jul 2005 20:18:42 UTC
Message-ID: <fa.h94h552.m742ga@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0507141307060.19183@g5.osdl.org>

On Thu, 14 Jul 2005, john stultz wrote:
>
> Well, I'd probably put it as: "they do care about absolute time, but
> they do not care about ticks or timer interrupt frequency"

Well, the thing is, you have to count time some sane way.

You can do it by having very expensive data structures that say "real
time", but then you have some serious confusion when it comes to things
like whether it's wallclock time (which might have shifts and other
interesting issues) or some "virtual cpu time". You also end up having a
much much more expensive interface, ie "time_after()" ends up being a much
more complicated test.

So the _sane_ way to do timeouts is to define an _arbitrary_ clock that is
just an integer counter. None of this "nanoseconds + full seconds" crap.
None of this stupid confusion with "real time". You select something that
is conceptually _clearly_ something else, and that will never get confused
when root sets the time backwards or anything like that.

In other words, you select the thing we call "jiffies".

Face it, it is just _superior_ to the alternatives. The alternative is to
have some "fake time" aka "struct timespec", and always have to worry
about normalization and complicated comparisons, and using more memory
too, btw.

There is no way to avoid having some kind of counter to specify time.
NONE. The only choice is what you base your notion of time on, and how you
represent it. Do you represent it as two separate counters and try to make
it look like "fractions of seconds", or do you represent it as a single
counter, and make it look like "ticks".

And the single counter is _simpler_. The alternatives have absolutely
_zero_ upsides. Name _one_.

		Linus
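
As a side-by-side sketch of the two representations: the single counter compares with one subtraction, while the two-field form needs normalized inputs and a carry step. The function names are illustrative, and the signed-difference trick relies on two's complement conversion, which every common compiler provides.

```c
#include <stdint.h>
#include <time.h>

/* Single-counter comparison: one subtraction, and it keeps working
 * when the 32-bit counter wraps around. */
int ticks_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Two-field comparison: only valid if both values are normalized
 * (0 <= tv_nsec < 1000000000). */
int timespec_after(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec > b->tv_sec;
	return a->tv_nsec > b->tv_nsec;
}

/* Adding to a timespec needs an explicit carry step; this sketch
 * assumes 0 <= add_ns < 1000000000. */
void timespec_add_ns(struct timespec *ts, long add_ns)
{
	ts->tv_nsec += add_ns;
	if (ts->tv_nsec >= 1000000000L) {
		ts->tv_nsec -= 1000000000L;
		ts->tv_sec++;
	}
}
```

The tick comparison also stays correct across a counter wrap: tick 5 is reported as later than tick 0xFFFFFFF0.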

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Date: Thu, 14 Jul 2005 23:20:19 UTC
Message-ID: <fa.hbkj54r.kna2gd@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0507141600540.19183@g5.osdl.org>

On Thu, 14 Jul 2005, Alan Cox wrote:
>
> > just doesn't realize that the latter is a bit more complicated exactly
> > because the latter is a hell of a lot more POWERFUL. Trying to get rid of
> > jiffies for some religious reason is _stupid_.
>
> Getting rid of jiffies in its current form is a huge win for very
> non-religious reasons. Jiffies is expensive in power management and
> virtualisation because you have to maintain it.

No, you're now confusing "interrupts" with "jiffies".

There is no conceptual 1:1 thing between those two.

It so happens that traditionally we've kept them 1:1, but there's nothing
that says that we can't slow down the interrupt source and just increment
jiffies by a factor of the slowdown when the interrupt _does_ happen.

But no, that does NOT mean that "jiffies" should just count nanoseconds,
and the problem would be solved. The fact is, most users of jiffies only
care about the low 32 bits on 32-bit architectures, and that's fine as
long as jiffies are in the millisecond range, since it still leaves a
useful timeout value for almost everything (and then only long-range stuff
needs to use "u64" for their timeouts).

In other words, we want a clock that is _known_ to not be very accurate,
but that is easy to just read from a memory location, and that has some
relationship to a timer tick in the sense that it should be at least in
the order-of-magnitude range for what a timer tick can cause.

Anybody who asks for nanoseconds is confused. That just forces you to use
a 64-bit value, where no such value is needed. Things like TCP
retransmission timeouts would be totally _idiotic_ to be made in
nanoseconds: it would just make the socket data structures larger, and it
has zero relevance, since the actual timer tick doesn't have that kind of
resolution _anyway_.

The current "jiffies" actually fits all of these problems _wonderfully_
well. Yes, it needs to be converted from "struct timeval" and friends, but
it needs to be converted exactly _because_ of the good properties it has,
namely that it fits in a word, and is _relevant_ to what a timer interrupt
ends up being.

Look at 99% of the use of jiffies: it uses _jiffies_. It doesn't use
"jiffies_64", even though that's actually what gets updated. And it does
that _exactly_ because almost _nobody_ cares to pay the price of 64 bit
issues (both structure memory usage, and atomicity costs on 32-bit
architectures).

And I claim that you _cannot_ do better.

But what you can do is to have HZ at some reasonably high value (ie in the
kHz range), and then slow down the system clock to conserve energy, and
increment jiffies by 16 or 32 when in "slow clock mode". And then, when
there is a multimedia app or something that asks for high-precision
timers, you speed the interrupts up again, and you increment jiffies by 1.

It's that simple. And it really _is_ simple.

And guys, the fact is, jiffies works _today_. Making this change won't
break anything, and won't introduce any new concepts, and won't break any
existing drivers. In contrast, introducing _yet_ another timekeeping
mechanism is not only going to be objectively _worse_ than jiffies from a
technical standpoint, it's going to be a total disaster from a transition
standpoint too.

		Linus

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Date: Thu, 14 Jul 2005 23:26:50 UTC
Message-ID: <fa.h9kp5cv.mnc2o9@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0507141614170.19183@g5.osdl.org>

On Thu, 14 Jul 2005, Alan Cox wrote:
>
> I suspect the problem for some of this is that people think of jiffies
> as incrementing by 1. If HZ is right then jiffies can be in nS, it just
> won't increment by 1.

No, jiffies _cannot_ be in nS, because of the fact that then it doesn't
fit in a word any more. A lot of things want timeouts in the tens of
minutes, and a jiffy clock that tries to be in nS just screws that up
entirely, and forces people to use u64.

Which is much more expensive to compare on 32-bit architectures due to
nasty atomicity issues.

So you want to keep the "normal" timeout 32-bit. In ten years we may not
care any more. For the foreseeable future we definitely do.

> It's also why jiffies() is better on some platforms
> because many machines can answer "what time is it" far more accurately
> than they can set interrupts.

That's not what "jiffies" are about. If you want accurate time, use
something else, like gettimeofday. The timeouts are _only_ relevant on the
scale of a timer interrupt, since by definition that's what we're waiting
for.

So accuracy is a total non-issue. The only kind of accuracy we care about
is "how often can the timer ticks happen".

		Linus
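
The word-size argument is easy to put numbers on. The following sketch (function names invented here) computes how long a 32-bit counter lasts at a given counting rate, and the longest timeout that a signed-difference comparison can still represent.

```c
#include <stdint.h>

/* Seconds until an unsigned 32-bit counter wraps when it advances
 * `units_per_sec` times per second. */
double wrap_seconds_32(double units_per_sec)
{
	return 4294967296.0 / units_per_sec;
}

/* Longest timeout that the signed-difference comparison can tell from
 * "already expired": half the wrap period. */
double max_timeout_seconds_32(double units_per_sec)
{
	return wrap_seconds_32(units_per_sec) / 2.0;
}
```

At 1000 ticks per second the counter wraps after about 49.7 days; counting nanoseconds it wraps after about 4.3 seconds, so a timeout of tens of minutes cannot be expressed in 32 bits at all.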

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Date: Thu, 14 Jul 2005 23:29:16 UTC
Message-ID: <fa.hb4t5ku.k7g20e@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0507141623490.19183@g5.osdl.org>

On Thu, 14 Jul 2005, Lee Revell wrote:
>
> On Thu, 2005-07-14 at 09:37 -0700, Linus Torvalds wrote:
> > I have to say, this whole thread has been pretty damn worthless in
> > general in my not-so-humble opinion.
> >
>
> This thread has really gone OT, but to revisit the original issue for a
> bit, are you still unwilling to consider leaving the default HZ at 1000
> for 2.6.13?

Yes. I see absolutely no point to it until I actually hear people who have
actually tried some real load that doesn't work. Dammit, I want a real
user who says that he can noticeably see his DVD stuttering, not some
theory.

I'm incredibly fed up with the theoretical whining.

		Linus

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Date: Thu, 14 Jul 2005 23:36:50 UTC
Message-ID: <fa.hb4b5l2.k7220a@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0507141627400.19183@g5.osdl.org>

On Thu, 14 Jul 2005, Linus Torvalds wrote:
>
> But what you can do is to have HZ at some reasonably high value (ie in the
> kHz range), and then slow down the system clock to conserve energy, and
> increment jiffies by 16 or 32 when in "slow clock mode".

Btw, it doesn't have to even be a slow-down due to the kernel's decision.

In a VM environment, the timer interrupt might be erratic, and the timer
interrupt might read some hardware register (TSC or something) and use
_that_ to increment jiffies by the "proper" amount.

See? The point is that "jiffies" is useful exactly because it's very cheap
to read portably (there are no portable high-performance alternatives) and
because it has the right resolution to be useful in 32 bits.

That doesn't mean that the code that updates it can't be clever. We
already have code that updates it that is a lot more intelligent than 99%
of the code that reads it:  we update it in 64 bits under xtime_lock, even
though most readers use a lock-less 32-bit read and only get a partial
value - the part they care about.

And this is a wonderful property that everybody seems to be ignoring, even
though we have absolutely tons of code that just takes all of this
goodness for granted.

		Linus
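
One possible shape for such a "clever" updater, as a single-threaded user-space sketch: the interrupt handler consults a hardware time source and credits however many ticks have really elapsed, while readers only look at the low 32 bits. The struct, the function names and the nanosecond time source are assumptions made for the example, and locking is left out entirely.

```c
#include <stdint.h>

#define HZ 1000
#define NSEC_PER_TICK (1000000000ULL / HZ)

struct tick_clock {
	uint64_t jiffies_64;	/* full counter, advanced by the updater */
	uint64_t last_ns;	/* hardware time already accounted for */
};

/* Called from an interrupt that may arrive late or in bursts.
 * `now_ns` stands in for a hardware time source such as the TSC and
 * is assumed never to go backwards. */
void tick_catch_up(struct tick_clock *tc, uint64_t now_ns)
{
	uint64_t ticks = (now_ns - tc->last_ns) / NSEC_PER_TICK;

	tc->jiffies_64 += ticks;
	/* Advance only by whole ticks so the sub-tick remainder is kept. */
	tc->last_ns += ticks * NSEC_PER_TICK;
}

/* What most readers use: just the low 32 bits. */
uint32_t read_jiffies(const struct tick_clock *tc)
{
	return (uint32_t)tc->jiffies_64;
}
```

Interrupts arriving at 0.5 ms, 3.7 ms and 10 ms leave the counter at 0, 3 and 10 respectively, so erratic delivery does not make the counter lose time.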

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Date: Thu, 14 Jul 2005 23:56:34 UTC
Message-ID: <fa.h94p653.m7c3g5@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0507141648070.19183@g5.osdl.org>

On Thu, 14 Jul 2005, Lee Revell wrote:
>
> And I'm incredibly frustrated by this insistence on hard data when it's
> completely obvious to anyone who knows the first thing about MIDI that
> HZ=250 will fail in situations where HZ=1000 succeeds.

Ok, guys. How many people have this MIDI thing? How many of you can't be
bothered to set the default to suit your usage?

> It's straight from the MIDI spec.  Your argument is pretty close to "the
> MIDI spec is wrong, no one can hear the difference between 1ms and 4ms".

No.

YOUR argument is "nobody else matters, only I do".

MY argument is that this is a case of give and take.

		Linus

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Date: Fri, 15 Jul 2005 00:26:59 UTC
Message-ID: <fa.hakn5d3.nna2o5@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0507141718350.19183@g5.osdl.org>

On Thu, 14 Jul 2005, Lee Revell wrote:
>
> I don't think this will fly because we take a big performance hit by
> calculating HZ at runtime.

I think it might be an acceptable solution for a distribution that really
needed it, since it should be fairly simple. However, it's definitely not
the right solution.

HOWEVER. I bet that somebody who really really cares (hint hint) could
easily make HZ be 1000, and then dynamically tweak the divisor at bootup
to be either 1000, 250, or 100, and then increment "jiffies" by 1, 4 or
10.

My wild guess is that this is 20 lines of code, plus another 20 for
"setup", so that you can choose between 100/250/1000 Hz with a kernel
command line.

It wouldn't be "dynamic" at first - you'd just set it up at bootup, and
set a "jiffies_increment" variable, and change the

	jiffies_64++;

into

	jiffies_64 += jiffies_increment;

and you'd be done.

Really. I dare you guys. First one to send me a tested patch gets a gold
star.

Then, a year from now, people will realize how _easy_ it is to change the
jiffies_increment on the fly, and add a /sys/kernel/timer_frequency file,
and then you can switch it around at run-time.

Trust me. When I say that the right thing to do is to just have a fixed
(but high) HZ value, and just changing the timer rate, I'm -right-.

I'm always right. This time I'm just even more right than usual.

			Linus
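
A user-space sketch of the boot-time variant described above. Only jiffies_64 and jiffies_increment are names from the post; setup_timer_frequency() and do_timer_tick() are hypothetical, and programming the actual timer divisor is not shown.

```c
#include <stdint.h>

#define HZ 1000		/* fixed, compile-time tick unit */

uint64_t jiffies_64;
unsigned int jiffies_increment = 1;

/* Pick the real interrupt rate once at boot. Only rates that divide
 * HZ evenly are accepted. Returns 0 on success, -1 otherwise. */
int setup_timer_frequency(unsigned int freq)
{
	if (freq == 0 || freq > HZ || HZ % freq != 0)
		return -1;
	jiffies_increment = HZ / freq;
	return 0;
}

/* The timer interrupt path: the one-line change from the post. */
void do_timer_tick(void)
{
	jiffies_64 += jiffies_increment;
}
```

Choosing 1000, 250 or 100 yields increments of 1, 4 or 10, matching the numbers in the post, and in each case one second of interrupts advances the counter by exactly HZ.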

From: Linus Torvalds <torvalds@osdl.org>
Newsgroups: fa.linux.kernel
Subject: Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
Date: Fri, 15 Jul 2005 00:44:38 UTC
Message-ID: <fa.hakv5t0.nni288@ifi.uio.no>
Original-Message-ID: <Pine.LNX.4.58.0507141735390.19183@g5.osdl.org>

On Fri, 15 Jul 2005, Jesper Juhl wrote:
>
> Even if we only have to do it once at boot?  The thought was to detect
> what type of machine we are booting on, figure out what a good HZ
> would be for that type of box, then set that HZ value and treat it as
> a constant from that point forward.

No, it really should be a compile-time constant, or a lot of things get a
lot more expensive. There's a HZ embedded in a lot of places, and some of
them are divides, for example. Others do optimized special cases based on
static knowledge of what HZ is.

So this is why I so strongly argue that we should have a constant HZ, but
a dynamic _increment_ of "jiffies". Nobody (obviously) depends on jiffies
being constant, so it's ok to increment jiffies by pretty much any value.

Yeah, yeah, there might be some _very_ few code-paths (bogomips, I think)
that may look at when "jiffies" changes, and actually measure one tick
that way. They would need to be taught that they don't measure "one" tick
any more, they measure "jiffies_increment" ticks or something.

But I really wouldn't be surprised if the bogomips calibration loop was
really the only thing that needed some small tweaking for increments of
other than one.

(Oh, we'll find other things we want to fix up, and such a change would
result in other changes down the line, no question about that.  But I
don't think it would be very much at all, and I don't think it would
turn out at all traumatic).

		Linus
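
To illustrate why a compile-time HZ is cheaper, here is a sketch of a tick-to-millisecond conversion that specializes on the constant, next to a version that takes the rate as a run-time value. It is only loosely modeled on the kind of special-casing the post mentions; the function names and the example value of 250 are assumptions.

```c
#include <stdint.h>

#ifndef HZ
#define HZ 250	/* build-time constant; 250 is only an example */
#endif

/* With HZ known at compile time the preprocessor picks the cheapest
 * conversion and the compiler can fold the constants. */
uint32_t ticks_to_ms(uint32_t ticks)
{
#if HZ <= 1000 && (1000 % HZ) == 0
	return ticks * (1000 / HZ);		/* one multiply */
#elif HZ > 1000 && (HZ % 1000) == 0
	return ticks / (HZ / 1000);		/* divide by a constant */
#else
	return (uint32_t)((uint64_t)ticks * 1000 / HZ);	/* general case */
#endif
}

/* If the tick rate were a run-time variable, every conversion would
 * need the general 64-bit multiply and divide. */
uint32_t ticks_to_ms_runtime(uint32_t ticks, uint32_t hz)
{
	return (uint32_t)((uint64_t)ticks * 1000 / hz);
}
```

Both give the same answers; the difference is that the first can compile down to a single multiply, while the second has to do a real division on every call.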
