From: torvalds@transmeta.com (Linus Torvalds) Newsgroups: linux.dev.kernel Subject: Re: 64 bit divide/mod in 2.4.0-test5 Date: 4 Aug 00 23:24:44 GMT In article <d3n1itxely.fsf@lxplus015.cern.ch>, Jes Sorensen <jes@linuxcare.com> wrote: > >Well you are seeing those because it's deprecated, you should use a >shift operator instead if in any way possible. Indeed. Basically, I _know_ that I could include libgcc, but I also do not believe that people should get code that looks simple but takes hundreds of cycles to complete. The fact that 64-bit divisions do not work naturally has actually saved us a number of times from people just not thinking about what an expensive operation such a division is - quite often there is a trivial shift that can do the same thing in two cycles or similar. There's some special-case code in <asm/div64.h>, which is used mainly for printk(), but can be used for other stuff too: a 64-by-64 divide is usually much slower than a 64-by-32 divide which is actually the one that most people are looking for anyway (but gcc isn't clever enough to know this). That can be used if you _really_ think you need the divide. I've yet to see a real reason for it outside the printk() stuff.. Linus

From: torvalds@transmeta.com (Linus Torvalds) Newsgroups: linux.dev.kernel Subject: Re: 64 bit divide/mod in 2.4.0-test5 Date: 6 Aug 00 05:40:04 GMT On Sun, 6 Aug 2000, Jamie Lokier wrote: > > You're aware of the algorithm for printing digits that doesn't use > any divisions, and decided not to use it, right? I'm aware of at least one such algorithm, yes. However, that one is designed for a single base (or a small number of bases) and basically just creates a nice table in memory of the powers-of-base or uses pre-computed tables. That one only uses simple subtraction to generate the numbers, but would have been a much bigger change to existing code. The differences for each base gets moderately ugly too. Is that the one you're talking about? Basically i = MAX_DIGITS; do { unsigned long long thispower = power_table[--i]; char c = '0'; while (thispower < num) { num -= thispower; c++; } output(c); } while (i); or is there something more clever that can handle arbitrary bases cleanly (not that we actually use arbitraty bases: the basic vsprintf routines could handle them, but we limit the bases to the normal 8, 10 and 16 anyway - and as 8 and 16 can be handled with shifting only one really requires any real computation)? Linus

From: lk@tantalophile.demon.co.uk (Jamie Lokier) Newsgroups: linux.dev.kernel Subject: Re: 64 bit divide/mod in 2.4.0-test5 Date: 6 Aug 00 19:57:16 GMT Linus Torvalds wrote: > I'm aware of at least one such algorithm, yes. However, that one is > designed for a single base (or a small number of bases) and basically just > creates a nice table in memory of the powers-of-base or uses pre-computed > tables. > [...] > Is that the one you're talking about? Yes, except instead of precomputing, you can generate power_table each time by doing successive multiplications with the power. You only need to generate as many entries as digits in the current value. The multiplications are trivial if it's a constant power. Of course 10 is the only significant non-power-of-two base anyway. I've never tried that algorithm but I would guess it's faster than long division. -- Jamie

From: torvalds@transmeta.com (Linus Torvalds) Newsgroups: linux.dev.kernel Subject: Re: 64 bit divide/mod in 2.4.0-test5 Date: 6 Aug 00 20:04:46 GMT On Sun, 6 Aug 2000, Jamie Lokier wrote: > > I've never tried that algorithm but I would guess it's faster than long > division. Quite possible. I would not be adverse to doing this: the do_div() thing is there mainly because it was much easier to move the existing code-base over, and the original vsprintf used division. If somebody wants to write the code... Linus

From: torvalds@transmeta.com (Linus Torvalds) Newsgroups: linux.dev.kernel Subject: Re: 64 bit divide/mod in 2.4.0-test5 Date: 5 Aug 00 00:51:17 GMT In article <8mflv4$lit$1@cesium.transmeta.com>, H. Peter Anvin <hpa@zytor.com> wrote: > >There is also big difference on some architectures between 64/32 = 64 >and 64/32 = 32 (the latter is a single, albeit slow, instruction on >x86.) I'm rather surprised gcc doesn't handle this -- it seems as an >obvious optimization. > >Could we get the gcc people to look at this? Probably not. Doing 64/32->32 thing as a single instruction on x86 is an illegal optimization: if your source code looks like unsigned long a, b; unsigned long long c; a = c / b; then according to the C language the c/b division is actually a 64/promoted64 bit division, and then the 64-bit result casted down to 32 bits. What's the difference? Overflow. The cast from 64 bits to 32 bits cannot overflow, so gcc would have to basically always do the 64/32->64 bit division, and then just silently drop the extra bits in order to get the right end result. Now, that 64/32->64 bit division is fairly simple on x86 (certainly simpler than the 64/64->64 one), and that's actually what do_div() will use on x86. But it's two "div" instructions, not one (the first one can often be optimized away if the upper 32 bits are usually 0, which is the case 99%+ of the time with most regular distributions of numbers). Note that the 64/32->64 division is a _lot_ simpler than the 64/64->64 one, so this is definitely still worth doing. But it's not a single instruction. Linus

Index Home About Blog