Index Home About Blog
From: (Chris Torek)
Newsgroups: comp.lang.c
Subject: Re: floor() implementation
Date: 2 Jan 1999 14:59:10 -0800

>"Amund Frislie" <> writes:
>   Does anyone know where I can find [an] implementation [of] floor() ...

In article <> Ben Pfaff
<> replies:
>floor() can't be implemented in C AFAIK without use of other library

Indeed.  Typical implementation, usually written in assembly language,
rely on the floating-point format inside the machine.

Suppose "d" is a double-precision floating point number represented
internally as <sign> <fraction> <exponent>, where <sign> is 0 for
positive, 1 for negative; <fraction> is the bits (binary digits) in
1.bbb...b (where each b is 0 or 1), and <exponent> is a (probably
biased) exponent for 2 (or in at least one case, 16).

Such a value is guaranteed to be a whole integer already if the
exponent exceeds the number of bits.  That is:

	+/-  (1.)111...1  (x 2 sup) 52

can never be "something and a half" if there are only 52 digits
after the binary point, just as:


is the same as "123456.0" and has no fractional part (there are
only 5 digits after the decimal point, and the exponent is 5).

So, if your computer's "double" binary format uses 52 bits of
fraction, and the exponent is 52 or greater, floor(x) is just x (x
is itself an integer).  That leaves only those cases where the
exponent is less than 52.  In this case, we can still use the
properties of CPU's arithmetic to separate the pieces.

Suppose you have a value in the range [+0..2^52), and you add to
it a value that is exactly 2^52 (4503599627370496.0, but done in
the computer's internal double format, so that it is 1.0000 x 2^52).
If the result is forced into one of those same "double" formats,
so that it can only hold 52 fractional bits, all of the "irrelevant"
bits will fall out of the sum.  We can then subtract 2^52 again
and get only the relevant bits.  There may be *no* relevant bits,
in which case the result of subtraction should be zero.  This cannot
be written in ordinary C because the compiler might elect to use
a more precise internal representation, in which case irrelevant
fractional bits might remain.

Now, if you have IEEE arithmetic, you also have a "rounding
direction".  The rounding direction must not be set to "towards
negative infinity" because then 2^52 - 2^52 is not +0 but rather
-0.  So again we need to use the properties of the CPU to force
a proper rounding direction (probably trivial in assembly-language,
but again impossible in C).

Note that if the input number is negative, we have a whole different
set of conditions.  We also must worry about inputs being Infinity
and NaN, in IEEE code.

>If you allow use of other library functions then you can implement it
>in terms of modf() I think.

It can, but again you probably need to worry about Inf and NaN on
IEEE systems.  The best bet is usually to write all three in some
mix of machine-specific assembly and/or C.  You may find (and indeed
will find, on the i386) that there is an instruction that does most
of the work. :-)
In-Real-Life: Chris Torek, Berkeley Software Design Inc
El Cerrito, CA	Domain:	+1 510 234 3167  (not always up)	I report spam to abuse@.

Index Home About Blog