Floating point numbers are positive or negative numbers with a decimal fraction: numbers such as 1.0 or 17.11 or –3.12 are all floating point numbers.

On the Arduino and other microprocessors, such numbers are stored as ‘floats’.

Whereas the concept of a byte or an integer is quite straightforward, a float in binary form is a bit more challenging.

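A byte is simple: each of its 8 bits stands for one power of 2, from 2⁷ for the leftmost bit down to 2⁰ for the rightmost, and the value of the byte is the sum of the powers whose bit is 1. With all bits set that is 2⁷+2⁶+2⁵+2⁴+2³+2²+2¹+2⁰ = 255.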

An integer is similar, except that an Arduino 16-bit integer extends up to 2¹⁵.

But how do you store a number like -1.5?

Well, floats are based on the idea that every nonzero number can be written as a power of 2 multiplied by a number that is at least 1 and less than 2. Take for instance 7 (or 7.00 for that matter): it can be written as 4 * 1.75, or 2² * 1.75.

The number -1.5 is in fact 1 * 1.5, which is 2⁰ * 1.5, preceded by a ‘-’ sign.

The same goes for larger numbers, say 20.5 = 16 * 1.28125 (= 2⁴ * 1.28125).

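Sure, 20.5 can also be written as 2 * 10.25, but the second factor must be between 1 and 2, and of all the ways to write 20.5 as a power of 2 times a number, only 2⁴ * 1.28125 meets that requirement.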

So every nonzero number can be represented as:

$$\text{sign} \times 2^{x} \times y, \qquad 1 \le y < 2$$

By convention, x is called the ‘exponent’ and y the ‘mantissa’, although the word mantissa is also used for the fractional part of a logarithm. For that reason the IEEE 754 standard for floating point numbers avoids it: it calls y the ‘significand’ and the bits after the binary point the ‘fraction’. Here we will loosely use ‘fraction’ for y, so we can write the above as:

$$\text{floating point number} = \text{sign} \times 2^{\text{exponent}} \times \text{fraction}$$

Let's get back to the fraction part of 20.5, which is 1.28125 (remember, 20.5 = 2⁴ * 1.28125). Looking a bit deeper, that is actually 1 + 1/4 + 1/32. That makes sense, because 16 * (1 + 1/4 + 1/32) = 16 + 4 + 0.5 = 20.5.

Breaking this down further, we see that the fraction (or mantissa) is a sum of terms that are each 1 divided by a power of 2. For 20.5 that sum is 1/2⁰ + 1/2² + 1/2⁵.

Anyway, back to the binary storage.

On the Arduino and many other processors, a floating point number is stored in 32 bits, and the way it is stored follows directly from the notation we have learned above.

The leftmost bit, bit 32 (counting the bits 1 to 32 from the right), stores the sign: if it is 1 the number is negative, if it is 0 it is positive.

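The next 8 bits, bits 31-24, store the exponent. To allow negative as well as positive exponents, the stored value is the exponent plus 127 (the ‘bias’): 2⁰ is stored as 01111111 (decimal 127), 2¹ as 10000000 (decimal 128), 2² as 10000001 (decimal 129), 2³ as 10000010 (decimal 130), and so on. The exponent thus follows from subtracting 127 from the number stored in bits 31-24. The stored values 1 to 254 cover powers from 2⁻¹²⁶ up to 2¹²⁷; the values 0 and 255 are reserved for special cases (zero and very small ‘subnormal’ numbers, and infinity and NaN respectively).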

The fraction (mantissa) is stored in bits 23-1. Since we know the fraction is always between 1 and 2, the leading ‘1’ is not stored: it is always there. This is called the ‘hidden’ bit, although it is not really hidden, just not stored. Bits 23-1 then indicate which of the fractions 1/2, 1/4, 1/8, 1/16 and so on, down to 1/2²³, are added to that 1.

So, the binary storage of a float is as follows:

| Number | Sign | Exponent | Hidden bit | Fraction |
|---|---|---|---|---|
| 20.5 | 0 | 10000011 | (not stored) | 01001000000000000000000 |
| | + | 4 (131 - 127) | 1 + | 1/4 + 1/32 |
| -7 | 1 | 10000001 | (not stored) | 11000000000000000000000 |
| | - | 2 (129 - 127) | 1 + | 1/2 + 1/4 |
