# The binary storage of floating point numbers

Floating point numbers are positive or negative numbers with a decimal fraction: numbers such as 1.0, 17.11 or -3.12 are all floating point numbers.
On the Arduino and other microprocessors they are stored in 'floats'.
Whereas the concept of a byte or an integer is quite straightforward, a float in binary form is a bit more challenging.
A byte is simple: it is just 8 bits with the weights 2⁷ + 2⁶ + 2⁵ + 2⁴ + 2³ + 2² + 2¹ + 2⁰.

A 16-bit integer is similar, except that it extends to 2¹⁵.

But how do you store a number like –1.5 ?

Well, floats are stored on the concept that practically every number can be expressed as a power of 2 multiplied by a number between 1 and 2. Take for instance 7, or 7.00 for that matter: it can be expressed as 4 * 1.75 (or 2² * 1.75).

The number -1.5 is in fact 1 * 1.5, which is 2⁰ * 1.5, preceded by a '-' sign.
The same goes for higher numbers, say 20.5 = 16 * 1.28125 (= 2⁴ * 1.28125).
Sure, 20.5 can also be expressed as 2 * 10.25, but the last number must be between 1 and 2, so only 16 * 1.28125 qualifies.

So in fact every number can be represented by:
sign * 2^x * y (y being a number between 1 and 2).

By convention we call 'x' the 'exponent' and 'y' the 'mantissa', though the word mantissa is also used for the fractional part of a logarithm. The IEEE standard for floating point numbers therefore encourages the use of the word 'fraction' instead of 'mantissa', so we can write the above as:
floating point number = sign * 2^exponent * fraction
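To see that decomposition happen on a real number, here is a minimal sketch in plain C++ (it runs on a desktop compiler; on an AVR board you would probably include `math.h` instead of `<cmath>`). The names `Parts` and `decompose` are made up for this illustration and are not part of any Arduino API. It assumes the input is nonzero and finite.

```cpp
#include <cmath>

// Hypothetical helper type for this illustration only.
struct Parts {
    int sign;        // +1 or -1
    int exponent;    // x in sign * 2^x * y
    float fraction;  // y, with 1 <= y < 2
};

// Split a nonzero, finite value into sign * 2^exponent * fraction.
Parts decompose(float value) {
    Parts p;
    p.sign = std::signbit(value) ? -1 : 1;

    int e = 0;
    // std::frexp returns m with 0.5 <= m < 1 such that |value| = m * 2^e.
    float m = std::frexp(std::fabs(value), &e);

    // Shift the range from [0.5, 1) to [1, 2) by moving one factor of 2.
    p.fraction = m * 2.0f;
    p.exponent = e - 1;
    return p;
}
```

Calling `decompose(20.5f)` gives sign +1, exponent 4 and fraction 1.28125, matching 20.5 = 2⁴ * 1.28125 from the text.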

Let's get back to the fraction part of the number 20.5, which is 1.28125 (remember? 20.5 = 2⁴ * 1.28125). If we look at that a bit deeper, we can see that it is actually 1 + 1/4 + 1/32. That makes sense, because 16 * (1 + 1/4 + 1/32) = 16 + 4 + 0.5 = 20.5.
If we break this down again, we can see that the 'fraction' or mantissa is actually a sum of fractions that are each 1/(a power of 2). It is probably clear by now that for 20.5 that would be 1/2⁰ + 1/2² + 1/2⁵.
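The breakdown into 1/(power of 2) terms can be done mechanically: try 1/2, 1/4, 1/8 and so on, and keep each term that still fits in what is left. The sketch below does that in plain C++. The function name `asPowerSum` is made up for this illustration, and it assumes the input lies between 1 and 2.

```cpp
#include <string>

// Hypothetical helper: write a fraction y (1 <= y < 2) as "1 + 1/a + 1/b ...".
std::string asPowerSum(float y, int maxTerms = 23) {
    std::string out = "1";   // the leading 1 is always there
    float rest = y - 1.0f;   // what still has to be covered
    float weight = 0.5f;     // 1/2, then 1/4, 1/8, ...

    for (int k = 1; k <= maxTerms && rest > 0.0f; ++k) {
        if (rest >= weight) {
            out += " + 1/" + std::to_string(1L << k);
            rest -= weight;
        }
        weight /= 2.0f;
    }
    return out;
}
```

For 1.28125 this yields `1 + 1/4 + 1/32`, and for 1.75 it yields `1 + 1/2 + 1/4`. Each kept term corresponds to a '1' bit in the stored fraction, each skipped term to a '0'.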

Anyway, back to the binary storage.
As said, on the Arduino and many other processors, the floating point number is stored in 32 bits, and the protocol to store it follows from the notation we have learned above.
The leftmost bit, bit 32, stores the 'sign': if it is a '1' the number is negative, if it is a '0' it is positive.
The next 8 bits, bits 31-24, store the exponent. As we want values between 2¹²⁸ and 2⁻¹²⁷, we store 2¹ as 10000000 (decimal 128), 2² as 10000001 (decimal 129), 2³ as 10000010 (decimal 130), etc. The exponent thus follows from subtracting 127 from the decimal number that is stored in bits 31-24.

The fraction or mantissa is stored in bits 23-1. However, since we know that the fraction is always between 1 and 2, we do not store the '1', as we know it is always there. We refer to that as the 'hidden' bit, although it is not hidden, it is just not stored. We use bits 23-1 to indicate a sum of the fractions 1/2, 1/4, 1/8, 1/16, etc.
So, the binary storage of a float is as follows:

| number | sign | exponent | hidden | fraction |
|---|---|---|---|---|
| 20.5 | 0 | 10000011 | | 01001000000000000000000 |
| | + | 4 (131-127) | 1 + | 1/4 + 1/32 |
| -7 | 1 | 10000001 | | 11000000000000000000000 |
| | - | 2 (129-127) | 1 + | 1/2 + 1/4 |
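If you want to check the table on your own machine, the three fields can be pulled out of a float with a few shifts and masks. This is a minimal sketch, assuming a 32-bit IEEE 754 float (which is what the Arduino uses). The names `FloatFields` and `splitFloat` are made up for this illustration.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type for this illustration only.
struct FloatFields {
    uint32_t sign;      // 1 bit: 1 = negative, 0 = positive
    uint32_t exponent;  // 8 bits, stored with 127 added
    uint32_t fraction;  // 23 bits, the hidden 1 is not included
};

FloatFields splitFloat(float value) {
    static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

    // Copy the raw bits; memcpy avoids undefined behaviour from pointer casts.
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);

    FloatFields f;
    f.sign = bits >> 31;                // leftmost bit
    f.exponent = (bits >> 23) & 0xFFu;  // next 8 bits
    f.fraction = bits & 0x7FFFFFu;      // remaining 23 bits
    return f;
}
```

For 20.5 this gives sign 0, exponent 131 and fraction 0x240000 (binary 01001000000000000000000). For -7 it gives sign 1, exponent 129 and fraction 0x600000 (binary 11000000000000000000000), the same values as in the table.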

## 4 thoughts on “The binary storage of floating point numbers”

1. Hugh says:

Actually a byte includes 8 bits, so the MSB is 2^7. An Arduino 16 bit integer has a MSB of 2^15.

1. Arduino says:

Hugh, of course you are right, I must have had a blackout. Too much in a hurry to get to the 32-bit float. Thanks. It is corrected.

2. Dilanka Danushka says:

Hi, I wrote some code for my project but I can't store decimal places. Please check what is wrong.

//code as below

#include
#include

#include
LiquidCrystal lcd(1, 4, 8, 9, 10, 13);
//char customKey;

float v1 = 0;
//int v2 = 0;
//int v3 = 0;

const byte ROWS = 4;
const byte COLS = 4;

byte rowPins[ROWS] = {A1, 11, A2, A4}; //Rows 0 to 3
byte colPins[COLS] = {A3, A5 , 0, A0}; //Columns 0 to 3

char keys[ROWS][COLS] = {
{'1', '2', '3', 'A'},
{'4', '5', '6', 'B'},
{'7', '8', '9', 'C'},
{'.', '0', '#', 'D'}
};
Keypad kpd = Keypad( makeKeymap(keys), rowPins, colPins, 4, 4 );

void setup(){

lcd.begin(16,2);

lcd.clear();
}

int GetNumber()

{
int num = 0;
char key = kpd.getKey();
while(key != '#')
{
switch (key)
{
case NO_KEY:
break;

case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
case '.':

lcd.print(key);
num = num * 10 + (key - '0');
break;

case 'C':
num = 0;
lcd.clear();
break;
}

key = kpd.getKey();
}

return num;
}

void loop()
{

v1 = GetNumber();

lcd.setCursor(5,1);
lcd.print(v1);

}

1. E says:

Dilanka, I am not sure what your code is supposed to do, but your float is defined as "v1". It is subsequently assigned a value from the function "GetNumber()", but that returns an integer, and integers do not have a fraction.
