Ever wonder why you can only store so much in a C++/Java/etc integer value? Well, I’ve got nothing else to blog about today, so here’s a random blog entry dedicated to 1’s and 0’s.
Binary: Base 2 Number System
First off I need a little background information. Since I’m providing background information at all, I may as well start from the beginning and explain what binary is. If you already know how binary works, just skip this section. Instead of a base 10 number system like our typical decimals, binary is a base 2 system. In our typical base 10 number system, you look at a number such as 142, and see that there are 1 hundred, 4 tens, and 2 ones. Or, written in another method,
(1 * 10^2) + (4 * 10^1) + (2 * 10^0) = 142 in decimal
(1 * 100) + (4 * 10) + (2 * 1) = 142 in decimal
100 + 40 + 2 = 142 in decimal
Now this works great for our usual day to day activities. Why do we use base 10 though? Well, look at your fingers. Count your fingers. Guess the reason yet? Someone long ago decided that base 10 was easy to work with, especially since we have 10 fingers. We could just as easily use base 8 systems, like octal. The octal number 216 ends up being,
(2 * 8^2) + (1 * 8^1) + (6 * 8^0)
(2 * 64) + (1 * 8) + (6 * 1)
128 + 8 + 6 = 142 in decimal
Just in case 8 fingered aliens someday invade and begin to communicate with math, people still learn to use octal. Now, the pattern here is this: A base n system contains a set of n symbols or values for each place. When you reach n, you move to the next place. If you don’t have any of that place, you use a neutral symbol, the 0 in our math system. To convert abc back to decimal, you do (a * n^2) + (b * n^1) + (c * n^0). What base is easiest to work with in a computer? How many symbols can we easily express? The easiest is 2. True or false, on or off, voltage or no voltage, light or dark, magnetized or not magnetized, zero or one. This is how data on every CD, DVD, floppy, hard drive, memory stick, and tape is stored. Base 2 systems look something like this,
(1 * 2^7) + (1 * 2^3) + (1 * 2^2) + (1 * 2^1)
128 + 8 + 4 + 2 = 142
So there you have it, to convert binary numbers back to decimal, just set up a little list of numbers like this, with a value for each bit of the binary number, and then figure out where the 1’s are. Like so,
128 64 32 16 8 4 2 1
1 1 1 1
128 + 8 + 4 + 2 = 142
Alright, enough boring intro to binary. Moving on.
Binary addition is just like addition in the base 10 system. In fact, because there are only three possible things to add when adding two numbers, you can add large binary numbers by hand faster than in decimal. What’s 7+5 again? Takes a bit more thinking than 0+0, 1+0, or 1+1. With a 1+1, you get a 2, which means you have to carry a 1 to the next digit. For a quick example, lets do 42+42, which should come to 84.
As you can see, it’s just like regular edition, except you carry if you get a 2 instead of a 10.
Well, what if we want negative numbers? Well a simple method would be to use an extra bit and store if the number is positive of negative. We use 1 for negative, 0 for positive. For example in a 4-bit system, 1011 would be -3, and 0011 would be positive 3. Now, this method has problems because of the complication resulting from addition and subtraction of these numbers. Just keep this method in your mind. Here’s some signed binary values to get the idea. Note you can only store 2 bits of values with 3 bits.
Now here’s where it gets interesting. Subtracting binary numbers is messy, especially if you used gates to do the same method we use for decimals (carrying from the next digit if you don’t have enough to subtract). Adding, on the other hand, is very easy and can be accomplished with just a handful of gates (around 5 to be exact, or just 2 XOR gates). Everything is based on addition. Subtraction is just the inverse, multiplication is just repeated addition, and division the inverse of that. So how do we subtract by adding? We use negative numbers. To do 10 – 7, we do 10 + (-7). Well, lets make an example with 42 – 42. What binary number can we add to 42 in order to get 0? Whatever this number is, it would have the value of -42, right? Well, we could try complementing every bit, putting 1’s in place of 0’s, and 0’s in place of 1’s, and then adding the two numbers.
|Complement of 42||0||1||0||1||0||1|
Well, if we complemented that, we would end up with 0. But that requires more inverters, and we want to stick with just the hardware of an adder. How can we make that 0 with just addition? Well, we could add 1. If we add 1, it carries across all the bits, and we end up with 100000000. That’s 0 if you ignore the first bit. We call this complementing and adding a 1 the two’s complement operation, and we will now use it to design a system that can store negative and positives numbers and easily do math on them.
In order to do math easily, we store numbers in two’s complement form. The leading bit will be used to determine sign.
Positive numbers: normal binary number with 0 as a leading bit.
Negative numbers: two’s complement operation on the binary number, and a 1 as the leading bit.
Lets see some examples. Lets say I want to express 53 (110101 in binary) in two’s complement form. I’ll need 6 bits to express 53, and an extra leading 0 to say that it’s positive: 0110101. Now, lets say I want to express -53. I complement the 53, leading to 001010. Then we add 1, 001011, and then add a 1 as a leading bit, leading to 1001011. Here are a few more examples,
-22 in two’s complement form,
10110 : number in binary
01001 : complemented
01010 : add 1, carry if needed
101010 : add a leading 1 to designate as negative.
42 in two’s complement form,
101010 : number in binary
0101010: two’s complement, just add leading 0 to designate positive number
-32 in two’s complement form,
100000 : number in binary
011111 : complemented
1011111: two’s complement form
Instead of adding the leading 1, you could also just add a leading 0 to start with, and then it will become a 1 when complemented.
Two’s Complement Addition and Subtraction
Let see how easy it is to add with two’s complement. Here is 22 + (-22),
We can ignore the leading 1, since we’re only using a 6 bit system, and call this 0. Remember, the entire concept of two’s complement binary is to represent negative binary numbers in such a way that they can be added just like any other number, and come up with the correct result. Lets do something like 41-12. In this case, we complement the 12 to get -12, and then add it to 42.
Notice the extra carry that I didn’t include, since we ignore this bit. If we ignore extra bits, how do we detect overflows?
Lets refer back to the possible 4 bit two’s complement values.
Now, the range for using 4 bits is -8 <= number <= +7. When the result of our addition is a number not in this range, we’ll get an incorrect result. How do we tell this when we ignore the first bit though? Well, there are a few simple rules that work because of the nature of carrying a 1.
Positive number + positive number: result must be positive. Overflow occurs if result is negative.
As you can see, the number looks like -7 because a 1 carried into the bit that determins sign of a number, and it can’t be expressed without 5 bits. The real result, 9, is out of range.
Negative number + negative number: result must be negative. Overflow occurs if result is positive.
Once again, the sign bit is not correctly filled in because a 1 carries over to the bit that is ignored. Instead of -9, we end up with +7.
Positive number + negative number: never produces overflow. The magnitude of the sum is somewhere between the magnitudeof the two operands, which are of course within the range.
So the conclusion to overflow detection is this. For signed numbers, the carry of the high-order bit is ignored, and overflow occurs if the addition proccess operates on two operands of the same sign, and results in a result of the opposite sign. For unsigned integers, just plane old binary, overflow occurs when carry of the high-order bit ocurrs.
Whew, still there? I wasn’t planning on making this entry so long… kinda got carried away. But now, and only now, can I answer the question I posited in the beginning of this post. Why can an integer in your language of choice only store so much? Well, first off you get assigned some number of bits for your integer. For example, we’ll use Java’s 32 bit integer size. Going back to what is binary, you should be able to store 2^32 values, or 4,294,967,296 values. Now, one value is 0, so we need to subtract one from that to get the max value to store. Therefore, we can store values from 0 to 4,294,967,295. However, in Java, and most other languages, integers are signed, and stored in two’s complement representation. This means your first bit is used to determine if a number is positive or negative. Now, because positive numbers are just stored in normal binary with the leading bit just a 0, we can store the values 0 to (2^31 – 1), or 0 to 2,147,483,647. Now, for negative numbers we don’t need to remove a value for 0, since the biggest negative number would instead be all 1’s. Go look back at the 4-bit two’s complement circle to see what I mean. The largest value you can store is -2^31, or -2147,483,648. So there you have it, 32 to bit Java integer,
For an n-bit signed integer, stored using two’s complement, the min and max can be calculated using,
Mac: 2^(n-1) -1
There you have it, more about binary than you ever wanted to know, unless you’re playing with logic gates or programming in assembly. Maybe next week I’ll explain how to store doubles! Well.. maybe not, this was too much typing.