Binary, Bits, and Bytes
Computers do math differently than you do.
First, let's start with some terms:
- A number is a value that represents an amount, written using one or more digits.
- A digit is one character (glyph) in a number.
For example, 724 is a number, and the "7" is a digit. The whole number 724 has three digits: 7, 2, and 4.
When we count, we use ten digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. There is no other digit after "9."
The question is, why ten digits? Why not 12, or 20, or 4?
The answer is believed to be that we have ten digits on our hands. In fact, the word digit comes from the word for "finger or thumb." Your right hand has five digits, as does your left hand (for most people).
Computers, however, only have two digits: 1 and 0, on and off. As a result, computers count differently than people.
Ever since you were a child, you used columns to represent numbers. This is because of the way number counting works.
If you only have ten digits, how do you represent a number above ten?
The answer is, you have to use digits which represent increasing values. For example, in the number 28, the "2" does not represent the value of "2." Instead, it represents the value of 2 x 10. The "8," likewise, represents the value of 8 x 1.
Let's lay this out, using the number 5,362:
As you can see, each digit represents its own value multiplied by the value of the column.
As a result, the number 5,362 can more accurately be represented as:
( 5 * 1000 ) + ( 3 * 100 ) + ( 6 * 10 ) + ( 2 * 1 )
Normally, we don't think of it that way; we learned about all of this when we were children, and once we got used to the counting system, we forgot about it.
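The breakdown of 5,362 into digits and column values can be sketched in Python. This is only an illustration; the function name place_values is made up for this example.

```python
def place_values(number, base=10):
    """Return (digit, column_value) pairs, left-most column first."""
    pairs = []
    column = 1                      # the right-most column is always worth 1
    while True:
        number, digit = divmod(number, base)
        pairs.append((digit, column))
        column *= base              # the next column is worth `base` times more
        if number == 0:
            break
    return list(reversed(pairs))

pairs = place_values(5362)
print(" + ".join(f"( {d} * {c} )" for d, c in pairs))
# ( 5 * 1000 ) + ( 3 * 100 ) + ( 6 * 10 ) + ( 2 * 1 )
print(sum(d * c for d, c in pairs))   # 5362
```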
But Why Columns?
Columns are necessary if we want to use only a limited number of digits—otherwise we would have to create and remember millions of different symbols!
Instead, we use a system of counting where higher and higher values can be represented by the same digits. This system uses columns. Each new column represents the total amount that could be counted by the previous column.
So, how do we count? Here is the process:
As you can see, when we got to the number "9," we had reached the highest digit; we could not put a higher number after that with just one digit. Adding one made "ten," but we don't have a digit for that.
So, instead, we restarted: we put the "ten" in the next column, and started over again in the first column.
The second column represents the value of "10," which makes sense: every time you hit the highest digit, you have reached a "ten"; adding one to the tens column "restarts" the first digit, and allows you to start over again.
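The "restart and carry" step can be written out as a small Python sketch. The function name add_one and the choice to store a number as a list of digits are assumptions made for this illustration.

```python
def add_one(digits, base=10):
    """Add 1 to a number stored as a list of digits, left-most digit first."""
    digits = list(digits)
    position = len(digits) - 1          # start in the right-most column
    while position >= 0:
        if digits[position] < base - 1:
            digits[position] += 1       # there is still a higher digit to use
            return digits
        digits[position] = 0            # highest digit reached: restart at 0
        position -= 1                   # and carry into the next column
    return [1] + digits                 # every column restarted: add a new column

print(add_one([9]))        # [1, 0]
print(add_one([1, 9]))     # [2, 0]
print(add_one([9, 9]))     # [1, 0, 0]
```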
Because there are ten digits in our counting system, we call it base 10. Normally, when we write numbers, we assume it is base 10. However, when we use different base systems, there has to be a way of showing the base; this is done with a subscript.
For example, a number in base 10, such as 25, is shown as 25₁₀; the number 47 in base 8 would be 47₈; the number 1101 in base 2 (binary) would be 1101₂.
The base number represents the amount counted in the second column.
Every additional column is a power of that base: 10², 10³, 10⁴, and so on. In more direct terms, 100, 1000, 10,000, and so forth. It looks like this:
It turns out that you can do the same thing with any number as the base! For example, here is base 8:
- The base number is equal to the number of digits
- The base number is always one higher than the highest digit
- All the column values are powers of the base number
- The right-most column is always 1, because any number to the zeroth power is 1! That is, x⁰ = 1.
- Any digit in a column is equal to that digit multiplied by the column value
Can you guess what number is displayed in the table? Remember, the number is the digits multiplied by the column values. In this case, it is (3 * 8) + (1 * 1), which is... 25! Or, more accurately, 31₈ = 25₁₀.
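The same "digit times column value" rule works for any base. Here is a minimal Python sketch of it; from_digits is a hypothetical helper name, and it takes the digits as a list.

```python
def from_digits(digits, base):
    """Add up each digit multiplied by its column value."""
    total = 0
    column = 1                      # the right-most column is base ** 0 = 1
    for digit in reversed(digits):
        if not 0 <= digit < base:
            raise ValueError(f"{digit} is not a digit in base {base}")
        total += digit * column
        column *= base              # each column to the left is one power higher
    return total

print(from_digits([3, 1], 8))       # (3 * 8) + (1 * 1) = 25
```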
What Is the Number Called?
We use certain names to describe certain numbers. Without thinking, we believe that these names are attached to the digits we see. That is not correct! The names we use are part of the base system we use.
For example, the number "ten"? That does not describe a 1 followed by a zero. Instead, it describes one more than nine. The word "ten" is only connected to the digits "10" in base 10. In other bases, it looks different!
In base 8, ten is written 12. In base 5, it is 20. In base 2, it is 1010. Or, more accurately, 12₈ = 20₅ = 1010₂ = 10₁₀. All of those numbers are "ten."
So perhaps you can see that words like "ten" or "twelve" or "fifty" or a "thousand" are really confusing when you are talking about different bases. Therefore, when using bases other than 10, you must only use the names of the digits, and never the words used for base 10 numbers.
Therefore, the number 31₈ is not "thirty-one," it is "three-one in base eight." If you use that naming system, you can avoid confusion.
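To see how the amount "ten" is written in different bases, here is a rough Python sketch that divides by the base again and again. The name to_base is invented for this example, and it only handles bases from 2 to 10.

```python
def to_base(number, base):
    """Write a non-negative integer as a string of digits in a base from 2 to 10."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        number, remainder = divmod(number, base)
        digits.append(str(remainder))   # the right-most digit comes out first
    return "".join(reversed(digits))

for base in (8, 5, 2, 10):
    print(f"ten in base {base}: {to_base(10, base)}")
# ten in base 8: 12
# ten in base 5: 20
# ten in base 2: 1010
# ten in base 10: 10
```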
In addition, there are special names for bases that are often used:
- base 2: binary or BIN
- base 8: octal or OCT
- base 10: decimal or DEC
- base 16: hexadecimal or HEX
Why do programmers confuse Halloween and Christmas?
Because OCT 31 is the same as DEC 25.
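Python has these bases built in, so the joke can be checked directly. The sketch below uses only standard built-ins: int() with a base, the 0b / 0o / 0x prefixes, and bin(), oct(), and hex().

```python
print(int("31", 8))      # read "31" as an octal number: 25
print(0o31 == 25)        # 0o marks an octal number: True
print(oct(25))           # 0o31
print(bin(10))           # 0b1010
print(hex(255))          # 0xff
```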
Okay, Now for Binary
Sorry to take so long to finally get to binary counting!
In binary, there are only 2 digits: 0 and 1.
The highest digit is 1.
This means that you move to extra columns very quickly!
Here is the table:
Here's how counting goes in binary:
Look at that. 10 (one-zero) is two. 1010 (one-zero-one-zero) is ten!
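Counting from zero to ten in binary can be reproduced with a short Python sketch. The helper name binary_count is made up for this example; the "b" format code is Python's standard way to print a number in base 2.

```python
def binary_count(limit):
    """Return the binary spelling of every number from 0 up to limit."""
    return [format(n, "b") for n in range(limit + 1)]

for n, bits in enumerate(binary_count(10)):
    print(f"{n:>2} = {bits}")
# The last two lines printed are " 9 = 1001" and "10 = 1010".
```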
Which brings up another joke:
There are only 10 kinds of people:
Those who understand binary, and those who do not.
To be fair, the first line of the joke should be: "There are only 10₂ kinds of people."
How to Translate between Binary and Decimal
To go from binary to decimal is actually rather easy: first, make a chart which has all the binary column (exponent) numbers. Easy to do: just double every new number. 1, 2, 4, 8, 16... and so forth.
Next, write the binary number in the spaces below. For example, if the binary number you have is 10110101₂:
Next, add up all the column numbers where there is a "1", and ignore all the column numbers where there is a "0." In this case, that is 128 + 32 + 16 + 4 + 1.
That adds up to 181. So, 10110101₂ = 181₁₀.
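The chart method can be sketched in Python as follows. The function name binary_to_decimal is an assumption for this example; it doubles the column value for each step to the left and adds it only where the bit is "1".

```python
def binary_to_decimal(bits):
    """Sum the column values (1, 2, 4, 8, ...) wherever the bit is '1'."""
    total = 0
    column = 1
    for bit in reversed(bits):      # start from the right-most column
        if bit == "1":
            total += column
        elif bit != "0":
            raise ValueError(f"not a binary digit: {bit!r}")
        column *= 2                 # double for every new column
    return total

print(binary_to_decimal("10110101"))   # 181
```

Python's built-in int("10110101", 2) gives the same answer and is handy for checking your work.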
Now, the Other Way
How do we go from decimal to binary? How would we translate, for example, the number 157₁₀ to binary?
It's a little more difficult, but not too hard once you get used to it.
Here's the method: write the binary exponent numbers, but this time vertically and from high to low:
Write down the decimal number you want to translate at the top left. Then subtract the binary exponent.
- If the binary exponent is the same as or smaller than the number you have left, then subtract the exponent and continue with what remains; that will be marked as a "1" in binary.
- If the binary exponent is too big to subtract, then do not subtract it; that will be marked as a "0."
Below is a chart showing how that works with the number 157:
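The subtraction method can also be written as a short Python sketch. The function name decimal_to_binary is hypothetical; the steps follow the two rules above, working from the largest column value down to 1.

```python
def decimal_to_binary(number):
    """Convert a non-negative integer to binary by subtracting powers of two."""
    if number == 0:
        return "0"
    column = 1
    while column * 2 <= number:     # find the largest column value that fits
        column *= 2
    bits = []
    remaining = number
    while column >= 1:
        if column <= remaining:     # small enough to subtract: mark a 1
            remaining -= column
            bits.append("1")
        else:                       # too big to subtract: mark a 0
            bits.append("0")
        column //= 2                # move down to the next column value
    return "".join(bits)

print(decimal_to_binary(157))       # 10011101
```

For 157, the values subtracted are 128, 16, 8, 4, and 1, which is why those columns hold a "1."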
Try these methods out with various numbers, and then test your answers with a binary-decimal translator.
We just learned how to count in binary (base 2). All the digits are 0s and 1s. These are called "binary digits," or "bits" for short. One bit is a "0" or a "1."
Now we should think about how many number combinations can be made with a certain number of digits. For example, let's say that you have a suitcase with a 3-digit number lock. How many combinations are there? Easy: 1000. You start with 000, then go up through 001, 002, 003, etc. until you reach 999. From 000 to 999 is 1000 combinations.
There is a simpler way to put it: the number of combinations is the base to the power of the number of digits, or baseⁿ for n digits. If you are in base 10 and you have a 3-digit lock, then the number of combinations is 10³, or 1000.
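One way to check this rule is to list every lock setting and count them. The sketch below does that with Python's itertools.product; the function name count_combinations is made up for this example.

```python
from itertools import product

def count_combinations(base, digits):
    """Count every possible setting by generating them all (slow, but a direct check)."""
    symbols = range(base)
    return sum(1 for _ in product(symbols, repeat=digits))

print(count_combinations(10, 3))   # 1000
print(10 ** 3)                     # 1000, the shortcut gives the same answer
```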
Base 2 Combinations
Now let's do the same thing in base 2. If you have 4 bits, how many different combinations (numbers) can you make? We're in base 2, so that would be 2⁴, or 2 x 2 x 2 x 2, or 16.
We can test that by just counting from 0000 to 1111 and seeing how many numbers we make:
0000 = 0
0001 = 1
0010 = 2
0011 = 3
0100 = 4
0101 = 5
0110 = 6
0111 = 7
1000 = 8
1001 = 9
1010 = 10
1011 = 11
1100 = 12
1101 = 13
1110 = 14
1111 = 15
That is 0-15, for a total of 16 numbers!
Now that we know this system, we can see the combinations more easily:
|1 bit||2¹||2 combinations|
|2 bits||2²||4 combinations|
|3 bits||2³||8 combinations|
|4 bits||2⁴||16 combinations|
|5 bits||2⁵||32 combinations|
|6 bits||2⁶||64 combinations|
|7 bits||2⁷||128 combinations|
|8 bits||2⁸||256 combinations|
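The table can be generated with a few lines of Python. This is a small sketch with made-up function names; it also shows the highest number each size can hold, which is always one less than the number of combinations because counting starts at 0.

```python
def combinations_for_bits(bits):
    """Number of different values that fit in the given number of bits."""
    return 2 ** bits

def highest_value(bits):
    """Largest number that fits, since counting starts at zero."""
    return combinations_for_bits(bits) - 1

for bits in range(1, 9):
    label = "bit" if bits == 1 else "bits"
    print(f"{bits} {label}: {combinations_for_bits(bits)} combinations, "
          f"0 to {highest_value(bits)}")
```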
Now that we know about combinations, we can look at what a Byte is. Basic definition: a Byte is 8 bits. For example, 10010110 is an eight-bit number, and it is a Byte.
The next question is, "Why eight bits?" Well, it has not always been 8 bits. Historically, there have been different sizes for Bytes. However, 8 is now the standard, and one good reason for that number has to do with typing.
Remember, a computer can only understand binary. So, what happens when you type the letter "M" on your keyboard? The computer does not know "M."
What happens is that the keyboard translates "M" into binary, specifically 01001101 (the number 77 in base 10). 01001101 is sent to the computer, which it can understand.
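You can look up these codes yourself in Python. The sketch below uses the built-in ord() and chr() functions; the helper names char_to_bits and bits_to_char are invented for this example.

```python
def char_to_bits(character):
    """Return the 8-bit binary code for a single character."""
    return format(ord(character), "08b")

def bits_to_char(bits):
    """Turn a string of binary digits back into a character."""
    return chr(int(bits, 2))

print(ord("M"))                  # 77
print(char_to_bits("M"))         # 01001101
print(bits_to_char("01001101"))  # M
```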
Think about this: how many letters and other characters do we need to give codes to? Let's see if we can count them up: 26 lowercase letters, 26 uppercase letters, 10 digits, maybe 14 punctuation marks (18 if you include "smart" single- and double-quotes), and a bunch of symbols... we're now perhaps at about 100 characters. But then there are a lot of special characters for non-English western languages, like the ñ in Spanish, or vowels with accents like é.
All in all, 256 combinations are enough to cover all of those. 256 combinations is 8 bits, meaning that 8 bits is a good amount for one Byte.
One code used to translate this is called ASCII, and some of the codes look like this:
|Character||ASCII Binary Code|
The ASCII numbers I have shown you are 8 bits. However, you will sometimes see ASCII codes represented as 7 bits, missing the initial "0." That's because ASCII is an older system that was designed as a 7-bit code, so it only defines 128 characters. Today, Bytes are 8 bits. An 8-bit Byte is also called an octet.
Keep in mind that ASCII is what is called a character set or character encoding.
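The 7-bit and 8-bit forms of an ASCII code differ only by the leading "0," as this small Python sketch shows. The function name ascii_bits and the sample characters are chosen just for illustration.

```python
def ascii_bits(character, width=8):
    """Return the ASCII code of a character as binary digits of the given width."""
    code = ord(character)
    if code > 127:
        raise ValueError(f"{character!r} is not an ASCII character")
    return format(code, f"0{width}b")

for character in "Aa0M":
    print(character, ascii_bits(character, 7), ascii_bits(character, 8))
# The line for M reads: M 1001101 01001101
```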
A Mess of Text
One problem with computers is that there are dozens of different systems to translate text to binary code! ASCII is usually recognized as a historical base; Windows and Mac generally use the same ASCII codes for basic letters, numbers and symbols used on keyboards—but not exactly the same.
It gets worse: Mac and Windows have historically used completely different codes for the non-ASCII characters. The Mac used Mac OS Roman encoding, and Windows used Windows-1252 encoding. More modern character encoding systems are even more complex, and there are so many variations that it is difficult to understand them!
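Python includes both of these encodings (under the codec names "mac_roman" and "cp1252"), so the mismatch is easy to see. This is only a sketch using "é" as a sample character; encode_both is a made-up helper name.

```python
def encode_both(character):
    """Return the byte used for a character in Mac OS Roman and in Windows-1252."""
    return character.encode("mac_roman"), character.encode("cp1252")

mac_byte, windows_byte = encode_both("é")
print(mac_byte.hex(), windows_byte.hex())   # 8e e9

# A plain ASCII letter gets the same code in both encodings.
print(encode_both("M"))

# Reading Mac bytes as if they were Windows bytes gives the wrong character.
print(mac_byte.decode("cp1252"))
```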
However, there is hope: UTF-8 is a popular character encoding system widely used today. It is a system based on Unicode, a code which can represent almost any language. Any character, any symbol, any emoji can be expressed with Unicode, and with UTF-8. It is even compatible with ASCII.
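Here is a short Python sketch of how UTF-8 behaves. It uses the standard str.encode() method; the sample characters and the helper name utf8_bits are chosen only for illustration. ASCII characters still take one Byte, while other characters take two, three, or four Bytes.

```python
def utf8_bits(text):
    """Return the UTF-8 Bytes of a string, each written as 8 binary digits."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))

for text in ("M", "é", "ñ", "😀"):
    print(text, len(text.encode("utf-8")), utf8_bits(text))
# "M" is 1 Byte (01001101, the same as ASCII), "é" and "ñ" are 2 Bytes, "😀" is 4 Bytes.
```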
B or b?
Now you know what a bit is, what a Byte is, and where they come from. Next, let's look at how they are used.
First, how they are written: bits are written as b (a small "b"), while Bytes are written as B (a capital "B").
Normally, bits are used to describe the speed of data transmission. For example, if you go to an ISP (Internet Service Provider) and get a connection to the Internet, you may ask, "How fast is it?" The ISP will answer you in bits per second, or bps. A common fiber-optic connection, for example, may be 100 Mbps, or 100 million bits per second.
Many people may mistake bps for Bps, but the two are very different. If you truly have a download speed of 100 million bits per second, that means you are getting 12.5 million Bytes per second—only 1/8th the speed you might think!
On the other hand, Bytes are used to describe an amount of data. For example, you might have a photograph which is 2 MB, or 2 million Bytes.
In everyday life, we almost always use Bytes. In the rare cases where we see "bits" used, we must translate. 1 Byte is 8 bits; 1 bit is 1/8th of a Byte.
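The translation between bits and Bytes is just a multiplication or division by 8. Here is a minimal Python sketch; the function names are invented for this example, and the download time is an ideal figure that ignores any network overhead.

```python
BITS_PER_BYTE = 8

def bits_to_bytes_per_second(bits_per_second):
    """Convert a connection speed in bps to Bytes per second."""
    return bits_per_second / BITS_PER_BYTE

def download_seconds(size_in_bytes, bits_per_second):
    """Ideal time to transfer a file of the given size at the given speed."""
    return size_in_bytes * BITS_PER_BYTE / bits_per_second

print(bits_to_bytes_per_second(100_000_000))      # 12500000.0 (12.5 million Bytes per second)
print(download_seconds(2_000_000, 100_000_000))   # a 2 MB photo at 100 Mbps: about 0.16 seconds
```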
Next, there are the prefixes used for describing large numbers. We do not usually say "a million Bytes"; instead, we say "megabyte," and we spell it "MB." Here are the different prefixes:
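As a rough Python sketch, the prefixes can be stored as powers of ten and used to shorten large Byte counts. The names PREFIXES and describe_bytes are made up for this example, and it assumes the decimal meaning used in this lesson (a megabyte is one million Bytes); some software counts in multiples of 1024 instead.

```python
PREFIXES = [
    ("kilo", "K", 10**3),
    ("mega", "M", 10**6),
    ("giga", "G", 10**9),
    ("tera", "T", 10**12),
    ("peta", "P", 10**15),
    ("exa", "E", 10**18),
    ("zetta", "Z", 10**21),
    ("yotta", "Y", 10**24),
]

def describe_bytes(count):
    """Shorten a Byte count using the largest prefix that fits."""
    for name, symbol, value in reversed(PREFIXES):
        if count >= value:
            return f"{count / value:g} {symbol}B"
    return f"{count} B"

print(describe_bytes(2_000_000))          # 2 MB
print(describe_bytes(1_500_000_000))      # 1.5 GB
print(describe_bytes(500))                # 500 B
```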
Generally, people do not know what these terms are until they start being used in personal computers. The first few, kilo and mega, had been known for a long time because they were commonly used in other words, for example "kilometer" or "megaton."
However, giga did not really become well-known until computer storage was big enough to hold a gigabyte, which was in the mid-to-late 1990s.
These terms are often not used accurately, however; both are used simply to suggest something big. For example, these are two pizzas offered by Japanese pizza chains:
Pizza-la's "Mega Meat" pizza had nothing to do with "mega": there was not a "million" of anything on the pizza! Similarly, Domino's "Giga Meat" pizza neither had a billion pieces of meat, nor did it have 1,000 times the meat that the Mega Meat pizza had.
However, the use of both terms in the media and society in general continues.
More recently, after terabyte hard drives came out in the last decade, people started to hear the prefix tera, so that became widely known. However, it has not yet entered fully, as "tera" is not commonly used with English words to create the meaning of something extra-large.
Before the 1990s, when "giga" entered into common usage, people did not know what "giga" meant, and sometimes pronounced it as "jiga" ("jiga" is an acceptable pronunciation, but is rarely used today). For example, in the 1985 movie Back to the Future, Doc Brown needed to produce 1.21 gigawatts of electricity; Marty McFly, meanwhile, had no idea what that meant:
You might be wondering, where do these prefixes come from?
The longer-used prefixes, mega, giga, and tera, all come from Greek. "Mega" means "great" in Greek; "giga" means "giant," and "tera" means "monster."
The other prefixes are less poetic. "Peta" (penta) is from the Greek word for "five," and "exa" is Greek for "six." "Zetta" comes from Italian, meaning "seven," and "yotta" is also Italian, meaning "eight."
How Much Does a Byte Weigh?
Now you know what the words are. But do you understand what they mean? For example, how many songs fit in a gigabyte? If you want to store 30 minutes of video recorded on your cell phone, will a 4 GB USB flash drive be enough?
The answer is not simple, because not every book, photo, song, or movie is the same size. However, here is a rough estimate:
|Essay||15 KB||This might be a 1,500-word essay saved in .docx format.|
|Book||1 MB||The book would be plain text (no formatting, no images) and would be about the same as a 500-page paperback.|
|Photo||3 MB||Assuming an 8-megapixel image taken with an iPhone 5 and saved as a compressed JPG file.|
|Song||4.5 MB||This would be a 3-minute song saved in MP3 format at medium-high quality.|
|Personal Video||250 MB||Assuming a 2-minute video taken at Full HD resolution.|
|Movie||1.5 GB||Assuming a 120-minute movie at Full HD with strong H.264 compression.|
From this chart, you can perhaps get a better idea of what the terms and amounts mean. For example, you could conclude that a 4 GB USB flash drive is just enough to hold half an hour of iPhone video. But it could also hold almost 900 songs, more than 1300 photos, about 4000 books, or more than 250,000 essays!
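These estimates can be checked with a short Python sketch. The sizes come from the chart above, written in KB so that only whole numbers are needed; the names SIZES_KB and how_many_fit are made up for this example, and 1 GB is taken to be one million KB.

```python
SIZES_KB = {
    "essay": 15,
    "book": 1_000,
    "photo": 3_000,
    "song": 4_500,
    "2-minute video": 250_000,
    "movie": 1_500_000,
}
DRIVE_KB = 4_000_000    # a 4 GB flash drive

def how_many_fit(item, drive_kb=DRIVE_KB):
    """Whole number of items of this kind that fit on the drive."""
    return drive_kb // SIZES_KB[item]

for item in SIZES_KB:
    print(f"{item}: {how_many_fit(item)}")
# essay: 266666, book: 4000, photo: 1333, song: 888, 2-minute video: 16, movie: 2
```

Sixteen 2-minute videos is 32 minutes, which is why half an hour of video just fits.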