### Binary, Bits, and Bytes

Computers do math differently than you do.

First, let's start with some terms:

- A **number** is a value which represents an amount, using digits to represent that value.
- A **digit** is one character (glyph) in a number.

For example, **724** is a **number**, and the "**7**" is a **digit**. The whole number **724** has three digits: 7, 2, and 4.

When we count, we use **ten digits**: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. There is no other digit after "9."

The question is, why ten digits? Why not 12, or 20, or 4?

The answer is believed to be that we have ten digits on our hands. In fact, the word **digit** comes from the word for "finger or thumb." Most people have five digits on each hand, ten in all.

Computers, however, only have **two digits**: 1 and 0, on and off. As a result, computers count differently than people.

#### Using Columns

Ever since you were a child, you used columns to represent numbers. This is because of the way number counting works.

If you only have ten digits, how do you represent a number above ten?

The answer is, you have to use digits which represent increasing values. For example, in the number **28**, the "2" does **not** represent the value of "2." Instead, it represents the value of **2 x 10**. The "8," likewise, represents the value of **8 x 1**.

Let's lay this out, using the number **5,362**:

millions | hundred thousands | ten thousands | thousands | hundreds | tens | ones |
---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 1 | 1 | 1 | 1 | 1 | 1 |
2 | 2 | 2 | 2 | 2 | 2 | **2** |
3 | 3 | 3 | 3 | **3** | 3 | 3 |
4 | 4 | 4 | 4 | 4 | 4 | 4 |
5 | 5 | 5 | **5** | 5 | 5 | 5 |
6 | 6 | 6 | 6 | 6 | **6** | 6 |
7 | 7 | 7 | 7 | 7 | 7 | 7 |
8 | 8 | 8 | 8 | 8 | 8 | 8 |
9 | 9 | 9 | 9 | 9 | 9 | 9 |

As you can see, each **digit** represents its own value **multiplied by** the value of the column.

As a result, the number 5,362 can more accurately be represented as:

( 5 * 1000 ) + ( 3 * 100 ) + ( 6 * 10 ) + ( 2 * 1 )

Normally, we don't think of it that way; we learned about all of this when we were children, and once we got used to the counting system, we forgot about it.
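If you would like to see that breakdown done by a program, here is a minimal Python sketch. The function name `expand_decimal` is made up for this illustration, and the sketch assumes a non-negative whole number.

```python
def expand_decimal(number: int) -> list[tuple[int, int]]:
    """Return (digit, column value) pairs for a non-negative whole number."""
    digits = str(number)
    pairs = []
    for position, char in enumerate(digits):
        # The left-most digit sits in the highest column.
        column_value = 10 ** (len(digits) - position - 1)
        pairs.append((int(char), column_value))
    return pairs


pairs = expand_decimal(5362)
print(" + ".join(f"({digit} * {column})" for digit, column in pairs))
# (5 * 1000) + (3 * 100) + (6 * 10) + (2 * 1)
print(sum(digit * column for digit, column in pairs))  # 5362
```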

#### But Why Columns?

Columns are necessary if we want to use only a limited number of digits—otherwise we would have to create and remember millions of different symbols!

Instead, we use a system of counting where higher and higher values can be represented by the same digits. This system uses columns. Each new column represents the total amount that could be counted by the previous column.

So, **how do we count?** Here is the process:

As you can see, when we got to the number "9," we had reached the highest digit; we could not put a higher number after that with just one digit. Adding one made "ten," but we don't have a digit for that.

So, instead, we restarted: we put the "ten" in the next column, and started over again in the first column.

**The second column represents the value of "10,"** which makes sense: every time you hit the highest digit, you have reached a "ten"; adding one to the tens column "restarts" the first digit, and allows you to start over again.

#### Bases

Because there are ten digits in our counting system, we call it **base 10**. Normally, when we write numbers, we assume it is base 10. However, when we use different base systems, there has to be a way of showing the base; this is done with a **subscript**.

For example, a number in base 10, such as 25, is shown as 25_{10}; the number 47 in base 8 would be 47_{8}; the number 1101 in base 2 (binary) would be 1101_{2}.

The base number represents the amount counted in the **second column**.

Every additional column is a higher power of that base: 10^{2}, 10^{3}, 10^{4}, and so on. In more direct terms, 100, 1000, 10,000, and so forth. It looks like this:

millions | hundred thousands | ten thousands | thousands | hundreds | tens [BASE] | ones |
---|---|---|---|---|---|---|
1,000,000 | 100,000 | 10,000 | 1,000 | 100 | 10 | 1 |
10^{6} | 10^{5} | 10^{4} | 10^{3} | 10^{2} | 10^{1} | 10^{0} |
0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 1 | 1 | 1 | 1 | 1 | 1 |
2 | 2 | 2 | 2 | 2 | 2 | 2 |
3 | 3 | 3 | 3 | 3 | 3 | 3 |
4 | 4 | 4 | 4 | 4 | 4 | 4 |
5 | 5 | 5 | 5 | 5 | 5 | 5 |
6 | 6 | 6 | 6 | 6 | 6 | 6 |
7 | 7 | 7 | 7 | 7 | 7 | 7 |
8 | 8 | 8 | 8 | 8 | 8 | 8 |
9 | 9 | 9 | 9 | 9 | 9 | 9 |

It turns out that you can do the same thing with any number as the base! For example, here is base 8:

262,144 | 32,768 | 4,096 | 512 | 64 | 8 [BASE] | 1 |
---|---|---|---|---|---|---|
8^{6} | 8^{5} | 8^{4} | 8^{3} | 8^{2} | 8^{1} | 8^{0} |
0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 1 | 1 | 1 | 1 | 1 | **1** |
2 | 2 | 2 | 2 | 2 | 2 | 2 |
3 | 3 | 3 | 3 | 3 | **3** | 3 |
4 | 4 | 4 | 4 | 4 | 4 | 4 |
5 | 5 | 5 | 5 | 5 | 5 | 5 |
6 | 6 | 6 | 6 | 6 | 6 | 6 |
7 | 7 | 7 | 7 | 7 | 7 | 7 |

Notice that:

- The base number is equal to the number of digits
- The base number is always one higher than the highest digit
- All the column values are powers of the base number
- The right-most column is always 1, because **any** base number to the zeroth power is 1! That is, x^{0} = 1.
- The value of any digit in a column is equal to that digit multiplied by the column value

Can you guess what number is displayed in the table? Remember, the number is the digits multiplied by the column values. In this case, it is **(3 * 8) + (1 * 1)**, which is... 25! Or, more accurately, 31_{8} = 25_{10}.
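The same "digit times column value" rule can be checked with a few lines of Python. This is only a sketch: `from_base` is a hypothetical helper name, and it handles only bases 2 through 10, where every digit is an ordinary numeral.

```python
def from_base(digits: str, base: int) -> int:
    """Add up digit * column value for each column (bases 2 to 10 only)."""
    total = 0
    # Walk from the right-most column (base^0) to the left.
    for position, char in enumerate(reversed(digits)):
        total += int(char) * base ** position
    return total


print(from_base("31", 8))  # 25
print(int("31", 8))        # 25, Python's built-in conversion agrees
```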

#### What Is the Number Called?

We use certain names to describe certain numbers. Without thinking, we believe that these names are attached to the digits we see. *That is not correct!* The names we use are part of the base system we use.

For example, the number "ten"? That does **not** describe a 1 followed by a zero. Instead, it describes one more than nine. The word "ten" is **only** connected to the digits "10" in base 10. In other bases, it looks different!

In base 8, ten is written 12. In base 5, it is 20. In base 2, it is 1010. Or, more accurately, 12_{8} = 20_{5} = 1010_{2} = 10_{10}. **All of those numbers are "ten."**
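To check those spellings of "ten," here is a small Python sketch. It uses repeated division by the base, which is a common technique but not one this lesson teaches, and the name `to_base` is invented for the example. It assumes a non-negative number and a base from 2 to 10.

```python
def to_base(value: int, base: int) -> str:
    """Write a non-negative whole number using the digits of the given base."""
    if value == 0:
        return "0"
    digits = ""
    while value > 0:
        # The remainder is the digit for the current (right-most) column.
        value, remainder = divmod(value, base)
        digits = str(remainder) + digits
    return digits


for base in (8, 5, 2, 10):
    print(f"ten in base {base}: {to_base(10, base)}")
# ten in base 8: 12
# ten in base 5: 20
# ten in base 2: 1010
# ten in base 10: 10
```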

So perhaps you can see that words like "ten" or "twelve" or "fifty" or a "thousand" are really confusing when you are talking about different bases. Therefore, **when using bases other than 10, you must only use the names of the digits, and never the words used for base 10 numbers.**

Therefore, the number 31_{8} is not "thirty one," it is "three-one in base eight." If you use that naming system, you can avoid confusion.

In addition, there are special names for bases that are often used:

- base 2: **binary** or BIN
- base 8: **octal** or OCT
- base 10: **decimal** or DEC
- base 16: **hexadecimal** or HEX

Why do programmers confuse Halloween and Christmas?

Because OCT 31 is the same as DEC 25.

#### Okay, Now for Binary

Sorry to take so long to finally get to binary counting!

In binary, there are only 2 digits: 0 and 1.

The highest digit is 1.

This means that you move to extra columns very quickly!

Here is the table:

sixty-fours | thirty-twos | sixteens | eights | fours | twos [BASE] | ones |
---|---|---|---|---|---|---|
64 | 32 | 16 | 8 | 4 | 2 | 1 |
2^{6} | 2^{5} | 2^{4} | 2^{3} | 2^{2} | 2^{1} | 2^{0} |
0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 1 | 1 | 1 | 1 | 1 | 1 |

Here's how counting goes in binary:

Look at that. 10 (one-zero) is two. 1010 (one-zero-one-zero) is ten!
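If you want to watch the columns fill up yourself, a short Python loop can print the count from zero to ten. This is just an illustrative sketch; the `"b"` format code is Python's built-in way of writing a number in binary.

```python
# Binary spellings of the numbers zero through ten.
rows = [format(n, "b") for n in range(11)]

for n, bits in enumerate(rows):
    print(f"{n:>2} = {bits}")
```

Each time a column is already at 1, adding one more resets it to 0 and carries into the next column to the left.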

Which brings up another joke:

There are only 10 kinds of people:

Those who understand binary, and those who do not.

To be fair, the first line of the joke should be: "There are only 10_{2} kinds of people."

#### How to Translate between Binary and Decimal

To go from binary to decimal is actually rather easy: first, make a chart which has all the binary column (exponent) numbers. Easy to do: just double every new number. 1, 2, 4, 8, 16... and so forth.

Next, write the binary number in the spaces below. For example, if the binary number you have is 10110101_{2}:

Next, add up all the column numbers where there is a "1", and ignore all the column numbers where there is a "0." In this case:

128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
---|---|---|---|---|---|---|---|
1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |

128 + 32 + 16 + 4 + 1

That adds up to 181. So, 10110101_{2} = 181_{10}.
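The chart method translates almost directly into code. The following Python sketch (with a made-up function name, `binary_to_decimal`) doubles the column value as it moves left and adds it to the total only where the bit is "1." It assumes the input string contains nothing but 0s and 1s.

```python
def binary_to_decimal(bits: str) -> int:
    """Add up the column values wherever the binary number has a 1."""
    column_value = 1  # the right-most column is always 1
    total = 0
    for bit in reversed(bits):
        if bit == "1":
            total += column_value
        column_value *= 2  # each column is double the one to its right
    return total


print(binary_to_decimal("10110101"))  # 181
```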

#### Now, the Other Way

How do we go from decimal to binary? How would we translate, for example, the number 157_{10} to binary?

It's a little more difficult, but not too hard once you get used to it.

Here's the method: write the binary exponent numbers, but this time vertically and from high to low:

Write down the decimal number you want to translate at the top left. Then subtract the binary exponent.

**If the binary exponent is smaller than or equal to your number, then subtract the exponent**; that will be marked as a "1" in binary. **If the binary exponent is too big to subtract, then do not subtract it**; that will be marked as a "0."

Below is a chart showing how that works with the number 157:

Number - Exponent | Can Subtract? | Leaves how much? | BIN |
---|---|---|---|
157 - 128? | Yes | Leaves 29 | 1 |
29 - 64? | No | -- | 0 |
29 - 32? | No | -- | 0 |
29 - 16? | Yes | Leaves 13 | 1 |
13 - 8? | Yes | Leaves 5 | 1 |
5 - 4? | Yes | Leaves 1 | 1 |
1 - 2? | No | -- | 0 |
1 - 1? | Yes | Leaves 0 | 1 |

Reading the BIN column from top to bottom gives the answer: 157_{10} = 10011101_{2}.
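Here is the subtraction method as a Python sketch. The name `decimal_to_binary` and the `columns` parameter are assumptions made for this example; with the default of 8 columns it only works for numbers from 0 to 255.

```python
def decimal_to_binary(number: int, columns: int = 8) -> str:
    """Subtract powers of two from high to low, recording 1 or 0 each time."""
    bits = ""
    for exponent in range(columns - 1, -1, -1):
        column_value = 2 ** exponent
        if column_value <= number:
            number -= column_value  # it fits, so subtract it and mark a 1
            bits += "1"
        else:
            bits += "0"             # too big to subtract, so mark a 0
    return bits


print(decimal_to_binary(157))  # 10011101
```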

Try these methods out with various numbers, and then test your answers with a binary-decimal translator web app.

##### You Have Seen These Numbers Before

Look at the numbers we get for the powers of "2": 1, 2, 4, 8, 16, 32, 64, 128, 256, 512. Do they look familiar? They should—**you see them all the time in the computer world**. If you have a USB flash memory stick, get it out and read the capacity (size). It will have one of those numbers (e.g., 512 MB, 4 GB, 16 GB). You may have heard of 32-bit or 64-bit operating systems, or your computer may have 8 GB of RAM. These numbers are often used in computers, especially to describe amounts of memory (not storage).

We just learned how to count in binary (base 2). All the digits are 0s and 1s. These are called "**bi**nary digi**ts**," or "bits" for short. One **bit** is a "0" or a "1."

#### Combinations

Now we should think about how many number combinations can be made with a certain number of digits. For example, let's say that you have a suitcase with a 3-digit number lock. How many combinations are there? Easy: 1000. You start with 000, then go up through 001, 002, 003, etc. until you reach 999. From 000 to 999 is 1000 combinations.

There is a simpler way to put it: the number of combinations is the base to the power of the number of digits, or **base^{d}**. If you are in base 10 and you have a 3-digit lock, then the number of combinations is **10^{3}**, or 1000.

##### Base 2 Combinations

Now let's do the same thing in base 2. If you have 4 bits, how many different combinations (numbers) can you make? We're in base 2, so that would be **2^{4}**, or 2 x 2 x 2 x 2, or 16.

We can test that by just counting from 0000 to 1111 and seeing how many numbers we make:

0000 = 0

0001 = 1

0010 = 2

0011 = 3

0100 = 4

0101 = 5

0110 = 6

0111 = 7

1000 = 8

1001 = 9

1010 = 10

1011 = 11

1100 = 12

1101 = 13

1110 = 14

1111 = 15

That is 0-15, for a total of 16 numbers!

Now that we know this system, we can see the combinations more easily:

Bits | Power of 2 | Combinations |
---|---|---|
1 bit | 2^{1} | 2 combinations |
2 bits | 2^{2} | 4 combinations |
3 bits | 2^{3} | 8 combinations |
4 bits | 2^{4} | 16 combinations |
5 bits | 2^{5} | 32 combinations |
6 bits | 2^{6} | 64 combinations |
7 bits | 2^{7} | 128 combinations |
8 bits | 2^{8} | 256 combinations |
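You do not have to take the formula on trust. The Python sketch below actually builds every combination with the standard `itertools.product` function and counts them, then compares the count with the base raised to the number of digits. The helper name `count_combinations` is invented for this example.

```python
from itertools import product


def count_combinations(base: int, digits: int) -> int:
    """Count every possible arrangement by generating them one at a time."""
    symbols = range(base)  # for base 2 this is the digits 0 and 1
    return sum(1 for _ in product(symbols, repeat=digits))


print(count_combinations(10, 3), 10 ** 3)  # 1000 1000
print(count_combinations(2, 4), 2 ** 4)    # 16 16
print(count_combinations(2, 8), 2 ** 8)    # 256 256
```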

### The Byte

Now that we know about combinations, we can look at what a **Byte** is. Basic definition: **a Byte is 8 bits**. For example, 10010110 is an eight-bit number, and it is a Byte.

The next question is, "Why eight bits?" Well, it has not always been 8 bits. Historically, there have been different sizes for Bytes. However, 8 is now the standard, and one good reason for that number has to do with **typing**.

Remember, a computer can only understand binary. So, what happens when you type the letter "M" on your keyboard? The computer does not know "M."

What happens is that the keyboard translates "M" into binary, specifically 01001101 (the number 77 in base 10). 01001101 is then sent to the computer, which can understand it.

##### Translating Characters

Think about this: how many letters and other characters do we need to give codes to? Let's see if we can count them up: **26** lowercase letters, **26** uppercase letters, **10** digits, maybe **14** punctuation marks (18 if you include "smart" single- and double-quotes), and a bunch of symbols... we're now perhaps at about 100 characters. But then there are a lot of special characters for non-English western languages, like the ñ in Spanish, or vowels with accents like é.

All in all, 256 combinations are enough to cover all of those. 256 combinations is 8 bits, meaning that 8 bits is a good amount for one Byte.

One code used to translate this is called ASCII, and some of the codes look like this:

Character | ASCII Binary Code |
---|---|
A | 0100 0001 |
B | 0100 0010 |
C | 0100 0011 |
D | 0100 0100 |
a | 0110 0001 |
b | 0110 0010 |
c | 0110 0011 |
d | 0110 0100 |

The ASCII numbers I have shown you are **8 bits**. However, you will sometimes see ASCII codes represented as 7 bits, missing the initial "0." That's because ASCII is an older system that was designed as a 7-bit code; when it is stored in a modern Byte, the extra leading bit is simply 0. Today, Bytes are 8 bits. An 8-bit Byte is also called an **octet**.

Keep in mind that ASCII is what is called a **character set** or **character encoding**.
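You can look up these codes yourself. In Python, the built-in `ord` function gives the code number for a character, and `format` can write that number as 8 bits. The helper name `ascii_bits` is made up for this sketch, which is only meant for plain ASCII characters.

```python
def ascii_bits(char: str) -> str:
    """Return the 8-bit binary code for one ASCII character."""
    return format(ord(char), "08b")


for char in "ABCDabcd":
    bits = ascii_bits(char)
    # Print the code number and the bits in two groups of four.
    print(char, ord(char), bits[:4], bits[4:])

print(ascii_bits("M"))  # 01001101
```

Notice that each lowercase letter differs from its uppercase partner by only one bit.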

#### A Mess of Text

One problem with computers is that there are dozens of different systems to translate text to binary code! ASCII is usually recognized as a historical base; Windows and Mac generally use the same ASCII codes for basic letters, numbers and symbols used on keyboards—but not *exactly* the same.

It gets worse: Mac and Windows use completely different codes for the non-ASCII characters. Mac OS X uses Mac OS Roman encoding, and Windows uses Windows-1252 encoding. More modern **character encoding** systems are even more complex, and there are so many variations that it is difficult to understand them!

However, there is hope: **UTF-8** is a popular character encoding system widely used today. It is a system based on **Unicode**, a code which can represent almost any language. Any character, any symbol, any *emoji* can be expressed with Unicode, and with UTF-8. It is even compatible with ASCII.

Have you ever seen a browser page that looks like this:

When you see that, you have to go to the "View" menu of your browser, and set the correct text (character) encoding. If you do it right, the page will clear up:

#### What Is That?

In Japan, it is called "*mojibake*"; there is no common English term.

*Mojibake* happens when the wrong character set is used to display a web page. For example, the page shown above uses the "Shift-JIS" encoding system, but is displayed as *mojibake* when viewed with the Mac OS Roman character set.

There is another reason as well. Take a look at one small example of text from the two versions of the page shown above:

That is the exact same text, first clearly, then as *mojibake*. Notice that the Japanese version is 3 characters: ホーム; then notice that the *mojibake* version is 6 characters: ÉzÅ[ÉÄ.

That is not a coincidence. Remember, western encoding systems use 1 Byte, with 256 combinations, to encode text characters. Japanese, like other Asian languages, has far more than 256 characters! The Joyo kanji alone include 2136 characters. 1 Byte is not enough.

As a result, Japanese must be encoded with more than one Byte. Shift-JIS is a double-byte character system, meaning that 2 Bytes are used for each character. 2 Bytes is 16 bits; 16 bits has 65,536 combinations, enough for all kanji and a lot more.

As it happens, the character for ホ has the Shift-JIS code 1000001101111010.

However, when you switch to Mac OS Roman, your computer is looking for 1-Byte characters, so it splits 1000001101111010 into 10000011 and 01111010. Those two characters are—you guessed it— "É" and "z."
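This effect is easy to reproduce. The Python sketch below encodes ホーム with the standard `shift_jis` codec, then deliberately decodes the same Bytes with the `mac_roman` codec. Both codec names come from Python's standard library; the variable names are just for illustration.

```python
text = "ホーム"

# Encode the three Japanese characters as Shift-JIS: two Bytes each.
raw = text.encode("shift_jis")
print(len(text), len(raw))  # 3 6

# The first two Bytes are the code for ホ.
first_two = " ".join(format(byte, "08b") for byte in raw[:2])
print(first_two)  # 10000011 01111010

# Reading the Bytes with the wrong character set produces mojibake.
garbled = raw.decode("mac_roman")
print(garbled)  # ÉzÅ[ÉÄ

# Reading them with the right character set recovers the text.
print(raw.decode("shift_jis"))  # ホーム
```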

#### B or b?

Now you know what a **bit** is, what a **Byte** is, and where they come from. Next, let's look at how they are used.

First, how they are written: bits are written as **b** (a small "b"), while Bytes are written as **B** (a capital "B").

Normally, **bits are used to describe the speed of data transmission**. For example, if you go to an ISP (Internet Service Provider) and get a connection to the Internet, you may ask, "How fast is it?" The ISP will answer you in *bits per second*, or **bps**. A common fiber-optic connection, for example, may be **100 Mbps**, or 100 million bits per second.

Many people may mistake **bps** for **Bps**, but the two are very different. If you truly have a download speed of 100 million bits per second, that means you are getting 12.5 million **Bytes** per second—only 1/8th the speed you might think!

On the other hand, **Bytes are used to describe an amount of data**. For example, you might have a photograph which is 2 MB, or 2 million Bytes.

In everyday life, we almost always use Bytes. In the rare cases where we see "bits" used, we must translate. 1 Byte is 8 bits; 1 bit is 1/8th of a Byte.
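The translation is a single division or multiplication by 8. Here is a small Python sketch using the numbers from this section; the function names are hypothetical, and the download time is an idealized figure that ignores any network overhead.

```python
BITS_PER_BYTE = 8


def bits_to_bytes(bits: float) -> float:
    """Convert a number of bits into Bytes."""
    return bits / BITS_PER_BYTE


def download_seconds(size_in_bytes: float, speed_in_bps: float) -> float:
    """Idealized time to transfer a file at a given speed in bits per second."""
    return size_in_bytes * BITS_PER_BYTE / speed_in_bps


speed = 100_000_000  # a 100 Mbps connection
print(bits_to_bytes(speed))                # 12500000.0 Bytes per second
print(download_seconds(2_000_000, speed))  # 0.16 seconds for a 2 MB photo
```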

#### Going Metric

Next, there are the prefixes used for describing large numbers. We do not usually say "a million Bytes"; instead, we say "megabyte," and we spell it "MB." Here are the different prefixes:

Prefix | Term | Abbreviation | Metric Bytes |
---|---|---|---|
(none) | Byte | B | 1 |
Kilo | Kilobyte | KB | 1,000 |
Mega | Megabyte | MB | 1,000,000 |
Giga | Gigabyte | GB | 1,000,000,000 |
Tera | Terabyte | TB | 1,000,000,000,000 |
Peta | Petabyte | PB | 1,000,000,000,000,000 |
Exa | Exabyte | EB | 1,000,000,000,000,000,000 |
Zetta | Zettabyte | ZB | 1,000,000,000,000,000,000,000 |
Yotta | Yottabyte | YB | 1,000,000,000,000,000,000,000,000 |

Generally, people do not know what these prefixes mean until they start being used in personal computers. The first few, kilo and mega, had been known for a long time because they were commonly used in words such as **kilo**meter and **mega**ton.

However, **giga** did not really become well-known until computer storage was big enough to hold a gigabyte, which was in the mid- to late-1990's.

The prefixes "mega" and "giga" are often not used accurately, however; both are used simply to suggest something big. For example, these are two pizzas offered by Japanese pizza chains:

Pizza-la's "Mega Meat" pizza had nothing to do with "mega": there was not a "million" of anything on the pizza! Similarly, Domino's "Giga Meat" pizza neither had a billion pieces of meat, nor did it have 1,000 times the meat that the Mega Meat pizza had.

However, the use of both terms in the media and society in general continues.

More recently, after terabyte hard drives came out in the last decade, people started to hear the prefix **tera**, so that became widely known. However, it has not yet entered fully, as "tera" is not commonly used with English words to create the meaning of something extra-large.

Before the 1990's, when "giga" entered into common usage, people did not know what "giga" meant, and sometimes pronounced it as "jiga" ("jiga" is an acceptable pronunciation, but is rarely used today). For example, in the 1985 movie *Back to the Future*, Doc Brown needed to produce 1.21 gigawatts of electricity; Marty McFly, meanwhile, had no idea what that meant:

You might be wondering, where do these prefixes come from?

The longer-used prefixes, mega, giga, and tera, all come from Greek. "Mega" means "great" in Greek; "giga" means "giant," and "tera" means "monster."

The other prefixes are less poetic. "Peta" (penta) is from the Greek word for "five," and "exa" is Greek for "six." "Zetta" comes from Italian, meaning "seven," and "yotta" is also Italian, meaning "eight."

#### How Much Does a Byte Weigh?

Now you know what the words are. But do you understand what they *mean*? For example, how many songs fit in a gigabyte? If you want to store 30 minutes of video recorded on your cell phone, will a 4 GB USB flash unit be enough?

The answer is not completely easy, because not every book, photo, song, or movie is the same size. However, here is a rough estimate:

Item | Size | Notes |
---|---|---|
Essay | 15 KB | This might be a 1,500-word essay saved in .docx format. |
Book | 1 MB | The book would be plain text (no formatting, no images) and would be about the same as a 500-page paperback. |
Photo | 3 MB | Assuming an 8-megapixel image taken with an iPhone 5 and saved as a compressed JPG file. |
Song | 4.5 MB | This would be a 3-minute song saved in MP3 format at medium-high quality. |
Personal Video | 250 MB | Assuming a 2-minute video taken at Full HD resolution. |
Movie | 1.5 GB | Assuming a 120-minute movie at Full HD with strong H.264 compression. |

From this chart, you can perhaps get a better idea of what the terms and amounts mean. For example, you could conclude that a 4 GB USB flash drive is *just* enough to hold half an hour of iPhone video. But it could also hold almost 900 songs, more than 1300 photos, about 4000 books, or more than 250,000 essays!
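Those estimates come from simple division, which the Python sketch below works through. The sizes are the rough figures from the chart above, written in metric Bytes; the dictionary and variable names are only for illustration, and real files will vary.

```python
# Rough sizes from the chart, in metric Bytes.
SIZES_IN_BYTES = {
    "essay": 15_000,
    "book": 1_000_000,
    "photo": 3_000_000,
    "song": 4_500_000,
    "2-minute video": 250_000_000,
    "movie": 1_500_000_000,
}

DRIVE_CAPACITY = 4_000_000_000  # a 4 GB flash drive

# Whole items that fit: capacity divided by size, rounded down.
fits = {item: DRIVE_CAPACITY // size for item, size in SIZES_IN_BYTES.items()}

for item, count in fits.items():
    print(f"{item}: {count:,}")

# Each personal video is 2 minutes long, so this is the total video time.
print("minutes of video:", fits["2-minute video"] * 2)
```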