CPS100 • INTRODUCTION TO COMPUTERS


SUMMER 2018 Semester • LAKELAND UNIVERSITY

Binary, Bits, and Bytes

Computers do math differently than you do.

First, let's start with some terms:

  • A number is a value that represents an amount; it is written using one or more digits.
  • A digit is one character (glyph) in a number.

For example, 724 is a number, and the "7" is a digit. The whole number 724 has three digits: 7, 2, and 4.

When we count, we use ten digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. There is no other digit after "9."

The question is, why ten digits? Why not 12, or 20, or 4?

The answer is believed to be that we have ten digits on our hands. In fact, the word digit comes from the word for "finger or thumb." Your right hand has five digits, as does your left hand (for most people).

Computers, however, only have two digits: 1 and 0, on and off. As a result, computers count differently than people.

Binary: Base 2 Counting


Using Columns

Ever since you were a child, you used columns to represent numbers. This is because of the way number counting works.

If you only have ten digits, how do you represent a number above ten?

The answer is, you have to use digits which represent increasing values. For example, in the number 28, the "2" does not represent the value of "2." Instead, it represents the value of 2 x 10. The "8," likewise, represents the value of 8 x 1.

Let's lay this out, using the number 5,362:

millions hundred-thousands ten-thousands thousands hundreds tens ones
0 0 0 0 0 0 0
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5
6 6 6 6 6 6 6
7 7 7 7 7 7 7
8 8 8 8 8 8 8
9 9 9 9 9 9 9

As you can see, each digit represents its own value multiplied by the value of the column.

As a result, the number 5,362 can more accurately be represented as:

( 5 * 1000 ) + ( 3 * 100 ) + ( 6 * 10 ) + ( 2 * 1 )

Normally, we don't think of it that way; we learned about all of this when we were children, and once we got used to the counting system, we forgot about it.
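If you would like to see that breakdown done by a computer, here is a minimal Python sketch. The function name expand is just an illustration (it is not from any library), and it assumes a positive whole number.

```python
def expand(number):
    """Return (digit, column value) pairs for a base-10 number, left to right."""
    digits = str(number)
    pairs = []
    for position, char in enumerate(digits):
        # The right-most column is worth 1, the next 10, then 100, and so on.
        column_value = 10 ** (len(digits) - 1 - position)
        pairs.append((int(char), column_value))
    return pairs

pairs = expand(5362)
print(" + ".join(f"({d} * {c})" for d, c in pairs))
print(sum(d * c for d, c in pairs))
```

The first line printed is the same sum shown above, and the second line adds it back up to 5362.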

But Why Columns?

Columns are necessary if we want to use only a limited number of digits—otherwise we would have to create and remember millions of different symbols!

Instead, we use a system of counting where higher and higher values can be represented by the same digits. This system uses columns. Each new column represents the total amount that could be counted by the previous column.

So, how do we count? Here is the process:

As you can see, when we got to the number "9," we had reached the highest digit; we could not put a higher number after that with just one digit. Adding one made "ten," but we don't have a digit for that.

So, instead, we restarted: we put the "ten" in the next column, and started over again in the first column.

The second column represents the value of "10," which makes sense: every time you hit the highest digit, you have reached a "ten"; adding one to the tens column "restarts" the first digit, and allows you to start over again.

Bases

Because there are ten digits in our counting system, we call it base 10. Normally, when we write numbers, we assume it is base 10. However, when we use different base systems, there has to be a way of showing the base; this is done with a subscript.

For example, a number in base 10, such as 25, is shown as 25₁₀; the number 47 in base 8 would be 47₈; the number 1101 in base 2 (binary) would be 1101₂.

The base number represents the amount counted in the second column.

Every additional column is a power of that base: 10², 10³, 10⁴, and so on. In more direct terms, 100, 1000, 10,000, and so forth. It looks like this:

millions hundred-thousands ten-thousands thousands hundreds tens [BASE] ones
1,000,000 100,000 10,000 1,000 100 10 1
10⁶ 10⁵ 10⁴ 10³ 10² 10¹ 10⁰
0 0 0 0 0 0 0
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5
6 6 6 6 6 6 6
7 7 7 7 7 7 7
8 8 8 8 8 8 8
9 9 9 9 9 9 9

It turns out that you can do the same thing with any number as the base! For example, here is base 8:

262,144 32,768 4,096 512 64 8 [BASE] 1
8⁶ 8⁵ 8⁴ 8³ 8² 8¹ 8⁰
0 0 0 0 0 0 0
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5
6 6 6 6 6 6 6
7 7 7 7 7 7 7

Notice that:

  • The base number is equal to the number of digits
  • The base number is always one higher than the highest digit
  • All the column values are powers of the base number
  • The right-most column is always 1, because any base number to the zeroth power is 1! That is, x⁰ = 1.
  • The value of any digit in a column is equal to that digit multiplied by the column value

Can you guess what number is displayed in the table, with 3 in the eights column and 1 in the ones column? Remember, the number is the digits multiplied by the column values. In this case, it is (3 * 8) + (1 * 1), which is... 25! Or, more accurately, 31₈ = 25₁₀.
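The "digit multiplied by column value" rule works the same way in any base. Here is a small sketch in Python; value_of is a made-up name for illustration, and it only handles bases up to 10 because it expects the digits 0 through 9.

```python
def value_of(digits, base):
    """Add up digit * column value, starting from the right-most column."""
    total = 0
    column_value = 1  # the right-most column is always base ** 0, which is 1
    for char in reversed(digits):
        total += int(char) * column_value
        column_value *= base  # each column to the left is worth `base` times more
    return total

print(value_of("31", 8))     # three-one in base eight
print(value_of("5362", 10))  # the earlier base-10 example
print(int("31", 8))          # Python's built-in int() can do the same conversion
```

The first and last lines both print 25, matching the table.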

What Is the Number Called?

We use certain names to describe certain numbers. Without thinking, we believe that these names are attached to the digits we see. That is not correct! The names we use are part of the base system we use.

For example, the number "ten"? That does not describe a 1 followed by a zero. Instead, it describes one more than nine. The word "ten" is only connected to the digits "10" in base 10. In other bases, it looks different!

In base 8, ten is written 12. In base 5, it is 20. In base 2, it is 1010. Or, more accurately, 12₈ = 20₅ = 1010₂ = 10₁₀. All of those numbers are "ten."
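You can check those spellings of "ten" with a short Python sketch. Note that this uses repeated division (divide by the base and keep the remainders), which is a different technique from the charts in this chapter; the name to_base is only an illustration, and it assumes a base from 2 to 10.

```python
def to_base(number, base):
    """Write a non-negative whole number using the digits of the given base."""
    if number == 0:
        return "0"
    digits = ""
    while number > 0:
        # The remainder is the next digit, filling columns from right to left.
        digits = str(number % base) + digits
        number //= base
    return digits

for base in (8, 5, 2, 10):
    print(f"ten in base {base}: {to_base(10, base)}")
```

This prints 12, 20, 1010, and 10: four different ways to write the same amount.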

So perhaps you can see that words like "ten" or "twelve" or "fifty" or a "thousand" are really confusing when you are talking about different bases. Therefore, when using bases other than 10, you must only use the names of the digits, and never the words used for base 10 numbers.

Therefore, the number 31₈ is not "thirty-one," it is "three-one in base eight." If you use that naming system, you can avoid confusion.

In addition, there are special names for bases that are often used:

  • base 2:     binary or BIN
  • base 8:     octal or OCT
  • base 10:  decimal or DEC
  • base 16:  hexadecimal or HEX

Programmer's Joke

Why do programmers confuse Halloween and Christmas?

Because OCT 31 is the same as DEC 25.

Okay, Now for Binary

Sorry to take so long to finally get to binary counting!

In binary, there are only 2 digits: 0 and 1.

The highest digit is 1.

This means that you move to extra columns very quickly!

Here is the table:

sixty-fours thirty-twos sixteens eights fours twos [BASE] ones
64 32 16 8 4 2 1
2⁶ 2⁵ 2⁴ 2³ 2² 2¹ 2⁰
0 0 0 0 0 0 0
1 1 1 1 1 1 1

Here's how counting goes in binary:


Look at that. 10 (one-zero) is two. 1010 (one-zero-one-zero) is ten!
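If you want to watch the columns roll over yourself, this tiny Python sketch counts from zero to ten and prints each number in 4-digit binary. The variable name rows is just for illustration.

```python
# Count from zero to ten, showing each number in decimal and in 4-digit binary.
# The "04b" format means: binary, padded with zeroes to 4 digits.
rows = [f"{n:>2} = {n:04b}" for n in range(11)]
print("\n".join(rows))
```

Notice how often a new column is needed: at two, at four, and again at eight.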

Which brings up another joke:

Programmer's Joke

There are only 10 kinds of people:

Those who understand binary, and those who do not.

To be fair, the first line of the joke should be: "There are only 10₂ kinds of people."


How to Translate between Binary and Decimal

To go from binary to decimal is actually rather easy: first, make a chart which has all the binary column (exponent) numbers. Easy to do: just double every new number. 1, 2, 4, 8, 16... and so forth.

Next, write the binary number in the spaces below. For example, if the binary number you have is 10110101₂:

Next, add up all the column numbers where there is a "1", and ignore all the column numbers where there is a "0." In this case:

128  64  32  16   8   4   2   1
  1   0   1   1   0   1   0   1

128 + 32 + 16 + 4 + 1

That adds up to 181. So, 10110101₂ = 181₁₀.
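The chart method can be sketched in Python like this. The function name binary_to_decimal is only an illustration; it builds the doubling chart, keeps the column numbers that sit above a "1", and adds them up.

```python
def binary_to_decimal(bits):
    """Return the column values used by a binary number, and their total."""
    # Build the chart by doubling: 1, 2, 4, 8... written from right to left.
    column_values = []
    value = 1
    for _ in bits:
        column_values.insert(0, value)
        value *= 2
    # Keep only the columns where the binary digit is "1".
    used = [column for column, bit in zip(column_values, bits) if bit == "1"]
    return used, sum(used)

used, total = binary_to_decimal("10110101")
print(" + ".join(str(column) for column in used), "=", total)
```

This prints 128 + 32 + 16 + 4 + 1 = 181, the same sum as in the example.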


Now, the Other Way

How do we go from decimal to binary? How would we translate, for example, the number 157₁₀ to binary?

It's a little more difficult, but not too hard once you get used to it.

Here's the method: write the binary exponent numbers, but this time vertically and from high to low:

Write down the decimal number you want to translate at the top left. Then subtract the binary exponent.

  • If the binary exponent is smaller than or equal to your number, then subtract the exponent; that will be marked as a "1" in binary.
  • If the binary exponent is too big to subtract, then do not subtract it; that will be marked as a "0."

Below is a chart showing how that works with the number 157:

Number - Exponent Can Subtract? Leaves how much? BIN
157 - 128? Yes Leaves 29 1
29 - 64? No -- 0
29 - 32? No -- 0
29 - 16? Yes Leaves 13 1
13 - 8? Yes Leaves 5 1
5 - 4? Yes Leaves 1 1
1 - 2? No -- 0
1 - 1? Yes Leaves 0 1
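Here is the same subtraction method as a minimal Python sketch. The name decimal_to_binary and the default list of columns are assumptions for illustration; with these eight columns it only works for numbers from 0 to 255.

```python
def decimal_to_binary(number, columns=(128, 64, 32, 16, 8, 4, 2, 1)):
    """Translate a decimal number to binary by subtracting column values."""
    bits = ""
    for column in columns:
        if column <= number:
            # The column value fits, so subtract it and mark a "1".
            number -= column
            bits += "1"
        else:
            # Too big to subtract, so mark a "0" and move on.
            bits += "0"
    return bits

print(decimal_to_binary(157))
```

Reading the BIN column of the chart from top to bottom gives the same answer the code prints: 10011101.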

Try these methods out with various numbers, and then test your answers with a binary-decimal translator.



Tech in Your Life

You Have Seen These Numbers Before

Look at the numbers we get for the powers of "2": 1, 2, 4, 8, 16, 32, 64, 128, 256, 512. Do they look familiar? They should—you see them all the time in the computer world. If you have a USB flash memory stick, get it out and read the capacity (size). It will have one of those numbers (e.g., 512 MB, 4 GB, 16 GB). You may have heard of 32-bit or 64-bit operating systems, or your computer may have 8 GB of RAM. These numbers are often used in computers, especially to describe amounts of memory (not storage).


Bits & Bytes


We just learned how to count in binary (base 2). All the digits are 0s and 1s. These are called "binary digits," or "bits" for short. One bit is a "0" or a "1."


3-digit Number Lock

Combinations

Now we should think about how many number combinations can be made with a certain number of digits. For example, let's say that you have a suitcase with a 3-digit number lock. How many combinations are there? Easy: 1000. You start with 000, then go up through 001, 002, 003, etc. until you reach 999. From 000 to 999 is 1000 combinations.

There is a simpler way to put it: the number of combinations is the base to the power of the number of digits, or baseᵈ (where d is the number of digits). If you are in base 10 and you have a 3-digit lock, then the number of combinations is 10³, or 1000.

Base 2 Combinations

Now let's do the same thing in base 2. If you have 4 bits, how many different combinations (numbers) can you make? We're in base 2, so that would be 2⁴, or 2 x 2 x 2 x 2, or 16.

We can test that by just counting from 0000 to 1111 and seeing how many numbers we make:

0000   =   0
0001   =   1
0010   =   2
0011   =   3
0100   =   4
0101   =   5
0110   =   6
0111   =   7
1000   =   8
1001   =   9
1010   =   10
1011   =   11
1100   =   12
1101   =   13
1110   =   14
1111   =   15

That is 0-15, for a total of 16 numbers!

Now that we know this system, we can see the combinations more easily:

1 bit    2¹    2 combinations
2 bits   2²    4 combinations
3 bits   2³    8 combinations
4 bits   2⁴    16 combinations
5 bits   2⁵    32 combinations
6 bits   2⁶    64 combinations
7 bits   2⁷    128 combinations
8 bits   2⁸    256 combinations
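The rule "base to the power of the number of digits" is easy to check with a few lines of Python. This is only a sketch; count_combinations is a made-up name, and the list four_bit is built just to confirm the count by brute force.

```python
from itertools import product

def count_combinations(base, digits):
    """Number of different values you can write with `digits` digits in `base`."""
    return base ** digits

print(count_combinations(10, 3))  # the 3-digit suitcase lock
print(count_combinations(2, 8))   # 8 bits

# Double-check the 4-bit case by actually building every combination.
four_bit = ["".join(bits) for bits in product("01", repeat=4)]
print(len(four_bit), four_bit[0], four_bit[-1])
```

The last line prints 16 0000 1111, the same range we counted by hand.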

The Byte

Now that we know about combinations, we can look at what a Byte is. Basic definition: a Byte is 8 bits. For example, 10010110 is an eight-bit number, and it is a Byte.

The next question is, "Why eight bits?" Well, it has not always been 8 bits. Historically, there have been different sizes for Bytes. However, 8 is now the standard, and one good reason for that number has to do with typing.

Remember, a computer can only understand binary. So, what happens when you type the letter "M" on your keyboard? The computer does not know "M."

What happens is that the keyboard translates "M" into binary, specifically 01001101 (the number 77 in base 10). 01001101 is sent to the computer, which it can understand.

Translating Characters

Think about this: how many letters and other characters do we need to give codes to? Let's see if we can count them up: 26 lowercase letters, 26 uppercase letters, 10 digits, maybe 14 punctuation marks (18 if you include "smart" single- and double-quotes), and a bunch of symbols... we're now perhaps at about 100 characters. But then there are a lot of special characters for non-English western languages, like the ñ in Spanish, or vowels with accents like é.

All in all, 256 combinations are enough to cover all of those. 256 combinations is 8 bits, meaning that 8 bits is a good amount for one Byte.

One code used to translate this is called ASCII, and some of the codes look like this:

 Character ASCII Binary Code
A 0100 0001
B 0100 0010
C 0100 0011
D 0100 0100
a 0110 0001
b 0110 0010
c 0110 0011
d 0110 0100
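You can look up these codes yourself in Python, where the built-in ord() gives the code number for a character. The helper name ascii_bits is only for illustration, and it assumes a plain ASCII character.

```python
def ascii_bits(char):
    """Return the 8-bit binary code for one ASCII character, split in two halves."""
    code = ord(char)          # the character's code number, e.g. 65 for "A"
    bits = f"{code:08b}"      # the same number as 8 binary digits
    return bits[:4] + " " + bits[4:]

for char in "ABCDabcdM":
    print(char, ord(char), ascii_bits(char))
```

The line for "M" shows 77 and 0100 1101, the same code described earlier for the keyboard example.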

The ASCII numbers I have shown you are 8 bits. However, you will sometimes see ASCII codes represented as 7 bits, missing the initial "0." That's because ASCII is an older system, which was designed as a 7-bit code with only 128 characters. Today, Bytes are 8 bits. An 8-bit Byte is also called an octet.

Keep in mind that ASCII is what is called a character set or character encoding.

A Mess of Text

One problem with computers is that there are dozens of different systems to translate text to binary code! ASCII is usually recognized as a historical base; Windows and Mac generally use the same ASCII codes for basic letters, numbers and symbols used on keyboards—but not exactly the same.

It gets worse: Mac and Windows have traditionally used completely different codes for the non-ASCII characters. The older Mac encoding is Mac OS Roman, and the older Windows encoding is Windows-1252. More modern character encoding systems are even more complex, and there are so many variations that it is difficult to understand them!

However, there is hope: UTF-8 is a popular character encoding system widely used today. It is a system based on Unicode, a code which can represent almost any language. Any character, any symbol, any emoji can be expressed with Unicode, and with UTF-8. It is even compatible with ASCII.
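To see what "compatible with ASCII" means, here is a short Python sketch. The helper name utf8_bits and the three sample characters are my own choices for illustration. An ASCII letter keeps its one-Byte code in UTF-8, while other characters use more than one Byte.

```python
def utf8_bits(char):
    """Return the UTF-8 Bytes of one character as groups of 8 binary digits."""
    data = char.encode("utf-8")
    return " ".join(f"{byte:08b}" for byte in data)

for char in ("M", "é", "ホ"):
    print(char, len(char.encode("utf-8")), utf8_bits(char))
```

"M" comes out as the single Byte 01001101, exactly its ASCII code; "é" takes 2 Bytes and "ホ" takes 3.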

Tech in Your Life

Have you ever seen a browser page that looks like this:

A Mojibake screen

When you see that, you have to go to the "View" menu of your browser, and set the correct text (character) encoding. If you do it right, the page will clear up:

A Mojibake screen cleared up to normal text

What Is That?

In Japan, it is called "mojibake"; there is no common English term. Mojibake happens when the wrong character set is used to display a web page. For example, the page shown above uses the "Shift-JIS" encoding system, but is displayed as mojibake when viewed with the Mac OS Roman character set.

There is another reason as well. Take a look at one small example of text from the two versions of the page shown above:

Clean and Mojibake text

That is the exact same text, first clearly, then as mojibake. Notice that the Japanese version is 3 characters: ホーム; then notice that the mojibake version is 6 characters: ÉzÅ[ÉÄ.

That is not a coincidence. Remember, western encoding systems use 1 Byte, with 256 combinations, to encode text characters. Japanese, like other Asian languages, has far more than 256 characters! The Joyo kanji list alone has 2,136 characters. 1 Byte is not enough.

As a result, Japanese must be encoded with more than one Byte. Shift-JIS is a double-byte character system, meaning that 2 Bytes are used for each character. 2 Bytes is 16 bits; 16 bits has 65,536 combinations, enough for all kanji and a lot more.

As it happens, the character for ホ has the Shift-JIS code 1000001101111010.

However, when you switch to Mac OS Roman, your computer is looking for 1-Byte characters, so it splits 1000001101111010 into 10000011 and 01111010. Those two characters are—you guessed it— "É" and "z."
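You can make this mojibake on purpose. The sketch below uses Python's built-in "shift_jis" and "mac_roman" codecs; the variable names are just for illustration. It encodes the Japanese text with one system and then reads the same Bytes with the other.

```python
text = "ホーム"

# Save the text the way a Shift-JIS page would: 2 Bytes per character.
raw = text.encode("shift_jis")

# Now read the very same Bytes as if each one were a Mac OS Roman character.
mojibake = raw.decode("mac_roman")

print(len(text), "characters became", len(raw), "Bytes")
print(mojibake)
print(f"{raw[0]:08b} {raw[1]:08b}")  # the two Bytes that make up ホ
```

Three characters become six Bytes, and the six Bytes are displayed as ÉzÅ[ÉÄ. The last line prints 10000011 01111010, the two halves described above.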

B or b?

Now you know what a bit is, what a Byte is, and where they come from. Next, let's look at how they are used.

First, how they are written: bits are written as b (a small "b"), while Bytes are written as B (a capital "B").

Normally, bits are used to describe the speed of data transmission. For example, if you go to an ISP (Internet Service Provider) and get a connection to the Internet, you may ask, "How fast is it?" The ISP will answer you in bits per second, or bps. A common fiber-optic connection, for example, may be 100 Mbps, or 100 million bits per second.

Many people may mistake bps for Bps, but the two are very different. If you truly have a download speed of 100 million bits per second, that means you are getting 12.5 million Bytes per second—only 1/8th the speed you might think!

On the other hand, Bytes are used to describe an amount of data. For example, you might have a photograph which is 2 MB, or 2 million Bytes.

In everyday life, we almost always use Bytes. In the rare cases where we see "bits" used, we must translate. 1 Byte is 8 bits; 1 bit is 1/8th of a Byte.
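Here is that translation as a small Python sketch. Both function names are made up for illustration, and the 2 MB photo and the 100 Mbps connection are the examples from above; real downloads are usually a bit slower than the advertised speed.

```python
BITS_PER_BYTE = 8

def megabytes_per_second(speed_mbps):
    """Translate a speed in megabits per second into megabytes per second."""
    return speed_mbps / BITS_PER_BYTE

def download_seconds(size_megabytes, speed_mbps):
    """Estimate how long a file takes to download, ignoring any overhead."""
    return size_megabytes * BITS_PER_BYTE / speed_mbps

print(megabytes_per_second(100))  # a 100 Mbps connection
print(download_seconds(2, 100))   # a 2 MB photo on that connection
```

The first line prints 12.5, so a 100 Mbps connection moves 12.5 million Bytes per second; the 2 MB photo would take about 0.16 seconds.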

Going Metric

Next, there are the prefixes used for describing large numbers. We do not usually say "a million Bytes"; instead, we say "megabyte," and we abbreviate it as "MB." Here are the different prefixes:

Prefix Term Abbreviation Metric Bytes
  Byte B 1
Kilo Kilobyte KB 1,000
Mega Megabyte MB 1,000,000
Giga Gigabyte GB 1,000,000,000
Tera Terabyte TB 1,000,000,000,000
Peta Petabyte PB 1,000,000,000,000,000
Exa Exabyte EB 1,000,000,000,000,000,000
Zetta Zettabyte ZB 1,000,000,000,000,000,000,000
Yotta Yottabyte YB 1,000,000,000,000,000,000,000,000
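Each prefix in the table is 1,000 times bigger than the one before it. A minimal Python sketch of that idea follows; the names PREFIXES and describe are assumptions for illustration, and the abbreviations are the ones from the table.

```python
PREFIXES = ["", "K", "M", "G", "T", "P", "E", "Z", "Y"]

def describe(byte_count):
    """Write a whole number of Bytes using the largest metric prefix that fits."""
    index = 0
    # Move up one prefix for every factor of 1,000 the number reaches.
    while byte_count >= 1000 ** (index + 1) and index < len(PREFIXES) - 1:
        index += 1
    value = byte_count / 1000 ** index
    return f"{value:g} {PREFIXES[index]}B"

print(describe(2_000_000))      # a 2-million-Byte photograph
print(describe(4 * 1000 ** 3))  # a USB flash drive
print(describe(1500))
```

This prints 2 MB, 4 GB, and 1.5 KB.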

Generally, people do not know what these terms are until they start being used in personal computers. The first few, kilo and mega, had been known for a long time because they were used commonly in words such as kilometer and megaton.

However, giga did not really become well-known until computer storage was big enough to hold a gigabyte, which was in the mid- to late-1990's.

These terms are often not used accurately, however; "mega" and "giga" are both used simply to suggest something big. For example, these are two pizzas offered by Japanese pizza chains:

Pizza-la's "Mega Meat" pizza had nothing to do with "mega": there was not a "million" of anything on the pizza! Similarly, Domino's "Giga Meat" pizza neither had a billion pieces of meat, nor did it have 1,000 times the meat that the Mega Meat pizza had.

However, the use of both terms in the media and society in general continues.

More recently, after terabyte hard drives came out in the last decade, people started to hear the prefix tera, so that became widely known. However, it has not yet fully entered everyday language, as "tera" is not commonly used with English words to create the meaning of something extra-large.

Before the 1990's, when "giga" entered into common usage, people did not know what "giga" meant, and sometimes pronounced it as "jiga" ("jiga" is an acceptable pronunciation, but is rarely used today). For example, in the 1985 movie Back to the Future, Doc Brown needed to produce 1.21 gigawatts of electricity; Marty McFly, meanwhile, had no idea what that meant:


You might be wondering, where do these prefixes come from?

The longer-used prefixes, mega, giga, and tera, all come from Greek. "Mega" means "great" in Greek; "giga" means "giant," and "tera" means "monster."

The other prefixes are less poetic. "Peta" (penta) is from the Greek word for "five," and "exa" is Greek for "six." "Zetta" comes from Italian, meaning "seven," and "yotta" is also Italian, meaning "eight."


How Much Does a Byte Weigh?

Now you know what the words are. But do you understand what they mean? For example, how many songs fit in a gigabyte? If you want to store 30 minutes of video recorded on your cell phone, will a 4 GB USB flash unit be enough?

The answer is not simple, because not every book, photo, song, or movie is the same size. However, here is a rough estimate:

Item Size Notes
Essay 15 KB This might be a 1,500-word essay saved in .docx format.
Book 1 MB The book would be plain text (no formatting, no images) and would be about the same as a 500-page paperback.
Photo 3 MB Assuming an 8-megapixel image taken with an iPhone 5 and saved as a compressed JPG file.
Song 4.5 MB This would be a 3-minute song saved in MP3 format at medium-high quality.
Personal Video 250 MB Assuming a 2-minute video taken at Full HD resolution.
Movie 1.5 GB Assuming a 120-minute movie at Full HD with strong H.264 compression.

From this chart, you can perhaps get a better idea of what the terms and amounts mean. For example, you could conclude that a 4 GB USB flash drive is just enough to hold half an hour of iPhone video. But it could also hold almost 900 songs, more than 1300 photos, about 4000 books, or hundreds of thousands of essays!
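If you want to check numbers like these, the arithmetic is just division. The sketch below uses the rough sizes from the chart and metric prefixes (1 MB = 1,000,000 Bytes); the names SIZES, fits, and video_minutes are only for illustration.

```python
KB, MB, GB = 1000, 1000 ** 2, 1000 ** 3

# Rough sizes taken from the chart above.
SIZES = {
    "essay": 15 * KB,
    "book": 1 * MB,
    "photo": 3 * MB,
    "song": 4_500_000,  # 4.5 MB
}

drive = 4 * GB

# How many whole items of each kind fit on the drive?
fits = {item: drive // size for item, size in SIZES.items()}
for item, count in fits.items():
    print(item, count)

# The chart's personal video is 250 MB for every 2 minutes.
video_minutes = drive / (250 * MB) * 2
print("video minutes:", video_minutes)
```

This gives 266666 essays, 4000 books, 1333 photos, 888 songs, and 32 minutes of video.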

Terms to Know

number: a value made up of one or more digits
digit: a single glyph or character used in a number
binary: a 2-digit counting system
octal: an 8-digit counting system
decimal: a 10-digit counting system
hexadecimal: a 16-digit counting system
bit: a 0 or a 1; usually used to describe the speed of data transmission.
bps: bits per second.
Byte: 8 bits; usually used to describe amounts of data.
octet: an 8-bit Byte.
base: the number which is the foundation of a counting system.
character encoding: a system in which characters are represented by codes, such as binary numbers.
ASCII: one of the early character encoding systems; most modern character encodings include ASCII for the first 128 characters.
Unicode: a system with more than 110,000 characters over 100 writing systems.
UTF-8: the most popular current character encoding; it includes ASCII, and displays all Unicode characters.
mojibake: the "nonsense" characters that appear when you view characters using the wrong encoding system.
kilo: 1,000 (a thousand).
mega: 1,000,000 (a million).
giga: 1,000,000,000 (a billion).
tera: 1,000,000,000,000 (a trillion).
peta: 1,000,000,000,000,000 (a quadrillion).
