### 4.1. Cryptography

Cryptography is a mathematical science used to secure the storage and transmission of data. The process involves two steps: encryption transforms information into unreadable data, and decryption converts unreadable data back into a readable form. When cryptography was first used, confidentiality was achieved by keeping the transformation algorithms secret, but those algorithms were eventually figured out. Today, algorithms are kept public and well documented, but they require a secret piece of information, a key, to hide and reveal data. Here are three terms you need to know:

Cleartext

Data in the original form; also referred to as plaintext

Cipher

The algorithm used to protect data

Ciphertext

Data in the encoded (unreadable) form

Cryptography aims to achieve four goals:

Confidentiality

Protect data from falling into the wrong hands

Authentication

Confirm identities of parties involved in communication

Integrity

Allow recipient to verify information was not modified while in transit

Nonrepudiation

Prevent sender from claiming information was never sent

The point of cryptography is to make it easy to hide (encrypt) information yet difficult and time-consuming for anyone without the decryption key to recover it.

No one technique or algorithm can be used to achieve all the goals listed above. Instead, several concepts and techniques have to be combined to achieve the full effect. There are four important concepts to cover:

• Symmetric encryption

• Asymmetric encryption

• One-way encryption

• Digital certificates

Do not be intimidated by the large number of encryption methods in use. Mathematicians are always looking for better and faster methods, making the number constantly grow. You certainly do not need to be aware of the inner details of these algorithms to use them. You do, however, have to be aware of legal issues that accompany them:

• Cryptology is a science that can be used by anyone who wishes to protect his privacy, but it is of special importance to the military, governments, law enforcement agencies, and criminals. Consequently, many countries have laws that limit the extent to which encryption techniques can be used. For example, until recently, U.S. companies could not export symmetric encryption technology supporting keys larger than 40 bits.

• Some algorithms are patented and cannot be used without a proper license. Libraries implementing patented algorithms are available for free download (often in source code), but you need a license for their legal use.

#### 4.1.1. Symmetric Encryption

Symmetric encryption (also known as private-key encryption or secret-key encryption) is a fast encryption method that uses a single key to encrypt and decrypt data. On its own it offers data confidentiality (and to some extent, authentication) provided the parties involved in communication safely exchange the secret key in advance. An example of the use of symmetric encryption is shown in Figure 4-1.

##### Figure 4-1. Symmetric encryption example
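To make the single-key idea concrete, here is a minimal pure-Python sketch of RC4, one of the symmetric algorithms discussed in this section. The function name `rc4` and the sample key and message are illustrative assumptions, and the code is meant only to show that the same key both hides and reveals the data; it is not a recommendation to implement ciphers by hand.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4; the same call does both."""
    # Key scheduling: permute the state array under control of the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Keystream generation: XOR each data byte with the next keystream byte.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


shared_key = b"Key"                       # illustrative; exchanged in advance
ciphertext = rc4(shared_key, b"Plaintext")
recovered = rc4(shared_key, ciphertext)   # the same key reverses the operation
print(ciphertext.hex(), recovered)
```

Anyone who learns `shared_key` can run the same call to read the message, which is why the key exchange has to happen over a safe channel.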

Here are six commonly used symmetric encryption algorithms:

Data Encryption Standard (DES)

Uses a fixed-length key of 56 bits. It used to be a U.S. government standard but is now considered obsolete.

Triple-DES (3DES)

Uses a fixed-length key of 168 bits (112 effective). It was designed to give extended life to DES. Still considered secure.

Blowfish

Uses a variable-length key of up to 448 bits. Fast and free.

International Data Encryption Algorithm (IDEA)

Uses a fixed-length key of 128 bits. IDEA is fast, patented, and free for noncommercial use.

RC4

Keys can be anywhere from 1 to 2,048 bits long. (40-bit and 128-bit key lengths are commonly used.) RC4 is very fast and in widespread use. The legal status of RC4 is unclear: it is not free but its unlicensed use appears to be tolerated.

Advanced Encryption Standard (AES)

Keys can be 128, 192, or 256 bits long. AES was chosen by the U.S. government to replace DES and 3DES.

No single best encryption algorithm exists. All algorithms from the list have been thoroughly researched and are considered to be technically secure. Other factors that need to be taken into consideration are interoperability, key length, speed, and legal issues. The key-length argument renders DES obsolete, and 3DES obsolete for new implementations. It is widely believed that the minimum secure key length for symmetric encryption today is 80 bits. Encryption of at least 128 bits is recommended for all new applications. Having been adopted as a standard by the U.S. government, AES is the closest to being the algorithm of choice.

Symmetric encryption has inherent problems that show up as soon as the number of parties involved is increased to more than two:

• The secret key must be shared between parties in communication. All members of a single communication channel must share the same key. The more people join a group, the more vulnerable the group becomes to a key compromise. Someone may give it away, and no one could detect who did it.

• The approach is not scalable because a different secret key is required for every two people (or communication groups) to communicate securely. Ten people need 45 (9 + 8 + . . . + 1) keys for each one of them to be able to communicate with everyone else securely. A thousand people would need 499,500 keys!

• Symmetric encryption cannot be used on unattended systems to secure data. Because the process can be reversed using the same key, a compromise of such a system leads to the compromise of all data stored in the system.
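The key-count arithmetic in the scalability point follows from counting pairs: n people form n(n - 1)/2 distinct pairs, and each pair needs its own key. A small sketch (the function name `pairwise_keys` is hypothetical):

```python
def pairwise_keys(people: int) -> int:
    """Number of distinct secret keys needed so every pair shares its own key."""
    return people * (people - 1) // 2


print(pairwise_keys(10))    # 45, the same as 9 + 8 + ... + 1
print(pairwise_keys(1000))
```

The count grows roughly with the square of the group size, which is what makes pairwise secret keys impractical for large populations.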

In spite of these problems, a major advantage to symmetric encryption is its speed, which makes it the only choice when large amounts of data need to be encrypted (for storage or transmission).

#### 4.1.2. Asymmetric Encryption

Asymmetric encryption (also known as public-key encryption) tries to solve the problems found in symmetric encryption algorithms. Instead of one secret key, public-key encryption requires two keys, one of which is called a public key and the other a private key. The two keys, the encryption algorithm, and the decryption algorithm are mathematically related: information encrypted with a public key can be decrypted (using the same algorithm) only if the private key is known. The reverse also holds: data encrypted using the private key can be decrypted only with the public key.

The key names give away their intended usage. The public key can be distributed freely to everyone. Whoever is in the possession of the public key can use the key and the corresponding encryption algorithm to encrypt a message that can only be decrypted by the owner of the private key that corresponds to the public key. This is illustrated in Figure 4-2, in which Bob encrypts a message using Alice's public key and sends the result to Alice. (The names Alice and Bob are commonly used in explanations related to cryptography. For more information, read the corresponding Wikipedia entry: http://en.wikipedia.org/wiki/Alice_and_Bob.) Alice then decrypts the message using her private key.

##### Figure 4-2. Asymmetric encryption example
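The exchange between Bob and Alice can be sketched with a toy RSA key pair. The primes below are a classic textbook choice and far too small to be secure; the helper names `encrypt` and `decrypt` are assumptions made for illustration, and real systems add padding and use keys thousands of bits long.

```python
# Toy RSA key pair built from tiny primes (illustrative only, trivially breakable).
p, q = 61, 53
n = p * q                    # modulus shared by both keys: 3233
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent, chosen coprime to phi
d = pow(e, -1, phi)          # private exponent: modular inverse of e (Python 3.8+)


def encrypt(message: int, public_key: tuple) -> int:
    exponent, modulus = public_key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, private_key: tuple) -> int:
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)


alice_public, alice_private = (e, n), (d, n)
message = 65                                   # must be smaller than n
ciphertext = encrypt(message, alice_public)    # Bob only needs the public key
print(ciphertext, decrypt(ciphertext, alice_private))
```

Bob never sees `d`; only Alice, who holds the private exponent, can turn the ciphertext back into the original number.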

There exists another use for the private key. When information is encrypted with a private key, anyone (anyone with access to the public key, that is) can decrypt it with the public key. This is not as useless as it may seem at first glance. Because no key other than the public key can unlock the message, the recipient is certain the encrypted message was sent from the private-key owner. This technique of encrypting with a private key, illustrated in Figure 4-3, is known as a digital signature because it is the equivalent of a real signature in everyday life.

##### Figure 4-3. Alice sends Bob a message he can verify came from her
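A signature can be sketched the same way, with the roles of the two keys swapped. This is a simplified illustration under stated assumptions: the key is built from two small Mersenne primes picked only to keep the code short, the message digest is reduced modulo `n`, and no padding scheme is applied. The names `sign`, `verify`, and `digest_as_int` are hypothetical.

```python
import hashlib

# Toy RSA key pair (illustrative only; real keys are far larger).
p, q = 2**61 - 1, 2**89 - 1             # two small Mersenne primes
n, e = p * q, 65537                     # (e, n) is Alice's public key
d = pow(e, -1, (p - 1) * (q - 1))       # Alice's private exponent (Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes) -> int:
    """Alice transforms the digest with her private key."""
    return pow(digest_as_int(message), d, n)


def verify(message: bytes, signature: int) -> bool:
    """Anyone holding the public key can check the signature."""
    return pow(signature, e, n) == digest_as_int(message)


signature = sign(b"Meet at noon")
print(verify(b"Meet at noon", signature))   # the genuine message checks out
print(verify(b"Meet at one", signature))    # an altered message does not
```

Signing a digest rather than the whole message keeps the slow private-key operation small regardless of how long the message is.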

Here are three asymmetric encryption methods in use today:

RSA

A well-known and widely used public-key cryptography system. Developed in 1978.

Digital Signature Algorithm (DSA)

A U.S. government standard used for digital signatures since 1991.

Elliptic curve

A mathematically different approach to public-key encryption that is thought to offer higher security levels.

Public-key encryption does have a significant drawback: it is much slower than symmetric encryption, so even today's computers cannot use this type of encryption alone and achieve acceptably fast communication speeds. Because of this, it is mostly used to digitally sign small amounts of data.

Public-key cryptography seems to solve the scalability problem we mentioned earlier. If every person has a key pair, anyone on the Internet will be able to communicate securely with anyone else. One problem remains, which is the problem of key distribution. How do you find someone's public key? And how do you know the key you have really belongs to them? I will address these issues in a moment.

#### 4.1.3. One-Way Encryption

One-way encryption is the process performed by certain mathematical functions that generate "random" output when given some data on input. These functions are called hash functions or message digest functions. The word hash is used to refer to the output produced by a hash function. Hash functions have the following attributes:

• The size of the output they produce is much smaller than the size of the input. In fact, the size of the output is fixed.

• The output is always identical when the inputs are identical.

• The output seems random (i.e., a small variation of the input data results in a large variation in the output).

• It is not possible to reconstruct the input, given the output (hence the term one-way).
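The first three attributes are easy to observe with Python's standard `hashlib` module. SHA-256 is used here as a representative hash function; the sample inputs are arbitrary.

```python
import hashlib

# Fixed output size: a 1-byte input and a 1 MB input both give 64 hex characters.
short = hashlib.sha256(b"a").hexdigest()
long_ = hashlib.sha256(b"a" * 1_000_000).hexdigest()
print(len(short), len(long_))

# Identical input always gives identical output.
print(hashlib.sha256(b"a").hexdigest() == short)

# Changing the case of one letter produces an unrelated-looking digest.
first = hashlib.sha256(b"Apache Security").digest()
second = hashlib.sha256(b"Apache security").digest()
differing_bits = sum(bin(x ^ y).count("1") for x, y in zip(first, second))
print(differing_bits, "of 256 bits differ")
```

The fourth attribute cannot be demonstrated by running code; it rests on the absence of any known practical way to invert the function.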

Hash functions have two common uses. One is to store some information without storing the data itself. For example, hash functions are frequently used for safe password storage. Instead of storing passwords in plaintext (where they can be accessed by whoever has access to the system), it is better to store only password hashes. Since the same password always produces the same hash, the system can still perform its main function (password verification), but the risk of user password database compromise is greatly reduced.
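Password verification against a stored hash might look like the following sketch. It goes a step beyond the plain hashing described above: it assumes a random per-user salt and the PBKDF2 key-stretching function from Python's `hashlib`, which are common hardening measures not covered in this section. The function names and the iteration count are illustrative.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor


def hash_password(password: str) -> tuple:
    """Return (salt, digest); only these two values are stored."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Recompute the digest from the candidate password and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

The system never needs the original password again: the same password and salt always reproduce the same digest.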

The other common use is to quickly verify data integrity. (You may have done this, as shown in Chapter 2, when you verified the integrity of the downloaded Apache distribution.) If a hash output is provided for a file, the recipient can calculate the hash himself and compare the result with the provided value. A difference in values means the file was changed or corrupted.
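An integrity check of this kind can be sketched as follows. The example assumes SHA-256 and uses a temporary file as a stand-in for a downloaded archive; the helper name `file_digest` is hypothetical.

```python
import hashlib
import os
import tempfile


def file_digest(path: str) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Stand-in for a downloaded file and the digest its distributor publishes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pretend this is a downloaded archive")
    path = tmp.name
published = file_digest(path)

intact = file_digest(path) == published            # unchanged file matches

with open(path, "ab") as handle:                   # simulate corruption
    handle.write(b"!")
tampered_matches = file_digest(path) == published  # changed file does not match

os.remove(path)
print(intact, tampered_matches)
```

A matching digest only shows that the file equals the one the digest was computed from, so the published value must itself come from a trustworthy source.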

Hash functions are free of usage, export, or patent restrictions, and that led to their popularity and unrestricted usage growth.

Here are three popular hash functions:

Message Digest algorithm 5 (MD5)

Produces 128-bit output from input of any length. Released as RFC 1321 in 1992. In wide use.

Secure Hash Algorithm 1 (SHA-1)

Designed as an improvement to MD5 and produces 160-bit output for input of any length. A U.S. government standard.

SHA-256, SHA-384, and SHA-512

Longer-output variants of the popular SHA-1.

A hash function should produce output at least 160 bits long. Output length alone is not sufficient, however: practical collision attacks have been demonstrated against both MD5 and SHA-1, so one of the longer variants, such as SHA-256, is the better choice for new applications.

#### 4.1.4. Public-Key Infrastructure

Encryption algorithms alone are insufficient to verify someone's identity in the digital world. This is especially true if you need to verify the identity of someone you have never met. Public-key infrastructure (PKI) is a concept that allows identities to be bound to certificates and provides a way to verify that certificates are genuine. It uses public-key encryption, digital certificates, and certificate authorities to do this.

##### 4.1.4.1 Digital certificates

A digital certificate is an electronic document used to identify an organization, an individual, or a computer system. It is similar to documents issued by governments, which are designed to prove one thing or the other (such as your identity, or the fact that you have passed a driving test). Unlike hardcopy documents, however, digital certificates can have an additional function: they can be used to sign other digital certificates.

Each certificate contains information about a subject (the person or organization whose identity is being certified), as well as the subject's public key and a digital signature made by the authority issuing the certificate. There are many standards developed for digital certificates, but X.509 v3 is almost universally used (the popular PGP encryption protocol being the only exception).

A digital certificate is your ID in the digital world. Unlike the real world, no organization has exclusive rights to issue "official" certificates at this time (although governments will probably start issuing digital certificates in the future). Anyone with enough skill can create and sign digital certificates. But if everyone did, digital certificates would not be worth much. It is like me vouching for someone I know. Sure, my mother is probably going to trust me, but will someone who does not know me at all? For certificates to have value they must be trusted. You will see how this can be achieved in the next section.

##### 4.1.4.2 Certificate authorities

A certificate authority (CA) is an entity that signs certificates. If you trust a CA then you will probably trust the certificate it signed, too. Anyone can be a CA, and you can even sign your own certificate (we will do exactly that later). There are three kinds of certificates:

Self-signed certificates

In this case, the owner of the certificate acts as his own CA, signing the certificate himself. These certificates are mostly useless since they cannot be used to verify someone's identity. In some instances, they can be useful, however, as you will see later when we discuss SSL.

Certificates signed by a private CA

It is often feasible for an organization to be its own CA when certificates are used only for internal purposes among a limited circle of users. This is similar to employee passes that are widely in use today.

Certificates signed by a public CA

When trust needs to exist between a large, loosely connected population, an independent authority must be used. It is a compromise: you agree to trust an organization that acts as a CA, and it pledges to verify the identities of all entities it signs certificates for. Some well-known certificate authorities are Equifax, RSA, Thawte, and VeriSign.

I have mentioned that digital certificates can be used to sign other digital certificates. This is what CAs do. They have one very important certificate, called the root certificate, which they use to sign other people's certificates. CAs sign their own root certificates, and root certificates from trusted authorities are accepted as valid. Such certificates are distributed with software that uses them (e.g., web browsers). A partial list of authorities accepted by my browser, Mozilla 1.7, is given in Figure 4-4. (I added the Apache Security CA, whose creation is shown later in this chapter, after importing its root certificate into the browser.)

##### 4.1.4.3 Web of trust

Identity validation through certificate authorities represents a well-organized identity verification model. A small number of trusted certificate authorities have the last word in saying who is legitimate. Another approach to identity verification is to avoid the use of authorities, and base verification on a distributed, peer-to-peer operation where users' identities are confirmed by other users. This is how a web of trust is formed. It is a method commonly used among security-conscious computer users today.

This is how the web of trust works:

• Each user creates a public-/private-key pair and distributes the public key widely.

• When two certificate owners meet, they use their real-life IDs to verify their identities, and then they cross-sign each other's digital certificates.

• When enough people do this, then for every two people who wish to communicate, there will be a chain of signatures marking the path between them.

A web of trust example is given in Figure 4-5.

##### Figure 4-5. There are two trust paths from Alice to Dave

The web of trust is difficult but not impossible to achieve. As long as every person in the chain ensures the next person is who he claims he is, and as long as every member remains vigilant, there is a good chance of success. However, misuse is possible and likely. That is why the user of the web of trust must decide what trust means in each case. Having one path from one person to another is good, but having multiple independent paths is better.
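Finding trust paths amounts to searching a graph in which each cross-signature is an edge. The sketch below assumes a small hypothetical network (the intermediaries Bob and Carol and the exact set of signatures are made up for illustration) and lists every loop-free chain from Alice to Dave.

```python
def trust_paths(signatures: dict, start: str, goal: str) -> list:
    """Return every loop-free chain of signatures from start to goal."""
    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        current = path[-1]
        if current == goal:
            paths.append(path)
            continue
        for neighbor in sorted(signatures.get(current, ())):
            if neighbor not in path:          # never revisit a person
                stack.append(path + [neighbor])
    return sorted(paths)


# Hypothetical cross-signatures: each person lists whose keys they have signed.
signatures = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice", "Dave"},
    "Carol": {"Alice", "Dave"},
    "Dave": {"Bob", "Carol"},
}
for path in trust_paths(signatures, "Alice", "Dave"):
    print(" -> ".join(path))
```

With this data the two paths share no intermediary, so a single careless signer cannot compromise both.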

The web of trust concept is well suited for use by individuals and by programs like PGP (Pretty Good Privacy) or GnuPG. You can find out more about the web of trust concept in the GnuPG documentation.

#### 4.1.5. How It All Falls into Place

Now that we have the basic elements covered, let's examine how these pieces fall into place:

• If you encode some cleartext using a public key (from a certificate) and the user you are communicating with sends the cleartext version back, you know that user possesses the private key. (Here, the cleartext you encode is referred to as a challenge. That term is used to refer to something sent to another party challenging the other party to prove something. In this case, the other party is challenged to prove it possesses the corresponding private key by using it to decode what you sent.)

• If a certificate contains a digital signature of a CA you trust, you can be reasonably sure the certificate was issued to the individual whose name appears in the certificate.

• To communicate securely with someone with whom you have established a secret key in advance, you use symmetric (private-key) encryption.

• To communicate securely with someone, without having established a secret key in advance, you start communicating using public-key encryption (which is slow), agree on a secret key, and then continue communication using symmetric (private-key) encryption (which is fast).

• If you only want to ensure communication was not tampered with, you use one-way encryption (which is very fast) to calculate a hash for every block of data sent, and then digitally sign just the hash. Digital signatures are slow, but the performance will be acceptable since only a small fraction of data is being signed.
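The slow-then-fast combination from the list above can be sketched end to end. Everything here is a toy under stated assumptions: the RSA key is built from two small Mersenne primes, no padding is used, and the symmetric cipher is a simple XOR with a SHA-256-derived keystream rather than a real algorithm such as AES. The helper name `keystream_xor` is hypothetical.

```python
import hashlib
import secrets

# Recipient's toy RSA key pair (illustrative only; real keys are far larger).
p, q = 2**61 - 1, 2**89 - 1
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))       # private exponent (Python 3.8+)


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256-derived keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Sender: pick a random secret key and wrap it with the public key (slow, once).
session_key = secrets.token_bytes(16)
wrapped_key = pow(int.from_bytes(session_key, "big"), e, n)
# Sender: encrypt the bulk data with the fast symmetric cipher.
ciphertext = keystream_xor(session_key, b"bulk data " * 1000)

# Recipient: unwrap the secret key with the private key, then decrypt the data.
unwrapped = pow(wrapped_key, d, n).to_bytes(16, "big")
plaintext = keystream_xor(unwrapped, ciphertext)
print(unwrapped == session_key, plaintext[:10])
```

Only 16 bytes pass through the public-key operation, no matter how much data follows.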

If you want to continue studying cryptography, read Applied Cryptography by Bruce Schneier (Wiley), considered to be a major work in the field.