Session 30: Filesystem efficiency and security

Performance (Section 5.3.5)
  blocks
  cache
Security introduction (not in text)
  types of attacks
  types of cryptography

Security introduction

Types of attacks

In security, there are several types of attacks that you can worry about.

compromising data confidentiality: A person might be able to acquire secrets from the system.
compromising data integrity: A person might be able to destroy critical information in the system. Note that this isn't necessarily more difficult than compromising confidentiality; for example, a virus might break into the system and destroy random files on the hard drive. Data integrity has been lost, but no confidentiality has not.
denial of service: A person might prevent others from accessing the system. For example, on a network, somebody might flood a server with requests, loading it so heavily that others can't use the server effectively. Although the person hasn't really accessed or destroyed any information, they've harmed me by preventing my access to it.

We'll be mostly concerned with the first type of attack for now, since it's the most basic. We'll investigate two types of ways of protecting data confidentiality: cryptography and passwords. After this, we'll investigate viruses, which tend to do more on the side of compromising data integrity or denying service (though they can also be useful in compromising confidentiality).

Types of cryptography

One of the primary ways to prevent people from compromising data confidentiality is to employ cryptography. Cryptography turns out to be the solution to many other problems, too. Now we're going to look at a sample of several types of cryptography to get an idea of what sorts of problems cryptography might tackle.

These types of cryptography are ordered roughly from simplest to hardest.

Steganography

Steganography has been getting a lot of press lately, due to the suspicion that terrorists may be using it to communicate among themselves. In fact, this led to the U S government recently requesting news agencies to exercise more discretion when broadcasting videos that could come from terrorist agencies. (Naturally, this request led to a small freedom-of-speech contrtoversy.)

Steganography employs hiding a message within the context of a seemingly innocuous message. For example, consider a 1024x768 24-bit picture. It's a fact that people can't really between all 2²⁴ colors - the color (123,23,12) (representing a red value of 123, a green value of 23, and a blue value of 12) is hardly distinguishible from the color (124,22,11). It certainly isn't distinguishible when it's a single pixel in the context of a larger picture.

Say I have a 200KB message that I want to hide. I can take the first bit of the message and replace the low-order bit of the red byte of the first pixel in the image. Then the second bit of the message can go into the low-order bit of the green byte of the first pixel. The third bit goes into the blue byte. Then the fourth byte goes into the red byte of the second pixel. And so on. The picture's size is 1024x758x3=2274KB, and I can hide a bit into each byte, so that I can hide a total of 2274KB/8 = 284KB of text into the picture.

The normal listener won't be able to detect that the message is being transmitted at all. Of course, a suspecting listener could easily retrieve the message. But, if the message is encrypted (using techniques listed later), the listener would have a hard time distinguishing the message from random noise.

In practice, steganography is complicated by the fact that the standards for transmitting a picture do not involve a perfect transfer of information - on the Web, for example, the image might be compressed using a lossy technique (as in a JPEG). But working around this isn't a tremendous problem.

In general, steganography doesn't have to use pictures, of course. Any large message will do - like an audio recording. Or carefully constructed text might encode a message. For example, you might communicate a number as a sequence of digits, where each digit is encoded as a sentence with that many words in it. This paragraph encodes the number 09866.

Hashing / one-way functions

Cryptography hashing is commonly used for storing passwords. For example, Unix systems will never actually store a password on a disk. Instead, it takes the password and hashes it, storing the hashed value on the disk instead. That way, even if somebody can read the password file (and, in Unix, the password file is generally publicly accessible in /etc/passwd, so that's not a big deal), they can't actually get a password.

It's important that the hashing function is a one-way function - that is, a function that cannot be easily inverted. Convincing examples of one-way functions aren't easy to come up with, but an example of a function that is easier to evaluate than to invert is h(x) = x². It's easy to square a number, but it's relatively more difficult to take a number and compute its square root. Of course, computing a square root is still easy for a computer, so the OS uses a different function.

Private-key cryptography

Private-key cryptography is for encoding a message to somebody else so that an eavesdropping person has no chance of understanding it. (This is different from steganography, where the communicator tries to hide the message, so that the eavesdropper has no way of detecting it. Steganography does not attempt to prevent the eavesdropper from decoding the message, if they know of its presence.)

In private-key cryptography, the two communicators get together into a room and agree on a key. Then, on the field, when they have a message, they can communicate using their previously-agreed-upon key. An eavesdropper, ignorant of the key, shouldn't understand what they say.

For example, say I'm a CIA manager, and I have a CIA agent in the USSR. Natasha, a KGB agent, is trying to listen in. Before I send my CIA agent abroad, we get into a room and agree on a string of random numbers, and the CIA agent packs off to Russia carrying this book of random numbers. Once there, whenever the agent has a message to send, he can encode the message by XORing (exclusive or) it with successive random numbers from the book. For example, if he wants to say, ``00111101,'' then he looks in his book and notices that the next unused random numbers are ``11110010,'' and so he telephones back, saying, ``11001111.'' Natasha, who has tapped the line, is clueless as to my original message, since each bit of the announced message is completely random from her eyes. (Say the original bit is 0. There's a 50% chance the announced bit is 0, since there's a 50% chance we chose 0 as our random bit. And there's a 50% chance the announced bit is 1. Similarly if the original bit is 1. So Natasha is just hearing random noise.)

This is the one-time pad, and I have heard that the CIA has actually used this technique. (Of course, reliable information on how the CIA does its job is impossible to come by; all I have is vague rumors of what they do.) In practice, it's problematic for two reasons.

It requires the two communicators to agree on a key in private, and that key must remain secure thereafter. This problem is endemic to private-key cryptography.
The key in a one-time pad restricts the length of any messages sent. If you begin reusing the pad, then you begin losing security, since Natasha might be able to pick up on patterns in the encrypted text and infer things about the pad from that.

The second can be worked around, if you have a way of generating predictable random numbers. A pseudorandom number generator works well: You agree on a pseudorandom number seed in private, and then both parties work with the random numbers generated by that seed. That way, you both get an infinite stream of pseudorandom bits. This involves some loss of security, though, since Natasha might possible be able to crack the pseudorandom number key via detection of patterns in the announced bits. It doesn't sound easy - and it certainly isn't, if you choose a good pseudorandom number generator (preferably with proven security) - but there's not the same sort of guarantee you get with the completely random one-time pad scheme.

Public-key cryptography

Public-key cryptography is a solution to the first of the two problems listed above: It removes the requirement that the two parties must agree in private in advance.

Here's how it works. While my CIA spy is abroad, I think up two keys: a private key and a public key, according to an public-key cryptography algorithm. I announce the public key to everybody, and keeps the private key to myself. These keys define two functions, f_priv and f_pub It turns out that f_pub is very hard to invert, but I chooses the private key and public key together, so that I knows f_priv but nobody else can.

Now, when my spy wants to send a message M, he sends me f_pub(M). I receive the encrypted message, apply f_priv to it, giving me f_priv(f_pub(M)) = M. Natasha will overhear my spy's message, but that's scant help to her, since she doesn't know f_priv and so she can't invert it.

It's an amazing thing. How could you ever think it would work? But it does. In fact, when you send encrypted information via the Web (as you should do whenever you're sending data like credit card numbers), you're using public-key cryptography. After all, it's not as if you and the merchant got together and agreed on a key in advance.

The most widely known public-key cryptosystems frequently use prime numbers. In this case, the private key would contain two very large prime numbers (each with about 512 or more bits), and the public key would be the product of these two primes. To generate the private and public keys, the person just has two find two large prime numbers (which can be done relatively quickly) and multiply them. But finding the prime factorization of a number takes a relatively long time. Though it's possible to do it, it takes so long as to be impractical.

Digital signatures

Another variant of cryptography arises in the following problem: How can I send a message so that the recipient can be sure it's me? For example, it would be nice if, when I e-mail my bank asking them to transfer money between accounts, they could actually verify that it's me talking, and not somebody who's just pretending to be me.

Actually, one of the most popular solutions of this turns out to be a variant of public-key crytography. I'll choose a public key and a private key, and I publicly announce my public key as being the way you can verify that a message is from me. Now, whenever I have a message M to send, I'll actually send f_priv(M). The bank, when it receives such a message, can decrypt it using f_pub, since f_pub(f_priv(M)) = M. Moreover, it can be confident that the message is from me, since it's not practical for somebody else to discover f_priv on their own.

(Actually, I announced an insecure message, though it's unlikely I really want to broadcast my transaction to any eavesdroppers. I'd actually encrypt my signed message using the bank's public key, so that only the bank could decrypt it. And then of course they'd have to decrypt it again in order to verify that my signature was on the message.)