San Diego State University

Vigenere Ciphers

Polyalphabetic Ciphers

We noted that all the substitution ciphers we have seen so far had the following property:

    The key is a 1-1 function from the alphabet onto the alphabet. Such a 1-1 function from a set onto itself is called a permutation.
Any encryption scheme with a key that is a function of single characters in the input alphabet is called a monoalphabetic cipher.

Polyalphabetic Substitution Cipher

Any substitution cipher in which there is no 1-1 correspondence between plain text characters and ciphertext characters is called polyalphabetic.

Note this does NOT mean we are giving up on the idea that the encryption scheme as a whole be 1-1. We still want it to be the case that every ciphertext message decrypts unambiguously to one plaintext message.

But we do away with the 1-1 correspondence between plain text and cipher text characters.

Vigenere  

How is this possible?

Choose a key. Any sequence of characters. Any length. Let's pick an English word to make it memorable: milk.

Choose a message. Let's pick a portion of the following message:

I am arriving on a plane.  The plane is due
at 5 PM Friday.
Remove spaces.

Procedure.

  1. Encode the plain text as integers (a = 0, b = 1, ..., z = 25).
  2. Write the key under the coded plain text, repeating it as many times as necessary to cover the entire plain text. (Each group of plain text letters corresponding to one key repetition is called a block.)
  3. Encode the key the same way.
  4. Add each key character code to the corresponding plain text code (mod 26).
  5. Decode each resulting integer; the result is the cipher text character.

     
    Plain Text o n a p l a n e t h e p l a n e i s d u e
    Encoding 14 13 0 15 11 0 13 4 19 7 4 15 11 0 13 4 8 18 3 20 4
    Key m i l k m i l k m i l k m i l k m i l k m
    K-encoding 12 8 11 10 12 8 11 10 12 8 11 10 12 8 11 10 12 8 11 10 12
    Cipher Code 0 21 11 25 23 8 24 14 5 15 15 25 23 8 24 14 20 0 14 4 16
    Cipher Text a v l z x i y o f p p z x i y o u a o e q

Notice the repeat of p in the cipher text row. These two occurrences of p come from different plain text characters (h and e). So the cipher scheme is clearly not 1-1 at the character level.

But the correspondence between message and cipher text IS 1-1. Given this cipher text and this key, there is only one decoding.

The lack of 1-1-ness at the character level can be illustrated dramatically with the right choice of key and plain text:

     
    Plain Text e d c b a
    Encoding 4 3 2 1 0
    Key a b c d e
    K-encoding 0 1 2 3 4
    Cipher Code 4 4 4 4 4
    Cipher Text e e e e e
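
The five-step procedure above can be sketched in Python. This is a minimal illustration, assuming lowercase letters only with spaces already removed; the names vigenere_encrypt and vigenere_decrypt are made up for this sketch.

```python
import string

ALPHABET = string.ascii_lowercase  # a = 0, b = 1, ..., z = 25


def vigenere_encrypt(plain, key):
    """Add each key letter's code to the matching plain letter's code, mod 26."""
    cipher = []
    for i, ch in enumerate(plain):
        p = ALPHABET.index(ch)                   # step 1: encode plain text
        k = ALPHABET.index(key[i % len(key)])    # steps 2-3: repeat and encode key
        cipher.append(ALPHABET[(p + k) % 26])    # steps 4-5: add mod 26, decode
    return "".join(cipher)


def vigenere_decrypt(cipher, key):
    """Undo the encryption by subtracting the key codes, mod 26."""
    plain = []
    for i, ch in enumerate(cipher):
        c = ALPHABET.index(ch)
        k = ALPHABET.index(key[i % len(key)])
        plain.append(ALPHABET[(c - k) % 26])
    return "".join(plain)


print(vigenere_encrypt("onaplanetheplaneisdue", "milk"))  # avlzxiyofppzxiyouaoeq
print(vigenere_encrypt("edcba", "abcde"))                 # eeeee
```

Both printed results match the Cipher Text rows of the two tables. Decrypting with the same key returns the original plain text, which is the sense in which the scheme as a whole is still 1-1.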

Difficulty of Vigenere

In fact, for any plain text and any cipher text of the same length, there is a key that connects them.

For example, consider the plain text:

Enter plain text here
and the cipher text:
eeeee eeeee eeee eeee
The key connecting them (ignoring spaces) is:
arlanptewrlahlxan
Verify this.
     
    Plain Text e n t e r p l a i n t e x t h e r e
    Encoding 4 13 19 4 17 15 11 0 8 13 19 4 23 19 7 4 17 4
    Key a r l a n p t e w r l a h l x a n a
    K-encoding 0 17 11 0 13 15 19 4 22 17 11 0 7 11 23 0 13 0
    Cipher Code 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
    Cipher Text e e e e e e e e e e e e e e e e e e

Here's the scary fact. For any cipher text, there is a key connecting it to any plain text of the same length.

For example, suppose our spies intercept the cipher text:

exxego ex wizir
And suppose we know nothing about the key but we know this is a Vigenere cipher.

Things are not good. The plain text could be:

lets have lunch
In which case the key is:
ttemzojtlomgk
Or it could be:
retreat at dawn
in which case the key is:
ntencolxdfzme
Verify these claims. Similarly for any plain text of the right length. If all keys and plain texts are equally likely, we have no way of knowing which is right.
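
One way to verify these claims is to subtract the plain text codes from the cipher text codes (mod 26), which yields the connecting key directly. The sketch below assumes the a = 0, ..., z = 25 encoding used in the tables; letters_only and recover_key are names invented for this illustration.

```python
def letters_only(text):
    """Lowercase the text and drop everything except the letters a-z."""
    return [ch for ch in text.lower() if "a" <= ch <= "z"]


def recover_key(plain, cipher):
    """Return the key k with (p + k) mod 26 == c at every position."""
    p_chars = letters_only(plain)
    c_chars = letters_only(cipher)
    if len(p_chars) != len(c_chars):
        raise ValueError("plain text and cipher text must have the same length")
    return "".join(
        chr((ord(c) - ord(p)) % 26 + ord("a"))
        for p, c in zip(p_chars, c_chars)
    )


print(recover_key("lets have lunch", "exxego ex wizir"))   # ttemzojtlomgk
print(recover_key("retreat at dawn", "exxego ex wizir"))   # ntencolxdfzme
print(recover_key("Enter plain text here", "eeeee eeeee eeee eeee"))
# arlanptewrlahlxana
```

The last key comes out one letter longer than the 17-letter key given earlier because the function returns one key letter per plain text letter; the 17-letter key simply wraps around to its first letter, a, at position 18.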

Relationship of Shift to Vigenere

Consider the previous example again. The plain text could also be:

attack at seven
In which case the key is:
e
Verify this.

Notice that a Vigenere key of length "1" (e) makes perfect sense. The encoding of "e" is 4 so we just keep adding 4 to the plain text encoding.

But this is just a shift-4 cipher! So a shift cipher is a special case of a Vigenere cipher in which the key length is 1.
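
A quick check of this special case, as a sketch that assumes lowercase letters with spaces removed (both function names are hypothetical):

```python
def vigenere_encrypt(plain, key):
    """Vigenere encryption over a-z: add key codes to plain codes, mod 26."""
    out = []
    for i, ch in enumerate(plain):
        k = ord(key[i % len(key)]) - ord("a")
        out.append(chr((ord(ch) - ord("a") + k) % 26 + ord("a")))
    return "".join(out)


def shift_encrypt(plain, shift):
    """Shift cipher: add the same number to every plain code, mod 26."""
    return "".join(
        chr((ord(ch) - ord("a") + shift) % 26 + ord("a")) for ch in plain
    )


message = "attackatseven"
print(vigenere_encrypt(message, "e"))  # exxegoexwizir
print(shift_encrypt(message, 4))       # exxegoexwizir
```

The two calls agree because the one-letter key "e" contributes the code 4 at every position.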

Vigenere Square

A classic way of representing the encoding of a Vigenere cipher is through a Vigenere Square.

This is illustrated at the following web-site.
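
For readers without the illustration at hand, here is a small sketch that builds such a square. It assumes the usual layout, in which each row is the alphabet shifted one more place to the left, rows are indexed by the key letter, and columns by the plain text letter; the function names are invented for this example.

```python
import string

ALPHABET = string.ascii_lowercase


def vigenere_square():
    """Row r is the alphabet rotated left by r places."""
    return [ALPHABET[r:] + ALPHABET[:r] for r in range(26)]


def lookup(square, plain_char, key_char):
    """Read the cipher letter at (row of key letter, column of plain letter)."""
    row = ALPHABET.index(key_char)
    col = ALPHABET.index(plain_char)
    return square[row][col]


square = vigenere_square()
for row in square[:3]:
    print(row)
print(lookup(square, "o", "m"))  # a, as in the first column of the milk table
```

Looking up a letter in the square gives the same result as adding the two codes mod 26, so the square is just a precomputed addition table.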

Cracking Vigenere

The Vigenere cipher used to be state of the art.

Inventor: Blaise de Vigenere, b. 1523, with help from Leon Battista Alberti (b. 1404), who had the idea of switching between different cipher alphabets during one encoding. Vigenere published the idea in 1586 in Traicte des Chiffres. The idea was largely ignored for 200 years, then became popular and was written up quite a bit.

In fact, by the 1700s the power of cryptography had come to be appreciated. It had gone professional. There were teams of cryptographers working for governments in black chambers such as the Geheime Kabinets-Kanzlei (roughly, the Secret Cabinet Office), which used to decrypt the mail to every embassy in Vienna every morning. In this era polyalphabetic ciphers became state of the art. The Vigenere cipher became known as le chiffre indechiffrable (the indecipherable cipher).

The first systematic approach to decryption is due to Babbage (b. 1791; he did this work c. 1854), but it went unpublished, possibly because it was used by British intelligence during the Crimean War. The first published attack is due to Friedrich Wilhelm Kasiski [Kasiski 1863, Die Geheimschriften und die Dechiffrir-Kunst (Secret Writing and the Art of Deciphering)].

Now in the age of computers such ciphers are quite breakable. How?

The key requirement is that we have a lot of ciphertext relative to the length of the key. Then we get to see repeating patterns.

Consider our milk example again:

     
    Plain Text o n a p l a n e t h e p l a n e i s d u e
    Encoding 14 13 0 15 11 0 13 4 19 7 4 15 11 0 13 4 8 18 3 20 4
    Key m i l k m i l k m i l k m i l k m i l k m
    K-encoding 12 8 11 10 12 8 11 10 12 8 11 10 12 8 11 10 12 8 11 10 12
    Cipher Code 0 21 11 25 23 8 24 14 5 15 15 25 23 8 24 14 20 0 14 4 16
    Cipher Text a v l z x i y o f p p z x i y o u a o e q
Notice that both occurrences of the word plane got the same encoding!

This is because the key is 4 characters long, and the two occurrences of plane are 8 characters apart (a whole multiple of 4). So the two occurrences of plane get aligned with milk the same way:

plane
kmilk
zxiyo
These kinds of regularities are the key to cracking a Vigenere code. More on this later.
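
As a small illustration of spotting such regularities (a sketch only, not the full attack), the following looks for substrings that occur more than once in the cipher text and reports where they start. The function name and the choice of substring length are assumptions made for this example.

```python
from collections import defaultdict


def repeated_substrings(cipher, length):
    """Map each substring of the given length that occurs more than once
    to the list of positions where it starts."""
    positions = defaultdict(list)
    for i in range(len(cipher) - length + 1):
        positions[cipher[i:i + length]].append(i)
    return {s: idx for s, idx in positions.items() if len(idx) > 1}


cipher = "avlzxiyofppzxiyouaoeq"
for substring, starts in repeated_substrings(cipher, 5).items():
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    print(substring, starts, gaps)   # zxiyo [3, 11] [8]
```

The gap of 8 between the two copies of zxiyo is a multiple of the key length 4, which is the kind of clue an attacker looks for.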

One-time Pad

I said that having a lot of ciphertext relative to the length of the key was crucial to decoding Vigenere.

What happens when we don't have that?

What happens in the worst case, when the key is LONGER than the plain text?

Or put it from the cryptographer's point of view: what happens when ALL the ciphertext encrypted by one key is shorter than the key?

This is what is called a one-time pad. Officially we have a one-time pad cipher when:

    A Vigenere key is used for one message and the key is longer than the plain text.
This guarantees that there are no patterns of repetition to begin acquiring information from.

If all keys and all plain texts are equally likely, a one-time pad is the perfect code. The cipher text simply contains no information about what the plain text is.

One-time Pad Problems

Why not always use a one-time pad?

One of two things has to happen:

  1. A new key is exchanged each time a message is sent. But then how is the key exchange made secure?
  2. A very long key is kept by both sender and receiver (a code book). But code books can be captured.

Some version of option 2 is viable in some circumstances.

Weaknesses. None in theory. Two in practice.

  1. Key not truly random: For example, if it is known that the key is a piece of English text, the guessing space is reduced and brute force is reopened as a possibility.

    "e" is a bad choice for a key.

    More seriously, there are issues with using computers as random sources. Adam Back on random one-time pad keys (OTPs):

      "It can't be stressed enough how important it is to have a truly random OTP. Just using the random() function provided with C libraries is nowhere near good enough, these typically have a seed of one 32 bit word, so that even if you used the millisecond of your clock as a seed the whole system could be broken with a brute force keysearch of all possible seeds. In cryptographic terms a 32 bit keyspace is tiny, and would take a negligable amount of compute time to break."

      "Basically if you use pseudo-random number generators they are going to be the weak point in the system, unless you have external input like a radio-active decay card, or timings of the milliseconds between keystrokes with proper entropy estimation as used by PGP."

  2. Plain text not random. Known or guessable plain text. But note: Even if the possibilities are reduced to one of two messages of equal length, unless the key is nonrandom, there is no way of choosing between them.
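
The point about seeded generators in weakness 1 can be illustrated in Python. This is a sketch under stated assumptions: Python's random module is a different generator from the C library random() in the quotation, but it has the same weakness that the whole pad is determined by the seed. The secrets module draws from the operating system's cryptographic generator, which is a much better choice, though it is still not the physical random source the quotation asks for. The function names are invented for this example.

```python
import random
import secrets


def weak_pad(length, seed):
    """Pad from a seeded pseudo-random generator: fully determined by the seed."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(length))


def stronger_pad(length):
    """Pad from the operating system's cryptographic random source."""
    return secrets.token_bytes(length)


# Anyone who guesses the seed regenerates exactly the same pad.
print(weak_pad(16, 12345) == weak_pad(16, 12345))   # True

# No seed to guess here; every call gives fresh bytes.
print(len(stronger_pad(16)))                        # 16
```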

The XOR Trick

A very simple example of a one-time pad. Extremely easy to code.

We "encode" both plain and key characters. Instead of the code we've been using, we use the standard ASCII codes. Instead of addition, we use XOR. See A. Kuchling's one-time pad in Python.

The heart of the encoding in Python, encoding character "a" using pad "k":

>>> ord('a')                    # ASCII code of the plain character
97
>>> ord('k')                    # ASCII code of the pad character
107
>>> ord('a') ^ ord('k')         # XORing 97 and 107
10
>>> chr(ord('a') ^ ord('k'))    # back to an ASCII character: newline
'\n'

Explaining XORing:

>>> make_binary(97)
1100001
>>> make_binary(107)
1101011

XORing:

1100001 = plain
1101011 = pad
-------
0001010 = cipher = 10
10 is the standard ASCII code for linefeed (or newline); see any table of ASCII codes.

Note that decoding is very simple. Just XOR again, same key:

0001010 = cipher
1101011 = pad
-------
1100001 = plain
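
The same round trip can be checked in Python. The helper to_binary below is a hypothetical stand-in for make_binary, whose definition is not shown here; it uses the built-in format function to produce 7-bit strings.

```python
def to_binary(n, width=7):
    """Return n as a zero-padded binary string of the given width."""
    return format(n, f"0{width}b")


plain = ord("a")        # 97
pad = ord("k")          # 107
cipher = plain ^ pad    # XOR to encode

print(to_binary(plain))         # 1100001
print(to_binary(pad))           # 1101011
print(to_binary(cipher))        # 0001010
print(to_binary(cipher ^ pad))  # 1100001, the plain code again
```

Decoding works because XORing with the same pad twice cancels out: every bit of the pad is either flipped back or left alone.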

Example. Plain:

Attack at dawn.

Pad:

Today is a good day to breathe.
Note that the pad is longer than the plain text.

Cipher (otp.py plain.txt pad.txt > cipher.txt). Most of the resulting bytes are unprintable; the printable ones are:

KITADJ

To decode (otp.py cipher.txt pad.txt > decoded.txt). decoded.txt is always identical to plain.txt.
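
The following is a minimal sketch of what a script like otp.py does, assuming it simply XORs the input with the pad byte by byte. It is not the script referred to above; xor_with_pad is a name made up for this illustration, and it works on in-memory bytes rather than files.

```python
def xor_with_pad(data, pad):
    """XOR each byte of data with the byte of pad at the same position."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the data")
    return bytes(d ^ p for d, p in zip(data, pad))


plain = b"Attack at dawn."
pad = b"Today is a good day to breathe."

cipher = xor_with_pad(plain, pad)
printable = "".join(chr(b) for b in cipher if 32 < b < 127)
print(printable)                   # KITADJ
print(xor_with_pad(cipher, pad))   # b'Attack at dawn.'
```

The same function both encodes and decodes, which is why the command line for decoding has the same shape as the one for encoding.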

XOR Information Splitting

Consider the following problem which will become important when we get to the idea of digital cash.

We want to split a certain piece of information between Alice and Bob and arrange things so that neither of them can learn ANYTHING about it without the help of the other.

Let's say the information is:

Howdy Doody
We can't simply split this in half because anyone who knows Howdy has a good chance of guessing that Doody is the other half. And anyone who knows Doody has a good chance of guessing that Howdy is the other half. And in general knowing half of any text gives you a pretty good start on guessing the other half.

The way to do this is with XOR-ing:

    Alice   Bob   Secret (Alice XOR Bob)
      1      1      0
      1      0      1
      0      1      1
      0      0      0
Notice that if Alice has a 1, that can lead either to a 1 or a 0 in the secret, depending entirely on what Bob has. If Alice has a 0, that too can lead either to a 1 or a 0 in the secret, depending entirely on what Bob has. Thus knowing one of the bits in Alice's or Bob's half of the secret gives no information about the corresponding bit in the secret.

So Alice and Bob both have 0 information about the content of the secret (Howdy Doody).

This can be implemented with the same one-time pad code. Let's put

Marilyn Monroe
in 'alice.txt', and the secret in 'howdy.txt'. And do:
otp.py howdy.txt alice.txt > bob.txt
Then 'Marilyn Monroe' is Alice's half of the secret, and Bob's half (bob.txt) is a string of mostly unprintable bytes. To get back to 'Howdy Doody', they need to get together and do:
otp.py alice.txt bob.txt > secret.txt
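
The splitting can also be sketched directly in Python. One deliberate change from the example above: Alice's half is drawn at random with the secrets module instead of being a readable phrase, since a guessable half for Alice would make Bob's half less opaque. The function names are invented for this illustration.

```python
import secrets


def split_secret(secret):
    """Split secret into two halves whose XOR is the secret."""
    alice = secrets.token_bytes(len(secret))            # random half
    bob = bytes(s ^ a for s, a in zip(secret, alice))   # secret XOR alice
    return alice, bob


def combine(alice, bob):
    """Recover the secret by XORing the two halves."""
    return bytes(a ^ b for a, b in zip(alice, bob))


alice_half, bob_half = split_secret(b"Howdy Doody")
print(combine(alice_half, bob_half))   # b'Howdy Doody'
```

Either half on its own is just a string of random-looking bytes; only the two together give back the secret.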