Beautiful, for some reason or another, since yesterday (when I published my tutorial), my number of page-loads/day has doubled. So it would seem that this crappy style indeed gave some results. Therefore, I shall return to my cryptographic articles. There were
a lot of replies to the previous articles (for
a lot of in the range 0 to 1).
We have seen in the previous article that there is a simple modality of cracking simple substitution ciphers, using only the cipher text and without any knowledge of any part of the clear text. The method for cracking it is called frequency analysis and works for any reasonable message (say 20 or so characters). The obvious question comes here: if substitution is so easy to be broken, is there any way to make it harder? Well, the answer is yes, and this article tries to give some better alternatives to the simple substitution.
First of all, though, a tip. Writing a frequency analyzer is fairly simple. However, there is a tool which is already written, in case you would like to use it. It is called
CryptTool. It does some nice things like plotting frequency graphs, cracking substitution ciphers, cracking Vigenere ciphers and other stuff like that. You might give it a try. If you get a similar tool, by all means please let me know.
Back to the meat of the article: How do we make substitution harder? I bet some of you are simply thinking of adding another substitution, or even 3-4 other substitutions. Well, this doesn't really work. The reason for this is that adding more substitutions can be solved as easy as one substitution. For example, if E becomes Z and in the next substitution Z becomes K, the cryptanalyst will easily see that the frequency of all the K's is very large and will therefore infere that K is substitutes E (which circumvents all of our protections). So multiple substitutions are no better than simple substitution.
What else could we do? Suppose we have a text like this one:
This is a cipher text which is encoded with a polyalphabetic substitution.If we use Caesar 1 we get this:
Uijt jt b djqifs ufyu xijdi jt fodpefe xjui b qpmzbmqibcfujd tvctujuvujpo.If we use Caesar 13 (or ROT-13 as it is called on the net) we get this:
Hvwg wg o qwdvsf hslh kvwqv wg sbqcrsr kwhv o dczmozdvopshwq gipghwhihwcb.The idea when using polyalphabetic ciphers is something like this. Let us use different code portions for different parts of the message. The above message could therefore be crypted like this:
AuijtjtbdjqifsufyuxijdijtNsbqcrsrkwhvodczmozdvopshwqgipghwhihwcb.
This encryption is fairly simple. The capital letters are indicators for the type of Caesar used. In the above example, A means we use Caesar 1 and N (the 13th letter) means we use Caesar 13.
Now the above example might be easily decrypted, because it seems pretty obvious that the capital letters look like some separators. So ultimately, the cryptanalyst will probably get the idea to use frequency analysis on each of the parts.
However, our idea is not entirely without a merit since frequency analysis becomes harder, the smaller the pieces. If we have small enough pieces, which are all crypted differently (i.e. no two Caesar-1), frequency analysis becomes very hard.
I believe this is the route of thinking towards safer ciphers, and probably Giovan Batista Belaso thought so too. This guy took the above idea to the extreme. He made all the segments one letter long, and made the key invisible.
Suppose we have this message:
Sing, O goddess, the anger of Achilles son of Peleus, that brought countless ills upon the Achaeans.First we make capitalize everything and remove all the spaces and punctuations:
SINGOGODDESSTHEANGEROFACHILLESSONOFPELEUSTHATBROUGHTCOUNTLESSILLSUPONTHEACHEANS.After this, we think of a key. Suppose the key is EXQUISITE. We write it like this:
EXQUISITEEXQUISITEEXQUISITEEXQUISITEEXQUISITEEXQUISITEEXQUISITEEXQUISITEEXQUISI
SINGOGODDESSTHEANGEROFACHILLESSONOFPELEUSTHATBROUGHTCOUNTLESSILLSUPONTHEACHEANS.Now, we use for each of the letters in the plain-text the Caesar alphabet indicated for the letter above it. This should give us this text:
WFDAWYWWHIPINPWIGKIOEZIUPBPPBIMWFWYTIIUOALPTXFOEOOZBVSYKJFMKABPPPKJWFBAIEZXYIFA.The method of encryption described above is called
Vigenere cipher after the name of a 16th century French cryptographer. As we shall see Vigenere (the cipher) is not by any means unbreakable. Vigenere (the man) actually invented a much stronger encryption method, but thus are the strange workings of history.
For now, I shall tell you that Vigenere is breakable, but I'll leave the breaking method to be guessed by you. I will explain it in the next instalment, but you should think of this opportunity as a nice occasion of practicing what you learned until now. Try to find a method of breaking the Vigenere without using Google to find out exactly how it's done. I promise to you that we'll get to that in the next chapter.
Hint 1: Use large texts (1000-2000 or more)
Hint 2: Use
this site for frequency tables in any language. (note: on the main page of this dude there is a mention of a guy who e-mailed something about Romanian or somewhat. It's not me.) Another approach would be to write your own tool of frequency analysing and to find out the letter frenquecies in your language using a large enough text (the larger, the better).
Now on to polygraphic substitution ciphers. Polygraphic means simply "more drawings(letters)". We use a technique similar to simple substitution but instead of substituting each letter with a symbol letter, we substitute each two(or three, or four,..) letters with a symbol. The easiest symbol is another two letters.
The problem that appears here is the following: 26 letters are easily represented in a table with 26 rows and and 2 columns (or 26 columns and 2 rows). The method is however inadequated for 26*26==676 pairs of letters. A solution would be a table of 26x26. This is still inadequate however, because it is difficult to remember very easy each of the combinations within the table. A truly efficient polygraphic cipher would be random (so each table would have equal probability of substitution with another two letters).
An elegant solution was found to this problem by Charles Wheatstone in 1854. The cipher he developed was called the Playfair cipher. Here we will show a similar idea called the 4-square Playfair cipher which is considered safer and is also easier to understand. The idea was to use 4 tables of 5x5 in size and fill them with letters, in a random order. The we arrange the 4 tables like this:
A | B
---+---
C | D
How do we use them? It is a bit complicated, but I will try to make it as easy as possible. Suppose you have this text:
As he spoke he wept aloud, and his mother heard him where she was sitting in the depths of the sea hard by the old man her father.
- The first thing we do is to remove all the spaces.
- Second, we pad the text with the letter X up to an even number of letters.
- Third, for each pair in the plain text, we do the following:
- search the position of the first letter in table A and the position of the second letter in table D.
- find in table B the letter which is on the same line as the letter found in table A and on the same column as the letter found in table D. This is the first letter of the ciphered pair
- find in table C the letter which is on the same line as the letter found in table D and on the same column as the letter found in table A. This is the second letter of the ciphered pair
The Playfair is a fairly difficult cipher to cryptanalyse, so I will not ask you to do it (although you can if you want to). Here is another challenge: try to find an easy to remember method to generate random alphabets. This method should allow you to easily recreate the cipher text if you were to lose the 4 tables.
The answer to the challenges will be provided in the next instalment of the series, however, if you get to solve them, I would like to hear from you either by e-mail or through comments.
Best wishes to everyone!