Monday, September 12, 2005

Cryptography 1 - substitution ciphers

The title links to a very interesting site about cryptographic methods. The site is given as a tutorial link for level 13 at OSIX.

As some of you probably know, I have completed all of the challenges at OSIX. Right now I am trying to solve the bonus challenges, and after that I'll move on to the reversing challenges. You probably remember that I promised last time to post something on cryptography, so here it is.

First of all, let us talk about cryptography and cryptanalysis. Although I have heard these terms used interchangeably, they are actually quite different. Kryptos, in Greek, means hidden or secret. The -graphy part in cryptography means "to write". The -analysis part in cryptanalysis comes from a Greek word meaning "to loosen" or "to untie", that is, to solve. So the two sciences are basically opposites. While one is concerned with hiding information, the other is concerned with recovering hidden information.

Now that we have got that out of the way, let us also explain, for the sake of completeness, the terms cipher and code. The two terms are closely related, but cryptographers consider them to be different.

A cipher is a way to modify a text (in order to hide it). The text is modified by changing units such as letters, or fixed-size blocks of letters, in an algorithmic fashion. Let us look at an example of a cipher. The following cipher is generally called atbash and comes from the ancient Hebrews. This cipher also appears in the recent best-seller The Da Vinci Code.

The atbash is a simple cipher. To use it, you simply replace A with Z, B with Y, C with X, and so on. You can write a simple substitution table:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Z Y X W V U T S R Q P O N M L K J I H G F E D C B A

Of course, since the substitution is symmetric (A becomes Z and Z becomes A), we can shorten this table to half its length and read it in both directions:
A B C D E F G H I J K L M
Z Y X W V U T S R Q P O N

As you can see, the logical processing units are letters.
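The atbash table above can be sketched in a few lines of Python. This is only a minimal illustration: the function name atbash is my own choice, and I assume that only the 26 unaccented Latin letters are substituted while everything else (spaces, digits, punctuation) passes through unchanged.

```python
import string

# Pair each letter with its mirror image in the alphabet: A<->Z, B<->Y, ...
UPPER = string.ascii_uppercase
LOWER = string.ascii_lowercase
ATBASH_TABLE = str.maketrans(UPPER + LOWER, UPPER[::-1] + LOWER[::-1])


def atbash(text: str) -> str:
    """Apply the atbash substitution; non-letters are left untouched."""
    return text.translate(ATBASH_TABLE)


if __name__ == "__main__":
    secret = atbash("Hello, world")
    print(secret)
    # Applying atbash a second time gives back the original text.
    print(atbash(secret))
```

Because the table is symmetric, the same function both encrypts and decrypts.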

A code is also a way to modify text (in order to hide it or for other purposes). Unlike a cipher, a code operates on phrases, words or similar blocks that generally have a semantic unity.

A simple example of a code is the so-called 1337speek. I will not detail it here, as I believe most of you are familiar with it.

The word code is also used in other senses. An example would be "error-correcting code". However, for the time being we will use the definitions above.

In the next part we shall discuss several ciphers that enjoyed some popularity at one time or another, as well as ways of breaking them.

Probably the most useful cipher to know when surfing the net is the so-called Caesar cipher. This one is a substitution cipher, not unlike the atbash that we met above. Here, we imagine that the alphabet is written around the edge of a disk. To encrypt a letter, we simply look on the disk for the letter that is one position further along. In this way we substitute A with B, B with C, and so forth. Z is substituted with A.

It is called the Caesar cipher after Julius Caesar, who is reported to have used it (with a shift of three positions) in his correspondence.

A first generalization of this cipher is to move by more than one letter on the disk. For example, we could move by two letters, and we would substitute A with C, B with D, and so forth. We could call this cipher Caesar-2 (and the first one Caesar-1). We could have up to Caesar-25, since moving by 26 letters brings every letter back to itself.
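Here is a small Python sketch of the Caesar-n idea. The function name caesar and its shift parameter are my own naming, and I assume the plain 26-letter Latin alphabet; anything that is not a letter is copied as it is.

```python
import string


def caesar(text: str, shift: int) -> str:
    """Move every letter `shift` positions forward around the alphabet disk."""
    shift %= 26  # going around the disk 26 times changes nothing
    upper = string.ascii_uppercase
    lower = string.ascii_lowercase
    table = str.maketrans(
        upper + lower,
        upper[shift:] + upper[:shift] + lower[shift:] + lower[:shift],
    )
    return text.translate(table)


if __name__ == "__main__":
    print(caesar("ABZ", 1))   # Caesar-1
    print(caesar("ABZ", 2))   # Caesar-2
    # A negative shift turns the disk backwards, which decrypts.
    print(caesar(caesar("Attack at dawn", 7), -7))
```

To decrypt a Caesar-n message, you can call the same function with a shift of -n (or, equivalently, 26 - n).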

The reason I said that this is one of the most useful ciphers to know is that Caesar-13 (usually called ROT13) is widely used on the web. People use it to give the answers to logic puzzles on the same page as the puzzle. This is similar to the usual practice of some editors of printing the answer to a puzzle on the same page as the problem, but upside down.
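If you have Python at hand, you do not even need to write Caesar-13 yourself: the standard codecs module ships with a "rot_13" text transform. The sketch below only wraps it; the function name rot13 is my own. Since 13 is half of 26, applying the transform twice returns the original text, so the same call hides and reveals an answer.

```python
import codecs


def rot13(text: str) -> str:
    """Caesar-13 using the "rot_13" transform from the standard library."""
    return codecs.encode(text, "rot_13")


if __name__ == "__main__":
    hidden = rot13("The butler did it")
    print(hidden)
    # Shifting by 13 twice is a shift by 26, i.e. no shift at all.
    print(rot13(hidden))
```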

Both atbash and Caesar are quite easy to crack. To crack them, you usually use something called frequency analysis. To decrypt the text, you find the most frequent letter in it. Then, you compare this with the most frequent letters in the language in which the message is written.

For example, in English the most used letters are (in this order): E, T, A, O, I, N. In Romanian, the most used letters are: E, I, A, R, T, N, L. If one letter appears much more often than the others in a long message, it very probably stands for the most frequent letter of the language. Once this guess is made, you look at the shortest words in your message (assuming the words are clearly delimited). These words will help you deduce some of the other letters. In this way, messages get decrypted quite easily.
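For a Caesar cipher, the first step of this attack can be sketched in Python as follows. This is a rough illustration and not a complete cracking tool: the names shift_letters and guess_caesar_shift are my own, I assume an English message, and I assume that the most frequent letter of the ciphertext stands for E. On short messages that assumption can easily fail, in which case you would try the next most frequent letters.

```python
from collections import Counter
import string


def shift_letters(text: str, shift: int) -> str:
    """Caesar-shift the letters of `text`; a negative shift moves backwards."""
    shift %= 26
    upper = string.ascii_uppercase
    lower = string.ascii_lowercase
    table = str.maketrans(
        upper + lower,
        upper[shift:] + upper[:shift] + lower[shift:] + lower[:shift],
    )
    return text.translate(table)


def guess_caesar_shift(ciphertext: str, assumed_plain: str = "E") -> int:
    """Guess the shift by assuming the most frequent ciphertext letter
    stands for `assumed_plain` (E by default, an assumption for English)."""
    letters = [c for c in ciphertext.upper() if c in string.ascii_uppercase]
    if not letters:
        raise ValueError("the ciphertext contains no letters")
    most_frequent = Counter(letters).most_common(1)[0][0]
    return (ord(most_frequent) - ord(assumed_plain.upper())) % 26


if __name__ == "__main__":
    ciphertext = shift_letters("MEET ME HERE BEFORE THE SEVENTEENTH", 3)
    shift = guess_caesar_shift(ciphertext)
    print(shift, shift_letters(ciphertext, -shift))
```

For a general substitution cipher there is no single shift to recover, so the frequency count only gives you the first few letters and the short words do the rest, as described above.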

For now, besides the link in the title, I will also point you to Lanaki's classical cryptography course.

Until next time....
