Ideas to bypass the frequency analysis

(Article from 2013)

In the article on the history of encryption, I already dealt with monoalphabetic substitution. This means that every letter of the source alphabet is substituted with exactly one letter of the target alphabet. It is as simple to understand as it is easy to crack: monoalphabetic substitution can be broken with frequency analysis. This old method was invented by Arab scholars. They built a table with the frequency of occurrence of every letter in the source language and compared it with the letter frequencies of the encrypted text. Matching the most frequent letters in both tables was enough to crack this simple cipher.
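The frequency table described above can be sketched in a few lines of Python. This is only a minimal illustration, not the historical procedure: the function name `letter_frequencies` is made up for this example, and it assumes that only the letters A to Z matter and that upper and lower case are counted together.

```python
from collections import Counter


def letter_frequencies(text):
    """Return (letter, relative frequency) pairs, most frequent first.

    Assumption for this sketch: only A-Z are counted, case is ignored.
    """
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    counts = Counter(letters)
    total = len(letters)
    return [(letter, count / total) for letter, count in counts.most_common()]


# "Uryyb Jbeyq" is "Hello World" encrypted with ROT13.
table = letter_frequencies("Uryyb Jbeyq")
print(table[0])  # the most frequent cipher letter and its share
```

In this tiny sample, the most frequent cipher letter is Y, which stands for the plain letter L. An attacker would compare such a table with the known letter frequencies of the source language and guess the substitution from the best matches.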

In this article, I will present a few small tricks that make it harder to crack this cipher with frequency analysis. I must admit that the following list is not complete. All of the ideas are only rough approaches, but I think they can be useful for private use. (In all examples, I use ROT13 to stand in for an arbitrary monoalphabetic method. Note also that the short sentences in these examples are nearly uncrackable with a frequency table anyway, because they do not deliver enough input.)

I considered the following techniques:

Vocal reduction
Extended target alphabet
Sectional target alphabets

Vocal reduction

The vocal reduction method can be hard to crack, but it isn’t very practical, because the receiver has to put a lot of effort into reconstructing the message. Vocal reduction means that the sender deletes all vowels (“vocals”) from the source text before encrypting it. Frequency analysis then fails because the letter frequencies of the encrypted text no longer match the table for the source language: in English, several of the most frequent letters (such as e, a, o and i) are vowels. For English texts, the encrypted text uses 5 letters fewer than the source alphabet. The receiver knows the substitution and can decrypt the text, but reconstructing it with the correct vowels takes a lot of effort. The only help the receiver has is the semantics of the source language. With a guess at the sense of the message (the thing that makes it information), the receiver can reconstruct the text faster.

process

plain source text
source text with vowels deleted
encryption
sending
receiving
guess the plain source text

example

plain source text: Hello World – How are you – And what about your sister.

source text with vowels deleted: Hll Wrld – Hw r y – nd wht bt yr sstr.

encrypted with ROT13: Uyy Jeyq – Uj e l – aq jug og le ffge.

Note that the main disadvantage of this method is the loss of information during the encryption process!
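The first steps of the process above (delete the vowels, then encrypt) can be sketched as follows. This is a minimal sketch under a few assumptions: the helper names `remove_vowels` and `rot13` are invented for this example, only a, e, i, o and u count as vowels (so “y” is kept, as in the example above), and ROT13 again stands in for any monoalphabetic substitution.

```python
import string

VOWELS = set("aeiouAEIOU")  # assumption: "y" is treated as a consonant

# Translation table for ROT13 over upper and lower case letters.
_LOWER = string.ascii_lowercase
_UPPER = string.ascii_uppercase
_ROT13 = str.maketrans(
    _LOWER + _UPPER,
    _LOWER[13:] + _LOWER[:13] + _UPPER[13:] + _UPPER[:13],
)


def remove_vowels(text):
    """Delete all vowels; every other character is kept as it is."""
    return "".join(c for c in text if c not in VOWELS)


def rot13(text):
    """Apply ROT13; non-letters pass through unchanged."""
    return text.translate(_ROT13)


reduced = remove_vowels("Hello World")
print(reduced)         # Hll Wrld
print(rot13(reduced))  # Uyy Jeyq
```

The receiver can undo the ROT13 step with the same function, but there is no function that restores the vowels. That step stays a guess based on the meaning of the message.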

Extended target alphabet

In a monoalphabetic substitution, each letter of the source alphabet is represented by exactly one letter of the target alphabet. That is its sore point: frequency analysis counts the occurrences of all letters, and after trying a few combinations you can crack the cipher easily. An extended target alphabet (which also includes, for instance, 0, ..., 9, $, #, +, *) has more options for representing a source letter. For instance, you can translate an “a” into either “t” or “$”. You can’t do this for every letter, but you can do it for a few of the most frequently used ones. Their counts are then split across several symbols, so the frequency analysis goes wrong. This method isn’t very elegant, but it fulfills its purpose.

example

plain source text: Hello World – How are you – And what about your sister.

substitution rules: ROT13 and A <-> N,# and E <-> R,$

encrypted text: Uryyb Jbeyq – Ubj ne$ lbh – #aq jung nobhg lbhe fvfg$e.
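The substitution rules of this example can be sketched in Python. The sketch rests on several assumptions that are not part of the original text: the names `EXTRA`, `encrypt` and `decrypt` are made up, the choice between the letter and the extra symbol is made at random with equal probability, the plain text itself contains neither “#” nor “$”, and the case of a letter that was replaced by an extra symbol is not restored.

```python
import random
import string

# Assumption: the extra symbols from the example (A -> N or #, E -> R or $).
EXTRA = {"a": "#", "e": "$"}

_LOWER = string.ascii_lowercase
_UPPER = string.ascii_uppercase
_ROT13 = str.maketrans(
    _LOWER + _UPPER,
    _LOWER[13:] + _LOWER[:13] + _UPPER[13:] + _UPPER[:13],
)


def rot13(text):
    """Apply ROT13; non-letters pass through unchanged."""
    return text.translate(_ROT13)


def encrypt(text, rng=random):
    """ROT13, but 'a' and 'e' are sometimes replaced by their extra symbol."""
    out = []
    for ch in text:
        extra = EXTRA.get(ch.lower())
        if extra is not None and rng.random() < 0.5:
            out.append(extra)      # use the additional symbol
        else:
            out.append(rot13(ch))  # use the normal ROT13 letter
    return "".join(out)


def decrypt(cipher):
    """Map extra symbols back to (lower case) letters, undo ROT13 elsewhere."""
    reverse = {symbol: letter for letter, symbol in EXTRA.items()}
    return "".join(reverse.get(ch, rot13(ch)) for ch in cipher)


print(decrypt("#aq jung nobhg lbhe fvfg$e."))  # and what about your sister.
```

Because the choice is random, the same plain text can produce different encrypted texts, and the counts for “a” and “e” are spread over two symbols each.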

Sectional target alphabets

Every monoalphabetic substitution has a single target alphabet that is used for the whole encrypted text. Another option is to use several target alphabets. This method is very flexible: you can use a new alphabet for every sentence or every section of the text. Sender and receiver must agree on which alphabet is used when before the first text is encrypted. Frequency analysis fails because the letter frequencies of the whole text are much less clear than with a simple monoalphabetic substitution: each section shifts the counts to different letters, so which letter occurs most often and which least often no longer reflects the source language.

process

plain source text
divide the text into sections (words, sentences, paragraphs etc.)
encrypt every section with its own rule
encrypted text

example

The key is 1938271.
This means that the first sentence is encrypted with ROT1, the second with ROT9, the third with ROT3, and so on.
The receiver knows the key.

Please note that this method is still monoalphabetic within each section. Looking at all sections together, it resembles a polyalphabetic substitution, but it isn’t one.
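The key example can be sketched in Python as follows. Several details are assumptions made only for this illustration: the function names are invented, a section is a sentence that ends with “.”, “!” or “?” followed by whitespace, the sentences are joined again with a single space, and the key (which must not be empty) is reused from the start when the text has more sentences than the key has digits.

```python
import re
import string
from itertools import cycle


def rot(text, shift):
    """Rotate letters by `shift` positions; non-letters stay unchanged."""
    shift %= 26
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(
        lower + upper,
        lower[shift:] + lower[:shift] + upper[shift:] + upper[:shift],
    )
    return text.translate(table)


def split_sentences(text):
    """Assumption: a sentence ends with . ! or ? followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def encrypt_sections(text, key):
    """Encrypt sentence i with ROT(key digit i); the key repeats if too short."""
    shifts = cycle(int(digit) for digit in key)
    return " ".join(rot(s, next(shifts)) for s in split_sentences(text))


def decrypt_sections(cipher, key):
    """Undo encrypt_sections by rotating every sentence back."""
    shifts = cycle(int(digit) for digit in key)
    return " ".join(rot(s, -next(shifts)) for s in split_sentences(cipher))


plain = "Hello World. How are you? And what about your sister."
cipher = encrypt_sections(plain, "1938271")
print(cipher)  # Ifmmp Xpsme. Qxf jan hxd? Dqg zkdw derxw brxu vlvwhu.
print(decrypt_sections(cipher, "1938271") == plain)  # True
```

Punctuation is not encrypted here, so the receiver can find the section boundaries in the encrypted text. That also means an attacker can see them, which fits the remark that each section on its own is still monoalphabetic.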

Conclusion

As you can see in the examples above, you can do much more with a monoalphabetic encryption method than just translating a source text into a target text with a single alphabet. But in most cases, you have to provide extra information about how to decrypt the text. To avoid sending this extra information separately, you can start your text with a couple of letters or numbers that represent an encrypted key. After decrypting this key with a standard procedure, the receiver has the right key to run the real decryption method on the encrypted text. The ideas introduced above are really simple and easy to implement. I think you can do many more tricky things with the monoalphabetic encryption method, but in my mind this ends up as more “security through obscurity”, and that misses the aim.

Alexander Bresk – Machine Learning & AI Expert
