A decryption workout

Log in to the compling lab.

If you are running remotely (ssh via Putty, for example) you will be working at a command line. You can just type:

$ python
This will get you to Python, which starts up something like this:
$ python
Python 2.4.1 (#1, May 16 2005, 15:19:29) 
[GCC 4.0.0 20050512 (Red Hat 4.0.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
The three chevrons '>>>' mark the prompt where you can now type commands. Type:
>>> from crypto_class.run_decrypt import *
What has happened is that you have executed these commands. (Note that the lines beginning with '#' are not executed. They're just there to give you some ideas about what to execute.)

For these commands, 'c' has been set to the cipher text we worked on in lecture (lecture on decryption).

c = 'oja gpddju hbkglno cjq tpdo xlrluh 
     uj xlhbwpo pnukv p xkgltpw bjldu lu 
     lh ltbjvupdu uj pyjlx rlyldr p tlhupikd 
     ltbvkhhljd jn ykvo clrc bvkglhljd qckd 
     oja idjq ojav datkvlg vkhawuh pvk jdwo 
     pggavpuk uj p nkq xlrluh'

These commands should look familiar. Except for one, they are all commands discussed in the first

You can now type 'DH' at the Python prompt to inspect the cipher text and hypothesis we worked on in class:

oja gpddju hbkglno cjq tpdo xlrluh uj xlhbwpo 
pnukv p xkgltpw bjldu lu lh ltbjvupdu uj pyjlx 
rlyldr p tlhupikd ltbvkhhljd jn ykvo clrc 
bvkglhljd qckd oja idjq ojav datkvlg vkhawuh 
pvk jdwo pggavpuk uj p nkq xlrluh
But let's work on a new one. Notice c4 is set to a cipher text:
>>> c4
'rba dlqvjblg uj kacqao rldcqo oagcqreapr tbcuqj 
oagcqreapr cpo tlyyaka gaaq qawuad tleeurraa 
eaemaqj cpo jrcii upwlywao up rapnqa qawuad gqltajj'
>>>
So let's create a decoding hypothesis to explore it:
>>> DH = DecodingHypothesis(c4)
Now type DH to look at it.
>>> DH
cipher = rba dlqvjblg uj kacqao rldcqo oagcqreapr tbcuqj 
         oagcqreapr cpo tlyyaka gaaq qawuad tleeurraa eaemaqj 
         cpo jrcii upwlywao up rapnqa qawuad gqltajj
        
ciph_plain_dict = {}

hypothesis = --- -------- -- ------ ------ ---------- ------ ---------- --- ------- ---- ------ --------- ------- --- ----- -------- -- ------ ------ -------

letter_frequencies = [
('a', 23), ('q', 12), ('r', 10), ('c', 8), ('j', 7), ('l', 7), ('o', 7),
('p', 7), ('u', 7), ('e', 6), ('g', 5), ('d', 4), ('t', 4), ('w', 4),
('b', 3), ('y', 3), ('i', 2), ('k', 2), ('m', 1), ('n', 1), ('v', 1)]

top_digraph_frequencies = [
('cq', 4), ('qa', 4), ('ea', 3), ('ap', 3), ('aa', 2), ('aw', 2), ('ag', 2),
('ao', 2), ('aq', 2), ('gc', 2), ('cp', 2), ('ad', 2), ('pr', 2), ('re', 2),
('tl', 2), ('ra', 2), ('po', 2), ('ly', 2), ('wu', 2), ('ka', 2), ('qr', 2),
('up', 2), ('oa', 2), ('qj', 2), ('ua', 2)]

One Letter Words = []

Two Letter Words = ['uj', 'up']

Three Letter Words = ['rba', 'cpo', 'cpo']
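
The letter_frequencies and top_digraph_frequencies tables above can be approximated with the standard library alone. What follows is a minimal Python 3 sketch, not the crypto_class implementation: the function names are made up for illustration, and it assumes digraphs are counted inside words only (an assumption that agrees with the counts shown above for 'cq', 'qa' and 'ea').

```python
from collections import Counter

# The cipher text c4 from the session above, joined into one string.
c4 = ('rba dlqvjblg uj kacqao rldcqo oagcqreapr tbcuqj '
      'oagcqreapr cpo tlyyaka gaaq qawuad tleeurraa '
      'eaemaqj cpo jrcii upwlywao up rapnqa qawuad gqltajj')


def letter_counts(text):
    """Count each alphabetic character, ignoring spaces."""
    return Counter(ch for ch in text if ch.isalpha())


def digraph_counts(text):
    """Count adjacent letter pairs inside each word (never across a space)."""
    counts = Counter()
    for word in text.split():
        for first, second in zip(word, word[1:]):
            counts[first + second] += 1
    return counts


letters = letter_counts(c4)
digraphs = digraph_counts(c4)

print(letters.most_common(4))   # the four most frequent cipher letters
print(digraphs['cq'], digraphs['qa'], digraphs['ea'])
```

Letters with equal counts may come out in a different order than in the DH display, since the order of ties is not specified there.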
EM = EnglishModels()
Notice 'a' is the most frequent letter. Recalling English letter frequencies:
>>> EM
letter_frequencies = [
('e', '0.122'), ('t', '0.091'), ('a', '0.080'), ('o', '0.077'),
('i', '0.075'), ('n', '0.069'), ('s', '0.064'), ('r', '0.061'),
('h', '0.054'), ('l', '0.042'), ('d', '0.039'), ('c', '0.031'),
('u', '0.029'), ('m', '0.024'), ('f', '0.022'), ('g', '0.021'),
('p', '0.020'), ('y', '0.020'), ('w', '0.020'), ('b', '0.016'),
('v', '0.010'), ('k', '0.008'), ('j', '0.002'), ('x', '0.002'),
('q', '0.001'), ('z', '0.001'), ('start', '0.000')]

top_digraph_probs = [
('th', '0.029'), ('he', '0.025'), ('in', '0.020'), ('er', '0.017'),
('an', '0.016'), ('re', '0.014'), ('es', '0.013'), ('on', '0.012'),
('st', '0.012'), ('en', '0.012'), ('nt', '0.011'), ('nd', '0.011'),
('ti', '0.011'), ('to', '0.011'), ('at', '0.010'), ('ea', '0.010'),
('ed', '0.010'), ('or', '0.010'), ('is', '0.010'), ('it', '0.010'),
('ou', '0.009'), ('ng', '0.009'), ('ar', '0.009'), ('ha', '0.009'),
('te', '0.009'), ('et', '0.008'), ('al', '0.008'), ('of', '0.008'),
('as', '0.008'), ('se', '0.007'), ('hi', '0.007'), ('le', '0.007'),
('ro', '0.007'), ('ve', '0.007'), ('sa', '0.006'), ('ri', '0.006'),
('ta', '0.006'), ('me', '0.006'), ('li', '0.006'), ('ra', '0.006'),
('de', '0.006'), ('si', '0.006'), ('ne', '0.006'), ('so', '0.006'),
('el', '0.006'), ('ec', '0.006'), ('ot', '0.005'), ('ic', '0.005'),
('ll', '0.005'), ('be', '0.005')]

One Let Words = [('a', '43178'), ('I', '11493')]

Two Let Words = [
('of', '61398'), ('to', '52892'), ('in', '36547'), ('is', '23072'),
('on', '14294'), ('it', '13776'), ('be', '13410'), ('by', '11063'),
('as', '10231'), ('at', '9655'), ('he', '9204'), ('or', '7404'),
('an', '7215'), ('up', '3585'), ('we', '3451'), ('do', '3261'),
('so', '2887'), ('if', '2837'), ('no', '2794'), ('my', '2436'),
('me', '1900'), ('go', '1231'), ('us', '1145'), ('am', '496'),
('de', '367')]

Three Let Words= [
('the', '118307'), ('and', '51425'), ('for', '17391'), ('was', '15990'),
('are', '9942'), ('not', '9304'), ('his', '8553'), ('you', '7333'),
('had', '7291'), ('has', '6829'), ('but', '6346'), ('who', '4773'),
('one', '4771'), ('can', '4767'), ('all', '4427'), ('its', '4016'),
('her', '3831'), ('she', '3218'), ('out', '2952'), ('two', '2506'),
('him', '2479'), ('new', '2302'), ('any', '2299'), ('may', '2206'),
('now', '2153'), ('did', '1854'), ('way', '1763'), ('our', '1576'),
('too', '1545'), ('how', '1531'), ('own', '1494'), ('see', '1469'),
('get', '1398'), ('off', '1386'), ('end', '1155'), ('man', '1119'),
('use', '1105'), ('day', '1093'), ('say', '1093'), ('old', '985'),
('set', '884'), ('put', '854'), ('got', '779'), ('far', '674'),
('art', '672'), ('men', '657'), ('why', '562'), ('yet', '559'),
('pay', '546'), ('car', '512')]
>>>
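
The guess that follows pairs the most frequent cipher letter with the most frequent English letter. That idea can be sketched as a simple ranking. This is an illustrative Python 3 sketch only: rank_guesses and ENGLISH_ORDER are hypothetical names, not part of crypto_class, and the English order is read off the letter_frequencies table above.

```python
from collections import Counter

# English letters from most to least frequent, as listed in the EM table.
ENGLISH_ORDER = 'etaoinsrhldcumfgpywbvkjxqz'

c4 = ('rba dlqvjblg uj kacqao rldcqo oagcqreapr tbcuqj '
      'oagcqreapr cpo tlyyaka gaaq qawuad tleeurraa '
      'eaemaqj cpo jrcii upwlywao up rapnqa qawuad gqltajj')


def rank_guesses(cipher_text):
    """Pair cipher letters with English letters by frequency rank."""
    counts = Counter(ch for ch in cipher_text if ch.isalpha())
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return list(zip(ranked, ENGLISH_ORDER))


guesses = rank_guesses(c4)
print(guesses[:3])   # [('a', 'e'), ('q', 't'), ('r', 'a')]
```

With a message this short only the first pairing deserves much trust. The second pairing already disagrees with the assignment that the pattern search later in this workout supports (cipher 'r' for plain 't'), which is why the remaining letters are explored by trial rather than read straight off the ranking.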
Let's go ahead and guess 'a'='e':
>>> DH.word_guess('a','e')
True
>>> DH
cipher = rba dlqvjblg uj kacqao rldcqo oagcqreapr tbcuqj oagcqreapr cpo tlyyaka gaaq qawuad tleeurraa eaemaqj cpo jrcii upwlywao up rapnqa qawuad gqltajj
        
ciph_plain_dict = {'a': 'e'}

hypothesis = --e -------- -- -e--e- ------ -e-----e-- ------ -e-----e-- --- ----e-e -ee- -e--e- -------ee -e--e-- --- ----- ------e- -- -e---e -e--e- ----e--

letter_frequencies = [
('a', 23), ('q', 12), ('r', 10), ('c', 8),
('j', 7), ('l', 7), ('o', 7), ('p', 7),
('u', 7), ('e', 6), ('g', 5), ('d', 4),
('t', 4), ('w', 4), ('b', 3), ('y', 3),
('i', 2), ('k', 2), ('m', 1), ('n', 1),
('v', 1)]

top_digraph_frequencies = [
('cq', 4), ('qa', 4), ('ea', 3), ('ap', 3),
('aa', 2), ('aw', 2), ('ag', 2), ('ao', 2),
('aq', 2), ('gc', 2), ('cp', 2), ('ad', 2),
('pr', 2), ('re', 2), ('tl', 2), ('ra', 2),
('po', 2), ('ly', 2), ('wu', 2), ('ka', 2),
('qr', 2), ('up', 2), ('oa', 2), ('qj', 2), ('ua', 2)]

One Letter Words = []

Two Letter Words = ['uj', 'up']

Three Letter Words = ['rba', 'cpo', 'cpo']
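
The hypothesis line is just the cipher text with every assigned cipher letter replaced by its plain letter and every unassigned one shown as '-'. A minimal Python 3 sketch of that rendering, assuming the assignment is a plain dictionary like ciph_plain_dict (the function name render is hypothetical and not part of crypto_class):

```python
def render(cipher_text, key):
    """Show known plain letters; use '-' for unassigned cipher letters."""
    return ''.join(
        ch if ch == ' ' else key.get(ch, '-')
        for ch in cipher_text
    )


key = {'a': 'e'}
print(render('rba cpo tlyyaka', key))    # --e --- ----e-e

# Try a candidate without committing to it: build a copy of the key.
trial = dict(key, r='t')
print(render('rba rapnqa', trial))       # t-e te---e
print(key)                               # still {'a': 'e'}
```

Working on a copy keeps the committed assignment untouched, so several candidates can be compared side by side before one is accepted.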
Now let's try to find 't'. There are still lots of candidates. Let's try the next 3 most frequent letters:

>>> ass_rt = DH.word_try('r','t')
rba dlqvjblg uj kacqao rldcqo oagcqreapr tbcuqj oagcqreapr cpo tlyyaka gaaq qawuad tleeurraa eaemaqj cpo jrcii upwlywao up rapnqa qawuad gqltajj

--e -------- -- -e--e- ------ -e-----e-- ------ -e-----e-- --- ----e-e -ee- -e--e- -------ee -e--e-- --- ----- ------e- -- -e---e -e--e- ----e--

t-e -------- -- -e--e- t----- -e---t-e-t ------ -e---t-e-t --- ----e-e -ee- -e--e- -----ttee -e--e-- --- -t--- ------e- -- te---e -e--e- ----e--
>>> ass_qt = DH.word_try('q','t')
rba dlqvjblg uj kacqao rldcqo oagcqreapr tbcuqj oagcqreapr cpo tlyyaka gaaq qawuad tleeurraa eaemaqj cpo jrcii upwlywao up rapnqa qawuad gqltajj

--e -------- -- -e--e- ------ -e-----e-- ------ -e-----e-- --- ----e-e -ee- -e--e- -------ee -e--e-- --- ----- ------e- -- -e---e -e--e- ----e--

--e --t----- -- -e-te- ----t- -e--t--e-- ----t- -e--t--e-- --- ----e-e -eet te--e- -------ee -e--et- --- ----- ------e- -- -e--te te--e- -t--e--
>>> ass_ct = DH.word_try('c','t')
rba dlqvjblg uj kacqao rldcqo oagcqreapr tbcuqj oagcqreapr cpo tlyyaka gaaq qawuad tleeurraa eaemaqj cpo jrcii upwlywao up rapnqa qawuad gqltajj

--e -------- -- -e--e- ------ -e-----e-- ------ -e-----e-- --- ----e-e -ee- -e--e- -------ee -e--e-- --- ----- ------e- -- -e---e -e--e- ----e--

--e -------- -- -et-e- ---t-- -e-t---e-- --t--- -e-t---e-- t-- ----e-e -ee- -e--e- -------ee -e--e-- t-- --t-- ------e- -- -e---e -e--e- ----e--
These all look fairly reasonable on the face of it. Let's try filling out the word patterns for each guess.

find_all_pattern_words takes a hypothesis (an assignment) and tries to find all words compatible with that guess. It prints those cipher words that match 20 or fewer plain text words, including those that match 0 words, which is a very significant event:

>>> DH.find_all_pattern_words(ass_qt)
dlqvjblg ['antimony', 'astonish', 'rational', 'saturday', 'watchman']
oagcqreapr []
oagcqreapr []
tlyyaka ['college']
gaaq ['beet', 'feet', 'meet']
qawuad ['teamed', 'teared', 'teased', 'temper', 'tender', 'tensed', 'tenser', 'termed']
tleeurraa []
eaemaqj []
upwlywao ['bivalves', 'confined', 'confiner', 'confines', 'disposer', 'fairview', 'harbored', 'involved', 'involves']
rapnqa ['berate', 'debate', 'demote', 'denote', 'devote', 'hecate', 'negate', 'rebate', 'recite', 'refute', 'relate', 'remote', 'repute', 'sedate', 'semite', 'senate']
qawuad ['teamed', 'teared', 'teased', 'temper', 'tender', 'tensed', 'tenser', 'termed']
gqltajj []
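
One way to picture the matching behind output like this: a plain word is compatible with a cipher word if the two have the same length, every already-assigned cipher letter lines up with its plain letter, and the remaining letters can be paired one-to-one. The Python 3 sketch below works under those assumptions. It is not the crypto_class code; is_compatible, pattern_words and the tiny VOCABULARY list are hypothetical stand-ins for illustration.

```python
def is_compatible(cipher_word, plain_word, key):
    """True if plain_word could be cipher_word under a one-to-one
    substitution that extends the partial assignment in key."""
    if len(cipher_word) != len(plain_word):
        return False
    forward = dict(key)                          # cipher -> plain
    backward = {p: c for c, p in key.items()}    # plain -> cipher
    for c, p in zip(cipher_word, plain_word):
        # setdefault records a new pairing or returns the existing one.
        if forward.setdefault(c, p) != p:
            return False
        if backward.setdefault(p, c) != c:
            return False
    return True


def pattern_words(cipher_word, key, vocabulary):
    """All vocabulary words compatible with cipher_word under key."""
    return sorted(w for w in vocabulary
                  if is_compatible(cipher_word, w, key))


# A tiny made-up word list; a real run would use a full dictionary.
VOCABULARY = ['college', 'collage', 'committee', 'members',
              'tender', 'tested', 'the', 'tie', 'toe', 'two']

key_rt = {'a': 'e', 'r': 't'}
print(pattern_words('rba', key_rt, VOCABULARY))       # ['the', 'tie', 'toe']
print(pattern_words('tlyyaka', key_rt, VOCABULARY))   # ['college']

key_qt = {'a': 'e', 'q': 't'}
print(pattern_words('qawuad', key_qt, VOCABULARY))    # ['tender']
```

Note that 'tested' is rejected for 'qawuad' under key_qt because plain 't' would need two different cipher letters ('q' and 'u'), and 'collage' is rejected for 'tlyyaka' because cipher 'a' is already assigned to plain 'e'.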


>>> DH.find_all_pattern_words(ass_ct)
kacqao ['betsey', 'nether', 'vetoed', 'vetoer', 'vetoes']
oagcqreapr []
oagcqreapr []
cpo ['tab', 'tag', 'tan', 'tap', 'tar', 'tau', 'tax', 'tim', 'tin', 'tip', 'tom', 'ton', 'top', 'tow', 'toy', 'try', 'tub', 'tug', 'two']
tlyyaka ['college']
tleeurraa []
eaemaqj ['members']
cpo ['tab', 'tag', 'tan', 'tap', 'tar', 'tau', 'tax', 'tim', 'tin', 'tip', 'tom', 'ton', 'top', 'tow', 'toy', 'try', 'tub', 'tug', 'two']
jrcii []
upwlywao ['bivalves', 'confined', 'confiner', 'confines', 'disposer', 'fairview', 'harbored', 'involved', 'involves']

>>>
Now let's try guessing cipher 'r' = plain 't':
>>> DH.find_all_pattern_words(ass_rt)
rba ['the', 'tie', 'toe']
oagcqreapr ['department', 'deportment']
oagcqreapr ['department', 'deportment']
tlyyaka ['college']
tleeurraa ['committee']
eaemaqj ['members']
jrcii ['atoll', 'staff', 'stall', 'starr', 'stiff', 'still', 'stuff']
upwlywao ['bivalves', 'confined', 'confiner', 'confines', 'disposer', 'fairview', 'harbored', 'involved', 'involves']
rapnqa ['temple', 'tenure']
Points to note:
  1. Cipher 'rba': Only 3 three-letter words begin with 't' and end with 'e', and one of them is VERY common. Good!
  2. Cipher 'oagcqreapr': Only 2 complex words. Very good. This cipher word repeats in the message.
  3. The next 3 cipher words all have only one compatible plain text word under this assignment. Perfect!
    tlyyaka ['college']
    tleeurraa ['committee']
    eaemaqj ['members']
    
    Notice any pattern developing here?
  4. Which of the following words continue the pattern?
    jrcii ['atoll', 'staff', 'stall', 'starr', 'stiff', 'still', 'stuff']
    
  5. Complete the decoding and choose the right words from the remaining lists:
    upwlywao ['bivalves', 'confined', 'confiner', 'confines', 'disposer', 'fairview', 'harbored', 'involved', 'involves']
    rapnqa ['temple', 'tenure']