Attacks on Vigenère Ciphers

September 8, 2022

The Vigenère cipher

Recall: The Vigenère cipher

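stringToMod26 <- function(x) {utf8ToInt(x)-utf8ToInt("a")}  # "a".."z" -> 0..25
mod26ToString <- function(x) {intToUtf8(x+utf8ToInt("a"))}  # 0..25 -> "a".."z"

vigenere <- function(txt, keyVector)
{
  pt <- stringToMod26(txt)
  # keyVector is recycled along pt; R warns when length(pt) is not a
  # multiple of length(keyVector), which is expected here, so silence it
  suppressWarnings( 
    ct <- (pt + keyVector) %% 26
  )                               
  return(mod26ToString(ct))
}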

vigenere("albuquerque", stringToMod26("walter"))
[1] "wlmnularbni"

Brute force attack?

Suppose you knew the length \(n\) of the key for a Vigenère cipher. How many keys would you have to try for a ciphertext-only brute force attack?
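Each of the \(n\) key letters can independently be any of 26 shifts, so a brute-force attack has to try up to \(26^n\) keys. A quick R check of how fast that grows (keySpace is a hypothetical helper name, used only for illustration):

```r
# Number of Vigenère keys of length n: each of the n positions
# can independently be any of 26 shifts.
keySpace <- function(n) 26^n

keySpace(6)   # 308915776 keys for a 6-letter key such as "skyler"
format(keySpace(1:10), big.mark = ",", scientific = FALSE)
```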

Letter-frequency of Vigenère ciphertext

Will a letter-frequency attack work?

plaintext <- "howeverinthisextremedistresshewasnotdestituteofhisusualsagacitybuttrustinghimselftotheprovidenceofgodheputhislifeintohazardinthemannerfollowingandnowsaidhesinceitisresolvedamongyouthatyouwilldiecomeonletuscommitourmutualdeathstodeterminationbylothewhomthelotfallstofirstlethimbekilledbyhimthathaththesecondlotandthusfortuneshallmakeitsprogressthroughusallnorshallanyofusperishbyhisownrighthandforitwouldbeunfairifwhentherestaregonesomebodyshouldrepentandsavehimselfthisproposalappearedtothemtobeveryjustandwhenhehadprevailedwiththemtodeterminethismatterbylotshedrewoneofthelotsforhimselfalsohewhohadthefirstlotlaidhisneckbaretohimthathadthenextassupposingthatthegeneralwoulddieamongthemimmediatelyfortheythoughtdeathifjosephusmightbutdiewiththemwassweeterthanlifeyetwashewithanotherlefttothelastwhetherwemustsayithappenedsobychanceorwhetherbytheprovidenceofgodandashewasverydesirousneithertobecondemnedbythelotnorifhehadbeenlefttothelasttoimbruehisrighthandinthebloodofhiscountrymenhepersuadedhimtotrusthisfidelitytohimandtoliveaswellashimself"
keyAsVector <- stringToMod26("skyler")
ciphertext <- vigenere(plaintext, keyAsVector)
letterCounts(plaintext)

  e   t   h   o   i   s   a   n   l   r   d   m   u   f   w   y   b   p   g   c   v   k   j   x   z 
130 114  94  80  72  71  65  53  50  50  47  34  32  27  24  20  18  17  16  12   9   3   2   2   1 
letterCounts(ciphertext)

 s  l  e  r  d  y  w  c  j  z  v  k  o  f  m  p  x  g  i  q  t  n  a  h  b  u 
71 65 58 57 56 56 52 51 51 50 47 46 39 38 37 37 33 31 30 29 26 23 20 16 14 10 
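letterCounts() is not defined in these notes (it presumably comes from an earlier lecture). A minimal stand-in, assuming it simply tabulates the characters of a string and sorts them from most to least frequent, which is consistent with the output format above:

```r
# Hypothetical stand-in for letterCounts(): tabulate the characters of a
# string and sort them by decreasing count.
letterCounts <- function(txt) {
  sort(table(strsplit(txt, "")[[1]]), decreasing = TRUE)
}

letterCounts("mississippi")
```

With this version, tied letters are listed alphabetically, because the sort is stable and table() orders its entries alphabetically.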

Histogram of letter counts

Unbreakable?

See Scientific American Monthly, 4(4), 1921, pp. 332-334.

But before we proceed, let us return to the popular misapprehension, the delusion regarding the invulnerability of this system which is so firmly fixed that we come in contact with statements and articles describing it as “indecipherable” and some even refer to it as “new.”

I have in mind an article published in the Proceedings of the Engineers’ Club of Philadelphia and reprinted in the Scientific American Supplement (No. 2143) of January 17, 1917, entitled “A New Cipher Code” in which our old and well-known Vigenère table appears as the subject of an interesting if somewhat erroneous article.

The author’s closing paragraph is of especial interest and is quoted herewith “The method used for the preparation and reading of code messages is simple in the extreme and at the same time impossible of translation unless the key-word is known. The ease with which the key may be changed is another point in favor of the adoption of this code by those desiring to transmit important messages without the slightest danger of their messages being read by political or business rivals,” etc. The italics are ours!

Attacking the Vigenère cipher, step 1: Find the key length

Chosen plaintext

vigenere("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", stringToMod26("skyler"))
[1] "skylerskylerskylerskylerskylerskylerskyler"

Repeats

Observe: if all the letters of the plaintext are the same, the ciphertext will repeat every \(k\) letters, where \(k\) is the length of the key.

vigenere("eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee", stringToMod26("skyler"))
[1] "wocpivwocpivwocpivwocpivwocpivwocpivwocpiv"


It follows that, if many of the letters of the plaintext are the same (e.g., e’s), there will be lots of repeats every \(k\) letters in the ciphertext (and not as many repeats for other intervals).

vigenere("eeereeeweeeeeerefeeegeeeedeeeeqeeeveeeseem", stringToMod26("skyler"))
[1] "woccivwgcpivwoppjvwoepivwncpiviocpzvwoqpid"
woccivwgcpivwoppjvwoepivwncpiviocpzvwoqpid        # ciphertext
      woccivwgcpivwoppjvwoepivwncpiviocpzvwoqpid  # shifted by 6
      * * ***  * *** * **  ***  ** * * *          # matches

Therefore, to find the key length, find the shift that maximizes the number of matches. (Multiples of the key length also produce many matches, so the smallest such shift is the best guess.)
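The alignment picture above compares only the overlapping part of the ciphertext and its shift (positions 7 to 42 against 1 to 36). A minimal non-wrapping counter, written here only to check the picture (countMatches is a hypothetical helper, not one used later in the notes):

```r
# Count positions where txt agrees with itself shifted right by s,
# comparing only the overlapping part (no wrap-around).
# Assumes 0 < s < nchar(txt).
countMatches <- function(txt, s) {
  ch <- strsplit(txt, "")[[1]]
  n <- length(ch)
  sum(ch[(s + 1):n] == ch[1:(n - s)])
}

ct <- "woccivwgcpivwoppjvwoepivwncpiviocpzvwoqpid"
countMatches(ct, 6)                             # 20, the number of *'s in the picture
sapply(1:12, function(s) countMatches(ct, s))   # match counts for shifts 1..12
```

The tools below use a circular shift instead, which compares every position and keeps the number of comparisons the same for every shift.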

R technique: subscripting

(See console.)
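The console demo is not reproduced here; as a rough substitute, a few standard R subscripting forms that the shift tools rely on (the vector v is arbitrary):

```r
v <- c(10, 20, 30, 40, 50)

v[2]              # a single element: 20
v[c(1, 3)]        # several elements: 10 30
v[-1]             # drop the first element: 20 30 40 50
v[v > 25]         # logical subscript: 30 40 50
v[c(5, 1:4)]      # reorder: 50 10 20 30 40, i.e. v rotated right by 1

# The same rotation built with modular arithmetic, the idea behind
# shiftVec() with n = 1: position i picks element ((i - 2) mod 5) + 1.
v[(seq_along(v) - 2) %% length(v) + 1]   # 50 10 20 30 40
```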

Tools for shifting

stringToMod26 <- function(x) {utf8ToInt(x)-utf8ToInt("a")}

# Rotate v to the right by n positions (circular shift)
shiftVec <- function(v, n) {
  v[(seq_along(v) - (n+1)) %% length(v) + 1]
}

stringToMod26("tucosalamanca")
 [1] 19 20  2 14 18  0 11  0 12  0 13  2  0
shiftVec(stringToMod26("tucosalamanca"), 3)
 [1] 13  2  0 19 20  2 14 18  0 11  0 12  0

R technique: Boolean vectors and sum

(See console.)
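Again in place of the console demo, a small example (with arbitrary vectors x and y) of comparing vectors elementwise and counting agreements:

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)
y <- c(3, 2, 4, 8, 5, 0, 2, 1)

x == y          # TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
sum(x == y)     # TRUE counts as 1 and FALSE as 0, so this gives 4
mean(x == y)    # proportion of positions that agree: 0.5
```

This is how the match counts below are computed: sum(ctv-shiftVec(ctv, x)==0) counts the same positions as sum(ctv == shiftVec(ctv, x)).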

Count matches

ctv <- stringToMod26(ciphertext)
matches <- sapply(1:20, function(x){sum(ctv-shiftVec(ctv, x)==0)})
names(matches) <- 1:20
matches
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
38 51 45 43 44 73 47 40 39 41 42 58 45 43 37 40 37 73 53 33 

Plot matches
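The original plot is not reproduced here. One way to draw a similar chart with base R's barplot(), using the match counts printed above (the colors and labels are arbitrary choices):

```r
# Match counts for shifts 1-20, copied from the output above.
matches <- c(38, 51, 45, 43, 44, 73, 47, 40, 39, 41,
             42, 58, 45, 43, 37, 40, 37, 73, 53, 33)
names(matches) <- 1:20

# Highlight multiples of 6 (the suspected key length).
barplot(matches,
        col = ifelse(1:20 %% 6 == 0, "firebrick", "grey70"),
        xlab = "shift", ylab = "number of matches")
```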

Count matches in shifts of plaintext

ptv <- stringToMod26(plaintext)
matches <- sapply(1:20, function(x){sum(ptv-shiftVec(ptv, x)==0)})
names(matches) <- 1:20
matches
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
28 48 64 63 65 73 63 75 69 80 73 58 68 61 61 70 83 72 56 89 

Plot matches in shifts of plaintext

Does it work without letter frequency patterns?

set.seed(60302)
nfPlaintext <- mod26ToString(sample(0:25, 1043, replace=TRUE)) # random plaintext (uniform letters, no frequency pattern)
nfCiphertext <- vigenere(nfPlaintext, keyAsVector)
ctv <- stringToMod26(nfCiphertext)
matches <- sapply(1:20, function(x){sum(ctv-shiftVec(ctv, x)==0)})
names(matches) <- 1:20
matches
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
56 34 33 39 41 34 46 33 52 39 39 44 34 42 36 31 33 45 42 43 


Summary: finding the key length

Summary. To find the key length of a Vigenère cipher given only the ciphertext, count the matches between the ciphertext and shifts of the ciphertext, and find the (smallest) shift that maximizes the number of matches.
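Why should multiples of the key length stand out? A rough back-of-the-envelope check, assuming the English letter frequencies englishFreqs used later in these notes. Two positions shifted by the same key letter agree exactly when their plaintext letters agree, which happens with probability about \(\sum_i p_i^2 \approx 0.066\); two positions shifted by different amounts agree, on average over the shifts, roughly as often as random letters, about \(1/26 \approx 0.038\). With \(N = 1043\) letters (the length of the plaintext above):

```r
# English letter frequencies a..z (same table as later in these notes).
englishFreqs <- c(0.082,0.015,0.028,0.043,0.127,0.022,0.020,0.061,0.070, 0.002,0.008,0.040,0.024,
                  0.067,0.075,0.019,0.001,0.060,0.063,0.091,0.028,0.010,0.023,0.001,0.020,0.001)
N <- 1043                       # length of the plaintext in the first example

pSame  <- sum(englishFreqs^2)   # chance two English letters agree: about 0.066
pOther <- 1/26                  # chance two unrelated random letters agree: about 0.038

N * pSame    # expected matches at multiples of the key length: about 68
N * pOther   # rough expected matches at other shifts: about 40
```

These estimates are in line with the counts above (around 70 at shifts 6, 12, 18, ... and around 40 elsewhere), and with the random-plaintext experiment, where every shift gives roughly 40.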

Can we systematize this process?

ctv <- stringToMod26(ciphertext)
matches <- sapply(1:30, function(x){sum(ctv-shiftVec(ctv, x)==0)})
matches
 [1] 38 51 45 43 44 73 47 40 39 41 42 58 45 43 37 40 37 73 53 33 40 31 50 80 40 39 51 36 51 68

We can view every 6th entry of the vector matches:

matches[c(6,12,18,24,30)]
[1] 73 58 73 80 68

Or better yet:

matches[seq(6,30,by=6)]
[1] 73 58 73 80 68

Find the shift that maximizes mean matches

for(i in 1:10) {
  matches_by_i <- matches[seq(i,30,by=i)]
  cat("by", i, ":", matches_by_i, "mean:", mean(matches_by_i), "\n")
}
by 1 : 38 51 45 43 44 73 47 40 39 41 42 58 45 43 37 40 37 73 53 33 40 31 50 80 40 39 51 36 51 68 mean: 46.93333 
by 2 : 51 43 73 40 41 58 43 40 73 33 31 80 39 36 68 mean: 49.93333 
by 3 : 45 73 39 58 37 73 40 80 51 68 mean: 56.4 
by 4 : 43 40 58 40 33 80 36 mean: 47.14286 
by 5 : 44 41 37 33 40 68 mean: 43.83333 
by 6 : 73 58 73 80 68 mean: 70.4 
by 7 : 47 43 40 36 mean: 41.5 
by 8 : 40 40 80 mean: 53.33333 
by 9 : 39 73 51 mean: 54.33333 
by 10 : 41 33 68 mean: 47.33333 

Avoid for loops

sapply(1:10, function(i) {mean(matches[seq(i,30,by=i)])})
 [1] 46.93333 49.93333 56.40000 47.14286 43.83333 70.40000 41.50000 53.33333 54.33333 47.33333

And in one line, we can find which one is the max:

which.max(sapply(1:10, function(i) {mean(matches[seq(i,30,by=i)])}))
[1] 6

So the key length is 6 (probably).
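The "mean over multiples" step can be wrapped in a small function. guessKeyLength is a hypothetical name; it applies the same heuristic as above, so it inherits its weaknesses (and it assumes maxLen does not exceed the number of shifts computed):

```r
# Candidate key length from a vector of match counts (matches[s] = matches at shift s):
# for each i in 1..maxLen, average the counts at shifts i, 2i, 3i, ... and
# return the i with the largest average. Assumes maxLen <= length(matches).
guessKeyLength <- function(matches, maxLen = 10) {
  n <- length(matches)
  means <- sapply(1:maxLen, function(i) mean(matches[seq(i, n, by = i)]))
  which.max(means)
}

# Match counts for shifts 1..30 from the output above.
matches <- c(38, 51, 45, 43, 44, 73, 47, 40, 39, 41, 42, 58, 45, 43, 37,
             40, 37, 73, 53, 33, 40, 31, 50, 80, 40, 39, 51, 36, 51, 68)
guessKeyLength(matches)   # 6
```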

Attacking the Vigenère cipher, step 2: Find the key, given its length

We can find the key length. Now what?

  • Brute force attack.
  • ?

New Example

ciphertext <- "fsbvoiutlbsjienutpwcdlmvlxnrmhfnjcmmiftrnnghjrsznebwmuhkdkaimgrlvtcawzbekxwvrzieplkwkhjetzryipopxbanwddejnmfhwtthzubadphrnyubvxthfmzzhaniiwuztmfdxlrytwnaryaujpamqhrslpvabeqeawlvzxmucdlmvsxqrqtbiilhdusqftowzcwyhyegxhpdlddiiowcymrvomjngohpeerjtrwigpometgwrdnhvbbdziylsobtihwddenbndisuwoabacuzpedwcbjgmiznowlmpiiyiiovmptavtztrbljuvrzvmwryzeortbiheazkziwsqiqlgbgmfdxlryrmvyqnjjtycgceyvxyjzpfdxwvyrptrlwxkwcim"
ctv <- stringToMod26(ciphertext)
matches <- sapply(1:30, function(x){sum(ctv-shiftVec(ctv, x)==0)})
names(matches) <- 1:30
matches
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 
15 16 12  6 12 13 16 16 27 16 15 20  9 17 11 19 23 30 17 13 12 16 15 12 17 17 26  9 16 19 

So it looks like the key length is \(k=9\): the largest counts occur at shifts 9, 18 and 27 (27, 30 and 26 matches).

Finding the key given the key length

If the key length is \(k\), then the letters in position \(1, k+1, 2k+1, 3k+1, \ldots\) of the ciphertext should all have been shifted the same amount. (Similarly for positions \(2, k+2, 2k+2, 3k+2, \ldots\), and so on.)

In our example, we found that \(k=9\).

ct <- stringToMod26(ciphertext)
every9thct <- ct[seq(1, length(ct), 9)]  # take every 9th element, starting with the first 
letterCounts(mod26ToString(every9thct))

r d h l w q u i n p b e f g k o x y z 
6 5 5 4 4 3 3 2 2 2 1 1 1 1 1 1 1 1 1 

Mathematical tool: Dot products

Recall: For vectors \(\mathbf{a}, \mathbf{b} \in \mathbb{R}^n\), \(\mathbf{a} \cdot \mathbf{b} = \lVert \mathbf{a} \rVert \lVert \mathbf{b} \rVert \cos \theta\), where \(\theta\) is the angle between \(\mathbf{a}\) and \(\mathbf{b}\). Therefore, if the lengths \(\lVert \mathbf{a} \rVert\) and \(\lVert \mathbf{b} \rVert\) are fixed and only the angle changes, the dot product is greatest when the vectors point in the same direction (\(\theta = 0\)).

Consequence: Given vectors \(\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_m\) that all have the same norm, and a vector \(\mathbf{v}\), the vector \(\mathbf{u}_i\) that maximizes the dot product \(\mathbf{u}_i \cdot \mathbf{v}\) is the vector whose direction is closest to the direction of \(\mathbf{v}\).
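A small numerical illustration in \(\mathbb{R}^2\) (the vectors are arbitrary choices): among three unit vectors, the one with the largest dot product with \(\mathbf{v}\) is the one making the smallest angle with it.

```r
dot <- function(a, b) sum(a * b)          # dot product of two numeric vectors

# Three vectors of norm 1, at angles of 0, 90 and 45 degrees from the x-axis.
u <- list(c(1, 0), c(0, 1), c(sqrt(0.5), sqrt(0.5)))
v <- c(2, 1)                               # about 26.6 degrees from the x-axis

dots <- sapply(u, function(ui) dot(ui, v))                       # 2, 1, about 2.12
angles <- sapply(u, function(ui)
  acos(dot(ui, v) / sqrt(dot(ui, ui) * dot(v, v))) * 180 / pi)   # about 26.6, 63.4, 18.4

which.max(dots)     # 3
which.min(angles)   # 3: the largest dot product goes with the smallest angle
```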

Big Idea: Match letter frequencies

  • Let \(\mathbf{v}\) be the letter frequencies of the letters in position \(1, k+1, 2k+1, 3k+1, \ldots\) of the ciphertext, where \(k\) is the key length.
    • These letter frequencies should roughly match English letter frequencies, except they have been shifted by the first letter of the key.
  • Let \(\mathbf{u}_0\) be the vector of English letter frequencies (of a, b, c, …).
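  • For \(i = 1, 2, \ldots, 25\), let \(\mathbf{u}_i\) be \(\mathbf{u}_0\) shifted by \(i\) positions. Shifting only rearranges the entries, so all the \(\mathbf{u}_i\) have the same norm, and the \(\mathbf{u}_i\) that maximizes \(\mathbf{u}_i \cdot \mathbf{v}\) is the most likely shift.
  • That is, the first element of the key is (probably) \(i\), the letter with value \(i\) (a = 0, b = 1, …).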
  • Repeat. For the \(p\)th element of the key, compute \(\mathbf{v}\) using the letters in position \(p, k+p, 2k+p, 3k+p, \ldots\).
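A sanity check of the big idea in the ideal case, where a ciphertext column has exactly the English frequencies shifted by 3 (real columns are noisier). It reuses englishFreqs and shiftVec from elsewhere in these notes, repeated so the block runs on its own:

```r
englishFreqs <- c(0.082,0.015,0.028,0.043,0.127,0.022,0.020,0.061,0.070, 0.002,0.008,0.040,0.024,
                  0.067,0.075,0.019,0.001,0.060,0.063,0.091,0.028,0.010,0.023,0.001,0.020,0.001)
shiftVec <- function(v, n) {
  v[(seq_along(v) - (n+1)) %% length(v) + 1]
}

# Column i+1 of U is u_i = English frequencies shifted by i, for i = 0..25.
U <- sapply(0:25, function(i) shiftVec(englishFreqs, i))

# Shifting only permutes the entries, so every column has the same norm.
norms <- sqrt(colSums(U^2))
all.equal(min(norms), max(norms))     # TRUE

# Idealized ciphertext column: exact English frequencies shifted by 3 ("d").
v <- shiftVec(englishFreqs, 3)
dots <- as.vector(v %*% U)            # v . u_i for i = 0..25
which.max(dots) - 1                   # 3
```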

R tools we need

skipString(txt, n, r): given a string txt and integers \(n\) and \(r\), returns the string consisting of the characters in positions \(n, n+r, n+2r, n+3r, \ldots\).

skipString <- function(txt, n, r) {
  l <- unlist(strsplit(txt,""))
  ss <- l[seq(n,length(l),r)]
  return(paste0(ss,collapse=""))
}

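letterFreq(txt): computes the relative frequencies of the 26 letters in a string of lowercase letters, as a vector in alphabetical order. Appending one copy of each letter guarantees that every letter appears in the table (so letters absent from txt get frequency 0), and subtracting 1 removes those extra copies.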

letterFreq <- function(txt) {
  l <- unlist(c(strsplit(txt,""),letters))
  t <- as.vector(table(l))-1
  return(t/sum(t))
}
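A small check of both helpers on toy strings (the inputs are arbitrary), copying the definitions so the block runs on its own:

```r
skipString <- function(txt, n, r) {
  l <- unlist(strsplit(txt,""))
  ss <- l[seq(n,length(l),r)]
  return(paste0(ss,collapse=""))
}
letterFreq <- function(txt) {
  l <- unlist(c(strsplit(txt,""),letters))
  t <- as.vector(table(l))-1
  return(t/sum(t))
}

skipString("abcdefghij", 2, 3)   # characters 2, 5, 8: "beh"

f <- letterFreq("abca")
length(f)                        # 26: one entry per letter, zeros included
f[1:3]                           # frequencies of a, b, c: 0.50 0.25 0.25
sum(f)                           # 1
```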

Which shift of English frequencies matches best?

englishFreqs <- c(0.082,0.015,0.028,0.043,0.127,0.022,0.020,0.061,0.070, 0.002,0.008,0.040,0.024,
                  0.067,0.075,0.019,0.001,0.060,0.063,0.091,0.028,0.010,0.023,0.001,0.020,0.001)
v <- letterFreq(skipString(ciphertext,1,9))
for(i in 0:25) {
  cat(paste("shift =", i, "\t", " v dot u_i =", v %*% shiftVec(englishFreqs, i), "\n"))
}
shift = 0     v dot u_i = 0.0404444444444444 
shift = 1     v dot u_i = 0.0301111111111111 
shift = 2     v dot u_i = 0.0314222222222222 
shift = 3     v dot u_i = 0.0643555555555556 
shift = 4     v dot u_i = 0.0403111111111111 
shift = 5     v dot u_i = 0.0322222222222222 
shift = 6     v dot u_i = 0.0303777777777778 
shift = 7     v dot u_i = 0.0411555555555555 
shift = 8     v dot u_i = 0.0314222222222222 
shift = 9     v dot u_i = 0.0415111111111111 
shift = 10    v dot u_i = 0.0383777777777778 
shift = 11    v dot u_i = 0.0391555555555555 
shift = 12    v dot u_i = 0.0366888888888889 
shift = 13    v dot u_i = 0.0427555555555556 
shift = 14    v dot u_i = 0.0371111111111111 
shift = 15    v dot u_i = 0.0419111111111111 
shift = 16    v dot u_i = 0.0426222222222222 
shift = 17    v dot u_i = 0.0348222222222222 
shift = 18    v dot u_i = 0.0377333333333333 
shift = 19    v dot u_i = 0.0352666666666667 
shift = 20    v dot u_i = 0.0383111111111111 
shift = 21    v dot u_i = 0.0310888888888889 
shift = 22    v dot u_i = 0.0397555555555556 
shift = 23    v dot u_i = 0.0357333333333333 
shift = 24    v dot u_i = 0.0391777777777778 
shift = 25    v dot u_i = 0.0471555555555556 

Avoid the for loop

The same computation, collecting all 26 dot products in a vector (with englishFreqs and v as defined above) and letting which.max pick the best shift:

matchFreqs <- sapply(0:25, function(i){v %*% shiftVec(englishFreqs, i)})
matchFreqs
 [1] 0.04044444 0.03011111 0.03142222 0.06435556 0.04031111 0.03222222 0.03037778 0.04115556
 [9] 0.03142222 0.04151111 0.03837778 0.03915556 0.03668889 0.04275556 0.03711111 0.04191111
[17] 0.04262222 0.03482222 0.03773333 0.03526667 0.03831111 0.03108889 0.03975556 0.03573333
[25] 0.03917778 0.04715556
which.max(matchFreqs)-1
[1] 3

So the first element of the key is (probably) d, i.e., a shift of 3.

What about the rest of the key?

vKey <- numeric(9) # preallocate a numeric vector of length 9
for(p in 1:9) {
  v <- letterFreq(skipString(ciphertext,p,9))
  matchFreqs <- sapply(0:25, function(i){v %*% shiftVec(englishFreqs, i)})
  vKey[p] <- which.max(matchFreqs)-1
}
mod26ToString(vKey) # print out the key
[1] "depravity"

Decrypt, using the key

vigenere(ciphertext, -vKey)
[1] "comeonmanyouresmartyoumadepoisonoutofbeansyolookwegotwegotanentirelabrightherealrighthowaboutyoupicksomeofthesechemicalsandmixupsomerocketfuelthatwayyoucouldjustsendupasignalflareoryoumakesomekindofrobottogetushelporahomingdeviceorbuildanewbatteryorwaitnowhatifwejusttakesomestuffoffofthervandbuilditintosomethingcompletelydifferentyouknowlikealikeadunebuggythatwaywecanjustdunebuggyorwhatheywhatisitwhat"
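Putting steps 1 and 2 together, a compact sketch of the whole ciphertext-only attack, assuming the same English frequency table and the same "mean over multiples" rule for the key length. breakVigenere is a hypothetical name, and it swaps skipString()/letterFreq() for direct subscripting and base R's tabulate(), which give the same frequency vectors. It is a sketch, not a robust tool: it assumes lowercase a to z only, maxLen <= maxShift, and a ciphertext longer than maxShift.

```r
stringToMod26 <- function(x) {utf8ToInt(x) - utf8ToInt("a")}
mod26ToString <- function(x) {intToUtf8(x + utf8ToInt("a"))}
shiftVec <- function(v, n) v[(seq_along(v) - (n + 1)) %% length(v) + 1]
englishFreqs <- c(0.082,0.015,0.028,0.043,0.127,0.022,0.020,0.061,0.070, 0.002,0.008,0.040,0.024,
                  0.067,0.075,0.019,0.001,0.060,0.063,0.091,0.028,0.010,0.023,0.001,0.020,0.001)

breakVigenere <- function(ciphertext, maxShift = 30, maxLen = 10) {
  ctv <- stringToMod26(ciphertext)
  # Step 1: key length = candidate whose multiples have the largest mean match count.
  matches <- sapply(1:maxShift, function(x) sum(ctv == shiftVec(ctv, x)))
  k <- which.max(sapply(1:maxLen, function(i) mean(matches[seq(i, maxShift, by = i)])))
  # Step 2: for each key position p, the shift of English frequencies that
  # best matches the letter frequencies of positions p, p+k, p+2k, ...
  key <- sapply(1:k, function(p) {
    column <- ctv[seq(p, length(ctv), by = k)]
    v <- tabulate(column + 1, nbins = 26) / length(column)
    which.max(sapply(0:25, function(i) sum(v * shiftVec(englishFreqs, i)))) - 1
  })
  # Decrypt: subtract the repeated key and reduce mod 26.
  plain <- (ctv - rep_len(key, length(ctv))) %% 26
  list(key = mod26ToString(key), plaintext = mod26ToString(plain))
}
```

On the ciphertext above this performs the same computations as the step-by-step code, so breakVigenere(ciphertext)$key should again give "depravity".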

Programming Assignment, Due Sunday night

Written Assignment, Due Monday night