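# Convert a lowercase string to a vector of integers 0..25 (a = 0, ..., z = 25)
stringToMod26 <- function(x) {utf8ToInt(x)-utf8ToInt("a")}
# Convert a vector of integers 0..25 back to a lowercase string
mod26ToString <- function(x) {intToUtf8(x+utf8ToInt("a"))}
# Vigenère encryption: add the key (recycled along the text) mod 26
vigenere <- function(txt, keyVector)
{
  pt <- stringToMod26(txt)
  # R warns when the text length is not a multiple of the key length
  suppressWarnings(
    ct <- (pt + keyVector) %% 26
  )
  return(mod26ToString(ct))
}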
vigenere("albuquerque", stringToMod26("walter"))
[1] "wlmnularbni"
Suppose you knew the length \(n\) of the key for a Vigenère cipher. How many keys would you have to try for a ciphertext-only brute force attack?
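For a sense of scale: each of the \(n\) key positions can independently be any of 26 letters, so there are \(26^n\) candidate keys. A quick sketch in R (the name `numKeys` is ours and does not appear elsewhere in these notes):

```r
# Number of candidate keys of length n over a 26-letter alphabet.
# numKeys is a hypothetical helper introduced only for illustration.
numKeys <- function(n) 26^n

# Key counts for key lengths 1 through 10
keyCounts <- numKeys(1:10)
print(format(keyCounts, big.mark = ",", scientific = FALSE))
```

Already for a 6-letter key this is 308,915,776 candidates, and the count grows by a factor of 26 with each extra letter.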
Will a letter-frequency attack work?
plaintext <- "howeverinthisextremedistresshewasnotdestituteofhisusualsagacitybuttrustinghimselftotheprovidenceofgodheputhislifeintohazardinthemannerfollowingandnowsaidhesinceitisresolvedamongyouthatyouwilldiecomeonletuscommitourmutualdeathstodeterminationbylothewhomthelotfallstofirstlethimbekilledbyhimthathaththesecondlotandthusfortuneshallmakeitsprogressthroughusallnorshallanyofusperishbyhisownrighthandforitwouldbeunfairifwhentherestaregonesomebodyshouldrepentandsavehimselfthisproposalappearedtothemtobeveryjustandwhenhehadprevailedwiththemtodeterminethismatterbylotshedrewoneofthelotsforhimselfalsohewhohadthefirstlotlaidhisneckbaretohimthathadthenextassupposingthatthegeneralwoulddieamongthemimmediatelyfortheythoughtdeathifjosephusmightbutdiewiththemwassweeterthanlifeyetwashewithanotherlefttothelastwhetherwemustsayithappenedsobychanceorwhetherbytheprovidenceofgodandashewasverydesirousneithertobecondemnedbythelotnorifhehadbeenlefttothelasttoimbruehisrighthandinthebloodofhiscountrymenhepersuadedhimtotrusthisfidelitytohimandtoliveaswellashimself"
keyAsVector <- stringToMod26("skyler")
ciphertext <- vigenere(plaintext, keyAsVector)
letterCounts(plaintext)
e t h o i s a n l r d m u f w y b p g c v k j x z
130 114 94 80 72 71 65 53 50 50 47 34 32 27 24 20 18 17 16 12 9 3 2 2 1
letterCounts(ciphertext)
s l e r d y w c j z v k o f m p x g i q t n a h b u
71 65 58 57 56 56 52 51 51 50 47 46 39 38 37 37 33 31 30 29 26 23 20 16 14 10
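The helper `letterCounts` is not defined in these notes. A minimal sketch that would produce tables like the ones above, assuming it simply tabulates the characters and sorts by count (this is a guess at its behavior, not the original definition):

```r
# Hypothetical stand-in for the letterCounts() helper used above:
# counts each character and sorts the counts in decreasing order.
letterCounts <- function(txt) {
  chars <- strsplit(txt, "")[[1]]          # split into single characters
  sort(table(chars), decreasing = TRUE)    # tabulate, most frequent first
}

counts <- letterCounts("albuquerque")
print(counts)
```

The thing to compare between the two tables is their shape: the plaintext counts fall off steeply from e, while the ciphertext counts are much flatter.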
See Scientific American Monthly, 4(4), 1921, pp. 332-334.
But before we proceed, let us return to the popular misapprehension, the delusion regarding the invulnerability of this system which is so firmly fixed that we come in contact with statements and articles describing it as “indecipherable” and some even refer to it as “new.”
I have in mind an article published in the Proceedings of the Engineers’ Club of Philadelphia and reprinted in the Scientific American Supplement (No. 2143) of January 27, 1917, entitled “A New Cipher Code” in which our old and well-known Vigenère table appears as the subject of an interesting if somewhat erroneous article.
The author’s closing paragraph is of especial interest and is quoted herewith “The method used for the preparation and reading of code messages is simple in the extreme and at the same time impossible of translation unless the key-word is known. The ease with which the key may be changed is another point in favor of the adoption of this code by those desiring to transmit important messages without the slightest danger of their messages being read by political or business rivals,” etc. The italics are ours!
[1] "skylerskylerskylerskylerskylerskylerskyler"
Observe: if all the letters of the plaintext are the same, the ciphertext will repeat every \(k\) letters, where \(k\) is the length of the key.
[1] "wocpivwocpivwocpivwocpivwocpivwocpivwocpiv"
It follows that, if many of the letters of the plaintext are the same (e.g., e’s), there will be lots of repeats every \(k\) letters in the ciphertext (and not as many repeats for other intervals).
[1] "woccivwgcpivwoppjvwoepivwncpiviocpzvwoqpid"
woccivwgcpivwoppjvwoepivwncpiviocpzvwoqpid        # ciphertext
      woccivwgcpivwoppjvwoepivwncpiviocpzvwoqpid  # shifted by 6
      * * ***  * *** * **  ***  ** * * *          # matches
Therefore, to find the key length, you just find the shift that maximizes the number of matches.
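As a small check on the picture above, the matches can be counted directly. This is a sketch only: `countMatches` is a hypothetical helper that compares just the overlapping part of the two copies (no wrap-around), which is a slightly different convention from the cyclic `shiftVec` used later in these notes.

```r
# Count positions where a string agrees with a copy of itself shifted by n.
# countMatches is a hypothetical helper; it compares only the overlapping
# part of the two copies, so it does not wrap around.
countMatches <- function(txt, n) {
  v <- utf8ToInt(txt)
  len <- length(v)
  sum(v[(n + 1):len] == v[1:(len - n)])
}

ct <- "woccivwgcpivwoppjvwoepivwncpiviocpzvwoqpid"
matchCounts <- sapply(1:10, function(n) countMatches(ct, n))
names(matchCounts) <- 1:10
print(matchCounts)
```

For a shift of 6 this gives 20 matches, one for each starred position.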
(See console.)
stringToMod26 <- function(x) {utf8ToInt(x)-utf8ToInt("a")}
# Cyclically shift the vector v to the right by n positions
shiftVec <- function(v, n) {
  v[(seq_along(v) - (n+1)) %% length(v) + 1]
}
stringToMod26("tucosalamanca")
[1] 19 20 2 14 18 0 11 0 12 0 13 2 0
[1] 13 2 0 19 20 2 14 18 0 11 0 12 0
sum
(See console.)
ctv <- stringToMod26(ciphertext)
matches <- sapply(1:20, function(x){sum(ctv-shiftVec(ctv, x)==0)})
names(matches) <- 1:20
matches
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
38 51 45 43 44 73 47 40 39 41 42 58 45 43 37 40 37 73 53 33
ptv <- stringToMod26(plaintext)
matches <- sapply(1:20, function(x){sum(ptv-shiftVec(ptv, x)==0)})
names(matches) <- 1:20
matches
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
28 48 64 63 65 73 63 75 69 80 73 58 68 61 61 70 83 72 56 89
set.seed(60302)
nfPlaintext <- mod26ToString(sample(0:25, 1043, replace=TRUE)) # random plaintext: uniformly distributed letters
nfCiphertext <- vigenere(nfPlaintext, keyAsVector)
ctv <- stringToMod26(nfCiphertext)
matches <- sapply(1:20, function(x){sum(ctv-shiftVec(ctv, x)==0)})
names(matches) <- 1:20
matches
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
56 34 33 39 41 34 46 33 52 39 39 44 34 42 36 31 33 45 42 43
Summary. To find the key length of a Vigenère cipher given only the ciphertext, count the matches between the ciphertext and shifts of the ciphertext, and find the (smallest) shift that maximizes the number of matches.
ctv <- stringToMod26(ciphertext)
matches <- sapply(1:30, function(x){sum(ctv-shiftVec(ctv, x)==0)})
matches
[1] 38 51 45 43 44 73 47 40 39 41 42 58 45 43 37 40 37 73 53 33 40 31 50 80 40 39 51 36 51 68
We can view every 6th entry of the vector matches:
[1] 73 58 73 80 68
Or better yet:
[1] 73 58 73 80 68
for(i in 1:10) {
  matches_by_i <- matches[seq(i,30,by=i)]
  cat("by", i, ":", matches_by_i, "mean:", mean(matches_by_i), "\n")
}
by 1 : 38 51 45 43 44 73 47 40 39 41 42 58 45 43 37 40 37 73 53 33 40 31 50 80 40 39 51 36 51 68 mean: 46.93333
by 2 : 51 43 73 40 41 58 43 40 73 33 31 80 39 36 68 mean: 49.93333
by 3 : 45 73 39 58 37 73 40 80 51 68 mean: 56.4
by 4 : 43 40 58 40 33 80 36 mean: 47.14286
by 5 : 44 41 37 33 40 68 mean: 43.83333
by 6 : 73 58 73 80 68 mean: 70.4
by 7 : 47 43 40 36 mean: 41.5
by 8 : 40 40 80 mean: 53.33333
by 9 : 39 73 51 mean: 54.33333
by 10 : 41 33 68 mean: 47.33333
for loops
[1] 46.93333 49.93333 56.40000 47.14286 43.83333 70.40000 41.50000 53.33333 54.33333 47.33333
And in one line, we can find which one is the max:
[1] 6
So the key length is 6 (probably).
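The code behind the last two outputs is not shown. One way to reproduce them, assuming `sapply` and `which.max` were used (a guess on our part; the names `meanByLength` and `bestLength` are ours):

```r
# Match counts for shifts 1..30, copied from the output above
matches <- c(38, 51, 45, 43, 44, 73, 47, 40, 39, 41, 42, 58, 45, 43, 37,
             40, 37, 73, 53, 33, 40, 31, 50, 80, 40, 39, 51, 36, 51, 68)

# Mean match count over the multiples of each candidate key length i
meanByLength <- sapply(1:10, function(i) mean(matches[seq(i, 30, by = i)]))
print(meanByLength)

# Candidate key length with the largest mean
bestLength <- which.max(meanByLength)
print(bestLength)  # 6, agreeing with the output above
```

Averaging over the multiples of each candidate length is what keeps a single large count (such as the 80 at shift 24) from pointing to the wrong length.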
ciphertext <- "fsbvoiutlbsjienutpwcdlmvlxnrmhfnjcmmiftrnnghjrsznebwmuhkdkaimgrlvtcawzbekxwvrzieplkwkhjetzryipopxbanwddejnmfhwtthzubadphrnyubvxthfmzzhaniiwuztmfdxlrytwnaryaujpamqhrslpvabeqeawlvzxmucdlmvsxqrqtbiilhdusqftowzcwyhyegxhpdlddiiowcymrvomjngohpeerjtrwigpometgwrdnhvbbdziylsobtihwddenbndisuwoabacuzpedwcbjgmiznowlmpiiyiiovmptavtztrbljuvrzvmwryzeortbiheazkziwsqiqlgbgmfdxlryrmvyqnjjtycgceyvxyjzpfdxwvyrptrlwxkwcim"
ctv <- stringToMod26(ciphertext)
matches <- sapply(1:30, function(x){sum(ctv-shiftVec(ctv, x)==0)})
names(matches) <- 1:30
matches
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
15 16 12 6 12 13 16 16 27 16 15 20 9 17 11 19 23 30 17 13 12 16 15 12 17 17 26 9 16 19
So it looks like the key length is \(k=9\).
If the key length is \(k\), then the letters in position \(1, k+1, 2k+1, 3k+1, \ldots\) of the ciphertext should all have been shifted the same amount. (Similarly for positions \(2, k+2, 2k+2, 3k+2, \ldots\), and so on.)
In our example, we found that \(k=9\).
ct <- stringToMod26(ciphertext)
every9thct <- ct[seq(1,length(ct), 9)] # take every 9th element, starting with the first
letterCounts(mod26ToString(every9thct))
r d h l w q u i n p b e f g k o x y z
6 5 5 4 4 3 3 2 2 2 1 1 1 1 1 1 1 1 1
Recall: For vectors \(\mathbf{a}, \mathbf{b} \in \mathbb{R}^n\), \(\mathbf{a} \cdot \mathbf{b} = \lVert \mathbf{a} \rVert \lVert \mathbf{b} \rVert \cos \theta\), where \(\theta\) is the angle between \(\mathbf{a}\) and \(\mathbf{b}\). Therefore, for vectors of fixed magnitude, the dot product is greatest when \(\theta = 0\), that is, when the vectors point in the same direction.
Consequence: Given a collection of vectors \(\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\) of equal magnitude and a vector \(\mathbf{v}\), the vector \(\mathbf{u}_i\) that maximizes the dot product \(\mathbf{u}_i \cdot \mathbf{v}\) is the one whose direction is closest to the direction of \(\mathbf{v}\).
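A toy illustration of this consequence, under the assumption that the \(\mathbf{u}_i\) are the 26 cyclic shifts of an English letter-frequency vector (all of which have the same magnitude). Here `v` is an artificial, noise-free test vector, and `dots` and `recovered` are names introduced for this sketch:

```r
# English letter frequencies (a..z), the same table used later in these notes
englishFreqs <- c(0.082,0.015,0.028,0.043,0.127,0.022,0.020,0.061,0.070,
                  0.002,0.008,0.040,0.024,0.067,0.075,0.019,0.001,0.060,
                  0.063,0.091,0.028,0.010,0.023,0.001,0.020,0.001)

# Cyclic right shift by n positions
shiftVec <- function(v, n) v[(seq_along(v) - (n + 1)) %% length(v) + 1]

# Artificial test vector: the English frequencies shifted by 3
v <- shiftVec(englishFreqs, 3)

# Dot product of v with each of the 26 shifts u_0, ..., u_25
dots <- sapply(0:25, function(i) sum(v * shiftVec(englishFreqs, i)))

# The shift with the largest dot product recovers the shift we applied
recovered <- which.max(dots) - 1
print(recovered)  # 3
```

With real ciphertext `v` is only approximately a shifted copy of the English frequencies, so the maximizing shift is a best guess rather than a certainty.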
Given a string txt and integers \(n\) and \(r\), returns a string consisting of the characters in positions \(n, n+r, n+2r, n+3r, \ldots\)
skipString <- function(txt, n, r) {
  l <- unlist(strsplit(txt,""))
  ss <- l[seq(n,length(l),r)]
  return(paste0(ss,collapse=""))
}
Compute the relative frequencies of the letters in a string of lowercase letters.
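The definition of `letterFreq` is not shown in these notes. A minimal sketch, assuming it returns a length-26 numeric vector (ordered a to z) of counts divided by the string length:

```r
# Hypothetical sketch of letterFreq(): relative frequency of each of the
# 26 lowercase letters, returned as a numeric vector ordered a..z.
letterFreq <- function(txt) {
  idx <- utf8ToInt(txt) - utf8ToInt("a") + 1   # map a..z to 1..26
  tabulate(idx, nbins = 26) / nchar(txt)       # counts divided by length
}

f <- letterFreq("albuquerque")
print(round(f, 3))
```

Returning all 26 entries, including zeros for letters that never occur, is what lets the result be dotted directly against a 26-entry frequency table.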
englishFreqs <- c(0.082,0.015,0.028,0.043,0.127,0.022,0.020,0.061,0.070, 0.002,0.008,0.040,0.024,
0.067,0.075,0.019,0.001,0.060,0.063,0.091,0.028,0.010,0.023,0.001,0.020,0.001)
v <- letterFreq(skipString(ciphertext,1,9))
for(i in 0:25) {
  cat(paste("shift =", i, "\t", " v dot u_i =", v %*% shiftVec(englishFreqs, i), "\n"))
}
shift = 0 v dot u_i = 0.0404444444444444
shift = 1 v dot u_i = 0.0301111111111111
shift = 2 v dot u_i = 0.0314222222222222
shift = 3 v dot u_i = 0.0643555555555556
shift = 4 v dot u_i = 0.0403111111111111
shift = 5 v dot u_i = 0.0322222222222222
shift = 6 v dot u_i = 0.0303777777777778
shift = 7 v dot u_i = 0.0411555555555555
shift = 8 v dot u_i = 0.0314222222222222
shift = 9 v dot u_i = 0.0415111111111111
shift = 10 v dot u_i = 0.0383777777777778
shift = 11 v dot u_i = 0.0391555555555555
shift = 12 v dot u_i = 0.0366888888888889
shift = 13 v dot u_i = 0.0427555555555556
shift = 14 v dot u_i = 0.0371111111111111
shift = 15 v dot u_i = 0.0419111111111111
shift = 16 v dot u_i = 0.0426222222222222
shift = 17 v dot u_i = 0.0348222222222222
shift = 18 v dot u_i = 0.0377333333333333
shift = 19 v dot u_i = 0.0352666666666667
shift = 20 v dot u_i = 0.0383111111111111
shift = 21 v dot u_i = 0.0310888888888889
shift = 22 v dot u_i = 0.0397555555555556
shift = 23 v dot u_i = 0.0357333333333333
shift = 24 v dot u_i = 0.0391777777777778
shift = 25 v dot u_i = 0.0471555555555556
englishFreqs <- c(0.082,0.015,0.028,0.043,0.127,0.022,0.020,0.061,0.070, 0.002,0.008,0.040,0.024,
0.067,0.075,0.019,0.001,0.060,0.063,0.091,0.028,0.010,0.023,0.001,0.020,0.001)
v <- letterFreq(skipString(ciphertext,1,9))
matchFreqs <- sapply(0:25, function(i){v %*% shiftVec(englishFreqs, i)})
matchFreqs
[1] 0.04044444 0.03011111 0.03142222 0.06435556 0.04031111 0.03222222 0.03037778 0.04115556
[9] 0.03142222 0.04151111 0.03837778 0.03915556 0.03668889 0.04275556 0.03711111 0.04191111
[17] 0.04262222 0.03482222 0.03773333 0.03526667 0.03831111 0.03108889 0.03975556 0.03573333
[25] 0.03917778 0.04715556
[1] 3
So the first element of the key is (probably) d, i.e., a shift of 3.
vKey <- numeric(9) # preallocate a vector of length 9
for(p in 1:9) {
  v <- letterFreq(skipString(ciphertext,p,9))
  matchFreqs <- sapply(0:25, function(i){v %*% shiftVec(englishFreqs, i)})
  vKey[p] <- which.max(matchFreqs)-1
}
mod26ToString(vKey) # print out the key
[1] "depravity"
[1] "comeonmanyouresmartyoumadepoisonoutofbeansyolookwegotwegotanentirelabrightherealrighthowaboutyoupicksomeofthesechemicalsandmixupsomerocketfuelthatwayyoucouldjustsendupasignalflareoryoumakesomekindofrobottogetushelporahomingdeviceorbuildanewbatteryorwaitnowhatifwejusttakesomestuffoffofthervandbuilditintosomethingcompletelydifferentyouknowlikealikeadunebuggythatwaywecanjustdunebuggyorwhatheywhatisitwhat"
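The decryption step itself is not shown. One way to do it, sketched here with a hypothetical helper `unvigenere` that subtracts the key instead of adding it (R's `%%` returns a non-negative result for a positive modulus, so no extra adjustment is needed):

```r
stringToMod26 <- function(x) utf8ToInt(x) - utf8ToInt("a")
mod26ToString <- function(x) intToUtf8(x + utf8ToInt("a"))

# Hypothetical helper: undo a Vigenere encryption by subtracting the key.
# suppressWarnings hides R's recycling warning when the text length is not
# a multiple of the key length, as in vigenere() from these notes.
unvigenere <- function(txt, keyVector) {
  ct <- stringToMod26(txt)
  pt <- suppressWarnings((ct - keyVector) %% 26)
  mod26ToString(pt)
}

# First 18 letters of the ciphertext, decrypted with the recovered key
recoveredStart <- unvigenere("fsbvoiutlbsjienutp", stringToMod26("depravity"))
print(recoveredStart)  # "comeonmanyouresmar"
```

Equivalently, calling the existing `vigenere` function with the negated key vector should give the same result.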