A cipher is an algorithm that inputs a plaintext string and outputs a ciphertext string, according to some given key.
Mathematically, a substitution cipher is a one-to-one and onto function on the alphabet set \(X\).
\[ c : X \longrightarrow X \]
Suppose that a plaintext message is composed from an alphabet set \(X\). If \(c : X \longrightarrow X\) is a one-to-one correspondence, then \(c\) defines a cipher that can be applied to the plaintext. Such a cipher is a substitution cipher.
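As a rough illustration of this definition, the sketch below builds a one-to-one correspondence on the 26 lowercase letters and applies it with base R's chartr. The names alphabet, key, encrypt, and decrypt are illustrative choices, and the random permutation is only a stand-in for whatever key is actually agreed on.

```r
# A random permutation of the alphabet plays the role of the map c (illustrative key).
set.seed(2024)
alphabet <- paste(letters, collapse = "")
key <- paste(sample(letters), collapse = "")

# chartr replaces each character of its first argument by the matching character of its second.
encrypt <- function(txt) chartr(alphabet, key, txt)
decrypt <- function(txt) chartr(key, alphabet, txt)

ct <- encrypt("attackatdawn")
decrypt(ct)   # recovers the plaintext, because the map is one-to-one and onto
```

Because the key is a permutation, swapping the two arguments of chartr gives the inverse map, which is why decryption always succeeds.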
Examples:
The time required for a brute force attack (trying all the keys) is proportional to the number of possible keys.
Warm-up question: How many different keys are there for the following ciphers on \(\mathbb{Z}_{26}\)?
- The shift cipher \(x \mapsto x+h\), where \(h\) is a chosen element of \(\mathbb{Z}_{26}\).
- The affine cipher \(x \mapsto \alpha x + \beta\), where \(\alpha, \beta\) are chosen elements of \(\mathbb{Z}_{26}\) and \(\alpha\) is invertible modulo 26, so that the map is one-to-one.
- A general substitution cipher \(x \mapsto \sigma(x)\), where \(\sigma\) is a chosen permutation of the elements of \(\mathbb{Z}_{26}\).
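One way to check the counts is to compute them directly. The sketch below assumes that an affine key is only valid when \(\alpha\) is invertible modulo 26 (otherwise the map is not one-to-one); the helper gcd and the variable names are illustrative.

```r
# Count the keys of each cipher on Z_26.
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

shiftKeys <- 26                                              # one key per choice of h
units <- Filter(function(a) gcd(a, 26) == 1, 0:25)           # invertible values of alpha
affineKeys <- length(units) * 26                             # 12 choices of alpha, 26 of beta
substitutionKeys <- factorial(26)                            # all permutations, about 4.03e26

c(shift = shiftKeys, affine = affineKeys, substitution = substitutionKeys)
```

The gap between 312 affine keys and roughly \(4 \times 10^{26}\) substitution keys is why brute force is practical for the first two ciphers but not for the third.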
A frequency analysis attack exploits the fact that, in any natural language, some letters are used more often than others. For example, in English, the approximate relative frequency of each letter is as follows.
englishFreqs <- c(0.082,0.015,0.028,0.043,0.127,0.022,0.020,0.061,0.070, 0.002,0.008,0.040,0.024,
0.067,0.075,0.019,0.001,0.060,0.063,0.091,0.028,0.010,0.023,0.001,0.020,0.001)
names(englishFreqs) <- letters
sort(englishFreqs,decreasing=TRUE)
e t a o i n s h r d l c u m w f
0.127 0.091 0.082 0.075 0.070 0.067 0.063 0.061 0.060 0.043 0.040 0.028 0.028 0.024 0.023 0.022
g y p b v k j q x z
0.020 0.020 0.019 0.015 0.010 0.008 0.002 0.001 0.001 0.001
Suppose that the following ciphertext is the result of a substitution cipher:
[1] "rwqhthkxebrxihnbkhshlxibkhiirhqviewblhibxbobhwarxioiovmivuvjxbzcobbkoibxeurxsihmabwbrhpkwtxlhejhwauwlrhpobrximxahxebwrvfvklxebrhsveehkawmmwqxeuvelewqivxlrhixejhxbxikhiwmthlvsweuzwobrvbzwoqxmmlxhjwshwemhboijwssxbwoksobovmlhvbribwlhbhksxevbxweczmwbrhqrwsbrhmwbavmmibwaxkibmhbrxschdxmmhlczrxsbrvbrvbrbrhihjwelmwbvelbroiawkboehirvmmsvdhxbipkwukhiibrkwouroivmmewkirvmmvezwaoiphkxirczrxiwqekxurbrvelawkxbqwomlchoeavxkxaqrhebrhkhibvkhuwehiwshcwlzirwomlkhphebvelivthrxsihmabrxipkwpwivmvpphvkhlbwbrhsbwchthkzyoibvelqrherhrvlpkhtvxmhlqxbrbrhsbwlhbhksxehbrxisvbbhkczmwbirhlkhqwehwabrhmwbiawkrxsihmavmiwrhqrwrvlbrhaxkibmwbmvxlrxiehjdcvkhbwrxsbrvbrvlbrhehnbviioppwixeubrvbbrhuhehkvmqwomllxhvsweubrhsxsshlxvbhmzawkbrhzbrwourblhvbrxaywihproisxurbcoblxhqxbrbrhsqviiqhhbhkbrvemxahzhbqvirhqxbrvewbrhkmhabbwbrhmvibqrhbrhkqhsoibivzxbrvpphehliwczjrvejhwkqrhbrhkczbrhpkwtxlhejhwauwlvelvirhqvithkzlhixkwoiehxbrhkbwchjwelhsehlczbrhmwbewkxarhrvlchhemhabbwbrhmvibbwxsckohrxikxurbrvelxebrhcmwwlwarxijwoebkzsherhphkiovlhlrxsbwbkoibrxiaxlhmxbzbwrxsvelbwmxthviqhmmvirxsihma"
letterCounts <- function(txt)
# returns a table of letter counts, sorted from most to least frequent
{
  return(sort(table(unlist(strsplit(txt,""))),decreasing=TRUE))
}
letterCounts(ciphertext)
h b r w x i v e k m l s o a q z c p u j t d n y f
130 114 94 80 72 71 65 53 50 50 47 34 32 27 24 20 18 17 16 12 9 3 2 2 1
Assuming that the plaintext is in English, what do you conclude?
[1] "rwqEtEkxeTrxiEnTkEsElxiTkEiirEqviewTlEiTxToTEwarxioiovmivuvjxTzcoTTkoiTxeurxsiEmaTwTrEpkwtxlEejEwauwlrEpoTrximxaExeTwrvfvklxeTrEsveeEkawmmwqxeuvelewqivxlrEixejExTxikEiwmtElvsweuzwoTrvTzwoqxmmlxEjwsEwemEToijwssxTwoksoTovmlEvTriTwlETEksxevTxweczmwTrEqrwsTrEmwTavmmiTwaxkiTmETrxscEdxmmElczrxsTrvTrvTrTrEiEjwelmwTvelTroiawkToeEirvmmsvdExTipkwukEiiTrkwouroivmmewkirvmmvezwaoipEkxirczrxiwqekxurTrvelawkxTqwomlcEoeavxkxaqrEeTrEkEiTvkEuweEiwsEcwlzirwomlkEpEeTvelivtErxsiEmaTrxipkwpwivmvppEvkElTwTrEsTwcEtEkzyoiTvelqrEerErvlpkEtvxmElqxTrTrEsTwlETEksxeETrxisvTTEkczmwTirElkEqweEwaTrEmwTiawkrxsiEmavmiwrEqrwrvlTrEaxkiTmwTmvxlrxieEjdcvkETwrxsTrvTrvlTrEeEnTviioppwixeuTrvTTrEuEeEkvmqwomllxEvsweuTrEsxssElxvTEmzawkTrEzTrwourTlEvTrxaywiEproisxurTcoTlxEqxTrTrEsqviiqEETEkTrvemxaEzETqvirEqxTrvewTrEkmEaTTwTrEmviTqrETrEkqEsoiTivzxTrvppEeEliwczjrvejEwkqrETrEkczTrEpkwtxlEejEwauwlvelvirEqvitEkzlEixkwoieExTrEkTwcEjwelEseElczTrEmwTewkxarErvlcEEemEaTTwTrEmviTTwxsckoErxikxurTrvelxeTrEcmwwlwarxijwoeTkzsEerEpEkiovlElrxsTwTkoiTrxiaxlEmxTzTwrxsvelTwmxtEviqEmmvirxsiEma"
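In the partially decrypted text above, guessed plaintext letters appear in upper case: the two most frequent ciphertext letters, h and b, have been replaced by E and T, the two most frequent English letters. A minimal sketch of that step using base R's chartr follows; the helper name applyGuesses and the short snippet are illustrative.

```r
# Replace guessed ciphertext letters (lower case) by plaintext guesses (upper case).
# applyGuesses is a hypothetical helper name, not part of the original notes.
applyGuesses <- function(txt, cipherLetters, plainLetters) {
  chartr(cipherLetters, toupper(plainLetters), txt)
}

snippet <- "rwqhthkxebrxihnbkh"   # first 18 characters of the ciphertext
applyGuesses(snippet, "hb", "et")
# [1] "rwqEtEkxeTrxiEnTkE"
```

Writing guesses in upper case keeps them from colliding with ciphertext letters that have not been decoded yet.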
digramTable <- function(txt)
# returns a table with the numbers of digrams of each possible type
{
  l <- unlist(strsplit(txt,""))
  # pair each letter with its successor; the last letter has none (NA), and table() drops that row
  dgs <- data.frame(l,c(l[-1],NA))
  names(dgs) <- c("first","second")
  table(dgs)
}
The most common digrams in English are TH, HE, IN, ER, AN, RE, ED, ON, and ES.
digramTable(ciphertext)
second
first a b c d e f h i j k l m n o p q r s t u v w x y z
a 0 5 0 0 0 0 2 0 0 0 0 0 0 1 0 1 3 0 0 2 3 5 3 1 0
b 1 6 1 0 1 0 6 4 0 5 3 3 0 4 0 3 49 0 0 0 5 16 4 0 3
c 0 0 0 0 0 0 5 0 0 1 0 1 0 2 0 0 0 0 0 0 1 1 0 0 7
d 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
e 1 7 1 0 1 0 11 0 4 1 10 3 0 0 0 0 2 0 0 5 1 5 0 0 1
f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
h 3 10 2 1 10 0 2 10 4 16 10 12 2 1 6 7 4 8 3 2 5 6 4 0 2
i 3 14 0 0 3 0 7 4 2 2 0 1 0 4 3 2 9 2 1 0 6 5 3 0 0
j 0 0 0 1 0 0 4 0 0 0 0 0 0 0 0 0 1 0 0 0 0 5 1 0 0
k 1 4 2 0 0 0 11 4 0 0 1 1 0 3 0 2 1 3 0 0 1 6 7 0 3
l 1 5 4 0 1 0 11 2 0 2 1 1 0 0 1 2 4 0 0 0 3 1 7 0 1
m 4 0 0 0 1 0 6 3 0 0 5 8 0 0 0 1 0 1 1 0 6 9 4 0 1
n 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
o 0 6 0 0 3 0 1 11 0 1 0 3 0 0 1 1 0 0 0 2 3 0 0 0 0
p 0 0 0 0 0 0 5 0 0 5 0 0 0 1 3 0 1 0 0 0 0 2 0 0 0
q 0 0 0 0 1 0 4 1 0 0 0 0 0 0 0 0 6 0 0 0 4 3 5 0 0
r 0 7 1 0 0 0 38 1 0 1 0 0 0 3 0 0 0 0 0 0 18 5 20 0 0
s 0 6 2 0 1 0 5 4 0 0 0 0 0 2 0 1 0 2 0 0 4 2 5 0 0
t 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 2 0 0
u 0 2 0 0 0 0 1 0 0 1 0 0 0 0 0 0 6 0 0 0 2 3 0 0 1
v 0 10 0 1 13 1 0 10 1 4 5 9 0 0 2 0 0 2 1 1 0 0 4 0 1
w 7 14 3 0 8 0 0 3 0 7 6 3 0 10 1 4 5 4 2 1 0 1 1 0 0
x 5 12 0 0 10 0 3 13 0 4 5 3 0 0 0 0 0 11 1 3 1 1 0 0 0
y 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0
z 1 4 1 0 0 0 1 1 1 0 1 2 0 0 0 0 2 1 0 0 0 3 1 1 0
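The largest entries in the table above are b followed by r (49) and r followed by h (38), which is consistent with the guesses b = T and h = E and suggests r = H, since TH and HE are the most common English digrams. Scanning a 25-by-25 table by eye is error-prone, so here is a sketch of one way to list the most frequent digrams directly; the helper name topDigrams is illustrative.

```r
# List the n most frequent digrams (pairs of adjacent letters) in a string.
topDigrams <- function(txt, n = 5) {
  l <- unlist(strsplit(txt, ""))
  digrams <- paste0(l[-length(l)], l[-1])   # each letter joined to its successor
  head(sort(table(digrams), decreasing = TRUE), n)
}

res <- topDigrams("thathe", 1)   # digrams: th, ha, at, th, he
res
```

Calling topDigrams(ciphertext) on the full ciphertext should put br and rh at the top, matching the table.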
[1] "HwqEtERxeTHxiEnTREsElxiTREiiHEqviewTlEiTxToTEwaHxioiovmivuvjxTzcoTTRoiTxeuHxsiEmaTwTHEpRwtxlEejEwauwlHEpoTHximxaExeTwHvfvRlxeTHEsveeERawmmwqxeuvelewqivxlHEixejExTxiREiwmtElvsweuzwoTHvTzwoqxmmlxEjwsEwemEToijwssxTwoRsoTovmlEvTHiTwlETERsxevTxweczmwTHEqHwsTHEmwTavmmiTwaxRiTmETHxscEdxmmElczHxsTHvTHvTHTHEiEjwelmwTvelTHoiawRToeEiHvmmsvdExTipRwuREiiTHRwouHoivmmewRiHvmmvezwaoipERxiHczHxiwqeRxuHTHvelawRxTqwomlcEoeavxRxaqHEeTHEREiTvREuweEiwsEcwlziHwomlREpEeTvelivtEHxsiEmaTHxipRwpwivmvppEvRElTwTHEsTwcEtERzyoiTvelqHEeHEHvlpREtvxmElqxTHTHEsTwlETERsxeETHxisvTTERczmwTiHElREqweEwaTHEmwTiawRHxsiEmavmiwHEqHwHvlTHEaxRiTmwTmvxlHxieEjdcvRETwHxsTHvTHvlTHEeEnTviioppwixeuTHvTTHEuEeERvmqwomllxEvsweuTHEsxssElxvTEmzawRTHEzTHwouHTlEvTHxaywiEpHoisxuHTcoTlxEqxTHTHEsqviiqEETERTHvemxaEzETqviHEqxTHvewTHERmEaTTwTHEmviTqHETHERqEsoiTivzxTHvppEeEliwczjHvejEwRqHETHERczTHEpRwtxlEejEwauwlvelviHEqvitERzlEixRwoieExTHERTwcEjwelEseElczTHEmwTewRxaHEHvlcEEemEaTTwTHEmviTTwxscRoEHxiRxuHTHvelxeTHEcmwwlwaHxijwoeTRzsEeHEpERiovlElHxsTwTRoiTHxiaxlEmxTzTwHxsvelTwmxtEviqEmmviHxsiEma"
See R console.
Letter-frequency analysis is a ciphertext-only attack: the attacker needs only the ciphertext, not any known plaintext.
Some systems, like PassPack, encrypt data using a key based on a passphrase you provide. They don’t store the key/passphrase; they only store the encrypted data, so if you forget your passphrase you are out of luck.
In 1553, Giovan Battista Bellaso published a paper describing a passphrase-based encryption method.
The system was later attributed (falsely) to Blaise de Vigenère, so it is now known as the Vigenère cipher.
The key is a passphrase, usually consisting of a word or two. For example, mypassphrase.
To encrypt, you line up the letters of the plaintext with repeated copies of the passphrase and add (mod 26).
thisistheplaintextthatwewouldliketoencryptusingthevigenerecipher
mypassphrasemypassphrasemypassphrasemypassphrasemypassphrasemypassphrase
Here t corresponds to 19 and m corresponds to 12, so the first letter of the ciphertext corresponds to \((19+12) \bmod 26 = 5\), i.e., f.
ffxsakiovpdeuliepliortoiimjlvdxrvtgizagyhljzznyxtckiywclieumbftr
Step 1: Take a moment to just think about the answers to these questions.
Step 2: Form groups of 2 or 3 and discuss your thoughts.
Step 3: Share what we discussed in a large group.
Passphrase: a vector \(\mathbf{v} = (k_1, k_2, \ldots, k_n) \in \mathbb{Z}_{26}^n\).
Plaintext: a vector \(\mathbf{p} \in \mathbb{Z}_{26}^{N}\), where \(N\) is the number of characters in the plaintext.
For simplicity, assume that \(N=nm\) for some positive integer \(m\). We can then break the plaintext \(\mathbf{p}\) into blocks \(\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_m\), where each of these blocks \(\mathbf{p}_i\) is in \(\mathbb{Z}_{26}^n\).
Then the encryption process is represented by a function \(\mathbb{Z}_{26}^{n} \longrightarrow \mathbb{Z}_{26}^{n}\), given by \[ \mathbf{p}_i \longmapsto \mathbf{p}_i + \mathbf{v} \] where the addition is componentwise, modulo 26.
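To make the block view concrete, the sketch below stores the plaintext blocks as the columns of a matrix and adds the key vector to each column. The key "key" and the plaintext "attack" are made-up illustrative values, chosen so that the plaintext length is a multiple of the key length.

```r
# Block-wise Vigenere encryption: each column of the matrix is one block p_i.
v <- utf8ToInt("key") - utf8ToInt("a")        # key vector, n = 3: (10, 4, 24)
p <- utf8ToInt("attack") - utf8ToInt("a")     # plaintext vector, N = 6

blocks <- matrix(p, nrow = length(v))         # columns are p_1 = (0,19,19), p_2 = (0,2,10)
cipherBlocks <- (blocks + v) %% 26            # v is added to every column, componentwise mod 26

blockCipherText <- intToUtf8(as.vector(cipherBlocks) + utf8ToInt("a"))
blockCipherText
# [1] "kxrkgi"
```

R fills matrices column by column and recycles v in the same order, so adding v to the matrix applies \(\mathbf{p}_i \mapsto \mathbf{p}_i + \mathbf{v}\) to every block at once.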
A cipher is called symmetric if it uses the same key to encrypt as to decrypt.
All of the ciphers we have seen so far are symmetric.
A block cipher is a cipher algorithm that operates on fixed-length sections, or blocks, of plaintext.
The Vigenère cipher is an example of a symmetric block cipher. So are DES and AES.
stringToMod26 <- function(x) {utf8ToInt(x)-utf8ToInt("a")}
mod26ToString <- function(x) {intToUtf8(x+utf8ToInt("a"))}
vigenere <- function(txt, keyVector)
{
  pt <- stringToMod26(txt)
  # repeat the key to the length of the plaintext, then add componentwise mod 26
  key <- rep_len(keyVector, length(pt))
  ct <- (pt + key) %% 26
  return(mod26ToString(ct))
}
plaintext <- "thisistheplaintextthatwewouldliketoencryptusingthevigenerecipher"
keyAsVector <- stringToMod26("mypassphrase")
vigenere(plaintext, keyAsVector)
[1] "ffxsakiovpdeuliepliortoiimjlvdxrvtgizagyhljzznyxtckiywclieumbftr"
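Since the cipher is symmetric, the same key vector also decrypts: subtract it instead of adding it. The following is a minimal sketch under that assumption; vigenereDecrypt is an illustrative name, and the helper conversions are written inline so that the block runs on its own.

```r
# Decrypt by subtracting the repeated key vector, componentwise mod 26.
vigenereDecrypt <- function(txt, keyVector) {
  ct <- utf8ToInt(txt) - utf8ToInt("a")
  key <- rep_len(keyVector, length(ct))
  pt <- (ct - key) %% 26          # %% returns a value in 0..25 even when ct - key is negative
  intToUtf8(pt + utf8ToInt("a"))
}

keyVec <- utf8ToInt("mypassphrase") - utf8ToInt("a")
vigenereDecrypt("ffxsakiovpde", keyVec)   # first 12 letters of the ciphertext above
# [1] "thisisthepla"
```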