Substitution and Vigenère Ciphers

September 6, 2022

Substitution Ciphers

Recall: Ciphers

A cipher is an algorithm that inputs a plaintext string and outputs a ciphertext string, according to some given key.

Mathematically, a substitution cipher is a one-to-one and onto function on the alphabet set \(X\).

\[ c : X \longrightarrow X \]

Shift and Affine Ciphers

Suppose that a plaintext message is composed from an alphabet set \(X\). If \(c : X \longrightarrow X\) is a one-to-one correspondence, then \(c\) defines a cipher that can be applied to the plaintext. Such a cipher is a substitution cipher.

Examples:

Shift cipher
Affine cipher
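
As a minimal sketch of these two examples in R, assuming the lowercase letters a to z are identified with 0 to 25. The helper names (toMod26, fromMod26, shiftCipher, affineCipher) are illustrative and not part of the course code.

```r
# Illustrative helpers (hypothetical names): letters a..z <-> 0..25
toMod26   <- function(s) utf8ToInt(s) - utf8ToInt("a")
fromMod26 <- function(x) intToUtf8(x + utf8ToInt("a"))

# Shift cipher: x -> x + h (mod 26)
shiftCipher <- function(txt, h) fromMod26((toMod26(txt) + h) %% 26)

# Affine cipher: x -> alpha * x + beta (mod 26)
# (one-to-one only when alpha is coprime to 26)
affineCipher <- function(txt, alpha, beta) fromMod26((alpha * toMod26(txt) + beta) %% 26)

shiftCipher("hello", 3)      # "khoor"
affineCipher("hello", 5, 8)  # "rclla"
```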

Brute force attacks

The time required for a brute force attack (trying all the keys) is proportional to the number of possible keys.
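
For a shift cipher the key space is small enough to try every key by hand. The sketch below is illustrative only: unshift is a hypothetical helper and the ciphertext is a made-up example.

```r
# Hypothetical helper: undo a shift of h on a lowercase string
unshift <- function(txt, h)
{
  intToUtf8((utf8ToInt(txt) - utf8ToInt("a") - h) %% 26 + utf8ToInt("a"))
}

ct <- "khoor"   # made-up ciphertext: "hello" shifted by 3

# Try all 26 keys; the attacker then picks out the candidate that reads as English
candidates <- sapply(0:25, function(h) unshift(ct, h))
names(candidates) <- 0:25
candidates
```

The amount of work is one decryption per key, which is why the number of keys matters.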

Warm-up question: How many different keys are there for the following ciphers on \(\mathbb{Z}_{26}\)?

The shift cipher \(x \mapsto x+h\), where \(h\) is a chosen element of \(\mathbb{Z}_{26}\).

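The affine cipher \(x \mapsto \alpha x + \beta\), where \(\alpha, \beta\) are chosen elements of \(\mathbb{Z}_{26}\) and \(\alpha\) is invertible modulo 26 (that is, \(\gcd(\alpha, 26) = 1\)), so that the map is one-to-one.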

A general substitution cipher \(x \mapsto \sigma(x)\), where \(\sigma\) is a chosen permutation of the elements of \(\mathbb{Z}_{26}\).
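
One way to check the counts is to compute them directly. This is a sketch: the variable names and the small gcd helper are illustrative, and the affine count assumes that only values of \(\alpha\) coprime to 26 give a valid (one-to-one) cipher.

```r
# Shift cipher: one key for each h in Z_26
nShift <- 26

# Affine cipher: alpha must satisfy gcd(alpha, 26) = 1; beta is arbitrary
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)
nAlpha <- sum(sapply(0:25, function(a) gcd(a, 26) == 1))
nAffine <- nAlpha * 26

# General substitution cipher: one key for each permutation of 26 letters
nSubst <- factorial(26)

c(shift = nShift, affine = nAffine, substitution = nSubst)
```

The three counts are 26, 312, and 26! (roughly \(4 \times 10^{26}\)), so brute force is only practical for the first two.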

Letter-frequency attacks

A frequency analysis attack exploits the fact that, in any natural language, some letters are used more than others. For example, in English, the approximate relative frequencies of the letters are as follows.

englishFreqs <- c(0.082,0.015,0.028,0.043,0.127,0.022,0.020,0.061,0.070, 0.002,0.008,0.040,0.024,
                  0.067,0.075,0.019,0.001,0.060,0.063,0.091,0.028,0.010,0.023,0.001,0.020,0.001)
names(englishFreqs) <- letters
sort(englishFreqs,decreasing=TRUE)

    e     t     a     o     i     n     s     h     r     d     l     c     u     m     w     f 
0.127 0.091 0.082 0.075 0.070 0.067 0.063 0.061 0.060 0.043 0.040 0.028 0.028 0.024 0.023 0.022 
    g     y     p     b     v     k     j     q     x     z 
0.020 0.020 0.019 0.015 0.010 0.008 0.002 0.001 0.001 0.001

Example

Suppose that the following ciphertext is the result of a substitution cipher:

ciphertext

[1] "rwqhthkxebrxihnbkhshlxibkhiirhqviewblhibxbobhwarxioiovmivuvjxbzcobbkoibxeurxsihmabwbrhpkwtxlhejhwauwlrhpobrximxahxebwrvfvklxebrhsveehkawmmwqxeuvelewqivxlrhixejhxbxikhiwmthlvsweuzwobrvbzwoqxmmlxhjwshwemhboijwssxbwoksobovmlhvbribwlhbhksxevbxweczmwbrhqrwsbrhmwbavmmibwaxkibmhbrxschdxmmhlczrxsbrvbrvbrbrhihjwelmwbvelbroiawkboehirvmmsvdhxbipkwukhiibrkwouroivmmewkirvmmvezwaoiphkxirczrxiwqekxurbrvelawkxbqwomlchoeavxkxaqrhebrhkhibvkhuwehiwshcwlzirwomlkhphebvelivthrxsihmabrxipkwpwivmvpphvkhlbwbrhsbwchthkzyoibvelqrherhrvlpkhtvxmhlqxbrbrhsbwlhbhksxehbrxisvbbhkczmwbirhlkhqwehwabrhmwbiawkrxsihmavmiwrhqrwrvlbrhaxkibmwbmvxlrxiehjdcvkhbwrxsbrvbrvlbrhehnbviioppwixeubrvbbrhuhehkvmqwomllxhvsweubrhsxsshlxvbhmzawkbrhzbrwourblhvbrxaywihproisxurbcoblxhqxbrbrhsqviiqhhbhkbrvemxahzhbqvirhqxbrvewbrhkmhabbwbrhmvibqrhbrhkqhsoibivzxbrvpphehliwczjrvejhwkqrhbrhkczbrhpkwtxlhejhwauwlvelvirhqvithkzlhixkwoiehxbrhkbwchjwelhsehlczbrhmwbewkxarhrvlchhemhabbwbrhmvibbwxsckohrxikxurbrvelxebrhcmwwlwarxijwoebkzsherhphkiovlhlrxsbwbkoibrxiaxlhmxbzbwrxsvelbwmxthviqhmmvirxsihma"

Count the letters

letterCounts <- function(txt)
  # returns a table of the letter counts in txt, most frequent first
{
  return(sort(table(unlist(strsplit(txt,""))),decreasing=TRUE))
}
letterCounts(ciphertext)


  h   b   r   w   x   i   v   e   k   m   l   s   o   a   q   z   c   p   u   j   t   d   n   y   f 
130 114  94  80  72  71  65  53  50  50  47  34  32  27  24  20  18  17  16  12   9   3   2   2   1

Assuming that the plaintext is in English, what do you conclude?

Make some substitutions

dct <- gsub("h","E",ciphertext)
dct <- gsub("b","T",dct)
dct

[1] "rwqEtEkxeTrxiEnTkEsElxiTkEiirEqviewTlEiTxToTEwarxioiovmivuvjxTzcoTTkoiTxeurxsiEmaTwTrEpkwtxlEejEwauwlrEpoTrximxaExeTwrvfvklxeTrEsveeEkawmmwqxeuvelewqivxlrEixejExTxikEiwmtElvsweuzwoTrvTzwoqxmmlxEjwsEwemEToijwssxTwoksoTovmlEvTriTwlETEksxevTxweczmwTrEqrwsTrEmwTavmmiTwaxkiTmETrxscEdxmmElczrxsTrvTrvTrTrEiEjwelmwTvelTroiawkToeEirvmmsvdExTipkwukEiiTrkwouroivmmewkirvmmvezwaoipEkxirczrxiwqekxurTrvelawkxTqwomlcEoeavxkxaqrEeTrEkEiTvkEuweEiwsEcwlzirwomlkEpEeTvelivtErxsiEmaTrxipkwpwivmvppEvkElTwTrEsTwcEtEkzyoiTvelqrEerErvlpkEtvxmElqxTrTrEsTwlETEksxeETrxisvTTEkczmwTirElkEqweEwaTrEmwTiawkrxsiEmavmiwrEqrwrvlTrEaxkiTmwTmvxlrxieEjdcvkETwrxsTrvTrvlTrEeEnTviioppwixeuTrvTTrEuEeEkvmqwomllxEvsweuTrEsxssElxvTEmzawkTrEzTrwourTlEvTrxaywiEproisxurTcoTlxEqxTrTrEsqviiqEETEkTrvemxaEzETqvirEqxTrvewTrEkmEaTTwTrEmviTqrETrEkqEsoiTivzxTrvppEeEliwczjrvejEwkqrETrEkczTrEpkwtxlEejEwauwlvelvirEqvitEkzlEixkwoieExTrEkTwcEjwelEseElczTrEmwTewkxarErvlcEEemEaTTwTrEmviTTwxsckoErxikxurTrvelxeTrEcmwwlwarxijwoeTkzsEerEpEkiovlElrxsTwTkoiTrxiaxlEmxTzTwrxsvelTwmxtEviqEmmvirxsiEma"

Look at digrams

digramTable <- function(txt)
  # returns a table with the numbers of digrams of each possible type
{
  l <- unlist(strsplit(txt,""))
  dgs <- data.frame(l,c(l[2:length(l)],NA))
  names(dgs) <- c("first","second")
  table(dgs)
}

The most common digrams in English are TH, HE, IN, ER, AN, RE, ED, ON, and ES.

digramTable(ciphertext)

     second
first  a  b  c  d  e  f  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
    a  0  5  0  0  0  0  2  0  0  0  0  0  0  1  0  1  3  0  0  2  3  5  3  1  0
    b  1  6  1  0  1  0  6  4  0  5  3  3  0  4  0  3 49  0  0  0  5 16  4  0  3
    c  0  0  0  0  0  0  5  0  0  1  0  1  0  2  0  0  0  0  0  0  1  1  0  0  7
    d  0  0  1  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
    e  1  7  1  0  1  0 11  0  4  1 10  3  0  0  0  0  2  0  0  5  1  5  0  0  1
    f  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0
    h  3 10  2  1 10  0  2 10  4 16 10 12  2  1  6  7  4  8  3  2  5  6  4  0  2
    i  3 14  0  0  3  0  7  4  2  2  0  1  0  4  3  2  9  2  1  0  6  5  3  0  0
    j  0  0  0  1  0  0  4  0  0  0  0  0  0  0  0  0  1  0  0  0  0  5  1  0  0
    k  1  4  2  0  0  0 11  4  0  0  1  1  0  3  0  2  1  3  0  0  1  6  7  0  3
    l  1  5  4  0  1  0 11  2  0  2  1  1  0  0  1  2  4  0  0  0  3  1  7  0  1
    m  4  0  0  0  1  0  6  3  0  0  5  8  0  0  0  1  0  1  1  0  6  9  4  0  1
    n  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    o  0  6  0  0  3  0  1 11  0  1  0  3  0  0  1  1  0  0  0  2  3  0  0  0  0
    p  0  0  0  0  0  0  5  0  0  5  0  0  0  1  3  0  1  0  0  0  0  2  0  0  0
    q  0  0  0  0  1  0  4  1  0  0  0  0  0  0  0  0  6  0  0  0  4  3  5  0  0
    r  0  7  1  0  0  0 38  1  0  1  0  0  0  3  0  0  0  0  0  0 18  5 20  0  0
    s  0  6  2  0  1  0  5  4  0  0  0  0  0  2  0  1  0  2  0  0  4  2  5  0  0
    t  0  0  0  0  0  0  6  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  2  0  0
    u  0  2  0  0  0  0  1  0  0  1  0  0  0  0  0  0  6  0  0  0  2  3  0  0  1
    v  0 10  0  1 13  1  0 10  1  4  5  9  0  0  2  0  0  2  1  1  0  0  4  0  1
    w  7 14  3  0  8  0  0  3  0  7  6  3  0 10  1  4  5  4  2  1  0  1  1  0  0
    x  5 12  0  0 10  0  3 13  0  4  5  3  0  0  0  0  0 11  1  3  1  1  0  0  0
    y  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  1  0  0  0
    z  1  4  1  0  0  0  1  1  1  0  1  2  0  0  0  0  2  1  0  0  0  3  1  1  0
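
In the table above, the largest entries are br (49) and rh (38), which fits TH and HE if b is T and h is E. Scanning a 25-by-25 table by eye is error-prone, so a ranked list can help. The sketch below uses a hypothetical helper, topDigrams, and a made-up sample string rather than the full ciphertext.

```r
# Hypothetical helper: list the n most frequent digrams in a string
topDigrams <- function(txt, n = 5)
{
  l <- unlist(strsplit(txt, ""))
  # paste each letter with its successor to form the digrams
  dgs <- paste0(l[-length(l)], l[-1])
  head(sort(table(dgs), decreasing = TRUE), n)
}

res <- topDigrams("thethethatthen", 3)
res
```

For the sample string, the top digram is th (4 occurrences), followed by he (3) and et (2).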

Make more substitutions based on digrams

dct <- gsub("r", "H", dct)
dct <- gsub("k", "R", dct)
dct

[1] "HwqEtERxeTHxiEnTREsElxiTREiiHEqviewTlEiTxToTEwaHxioiovmivuvjxTzcoTTRoiTxeuHxsiEmaTwTHEpRwtxlEejEwauwlHEpoTHximxaExeTwHvfvRlxeTHEsveeERawmmwqxeuvelewqivxlHEixejExTxiREiwmtElvsweuzwoTHvTzwoqxmmlxEjwsEwemEToijwssxTwoRsoTovmlEvTHiTwlETERsxevTxweczmwTHEqHwsTHEmwTavmmiTwaxRiTmETHxscEdxmmElczHxsTHvTHvTHTHEiEjwelmwTvelTHoiawRToeEiHvmmsvdExTipRwuREiiTHRwouHoivmmewRiHvmmvezwaoipERxiHczHxiwqeRxuHTHvelawRxTqwomlcEoeavxRxaqHEeTHEREiTvREuweEiwsEcwlziHwomlREpEeTvelivtEHxsiEmaTHxipRwpwivmvppEvRElTwTHEsTwcEtERzyoiTvelqHEeHEHvlpREtvxmElqxTHTHEsTwlETERsxeETHxisvTTERczmwTiHElREqweEwaTHEmwTiawRHxsiEmavmiwHEqHwHvlTHEaxRiTmwTmvxlHxieEjdcvRETwHxsTHvTHvlTHEeEnTviioppwixeuTHvTTHEuEeERvmqwomllxEvsweuTHEsxssElxvTEmzawRTHEzTHwouHTlEvTHxaywiEpHoisxuHTcoTlxEqxTHTHEsqviiqEETERTHvemxaEzETqviHEqxTHvewTHERmEaTTwTHEmviTqHETHERqEsoiTivzxTHvppEeEliwczjHvejEwRqHETHERczTHEpRwtxlEejEwauwlvelviHEqvitERzlEixRwoieExTHERTwcEjwelEseElczTHEmwTewRxaHEHvlcEEemEaTTwTHEmviTTwxscRoEHxiRxuHTHvelxeTHEcmwwlwaHxijwoeTRzsEeHEpERiovlElHxsTwTRoiTHxiaxlEmxTzTwHxsvelTwmxtEviqEmmviHxsiEma"

Make more substitutions ad hoc

See R console.
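
When several guesses have accumulated, base R's chartr() can apply them all in one pass instead of one gsub() call per letter. The sketch below is illustrative: it uses the four substitutions made so far and a made-up ciphertext fragment.

```r
# Partial key found so far: ciphertext letters -> plaintext guesses (uppercase)
from <- "hbrk"
to   <- "ETHR"

# chartr() maps each character of `from` to the character at the same position in `to`
fragment <- "brhkh"          # made-up ciphertext fragment for illustration
chartr(from, to, fragment)   # "THERE"
```

Keeping guesses in uppercase, as in the slides, makes it easy to see which letters are still undeciphered.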

Check your answer; see the key

letterCounts(ciphertext)
letterCounts(dct)

Types of Attack

A letter-frequency analysis is a ciphertext-only attack.

Ciphertext-only attacks are always possible, because the ciphertext is what gets sent over public channels.
Known-plaintext attack: If you have the plaintext and the ciphertext, how could you obtain the key?
Chosen-plaintext/ciphertext attack:
- If you had access to the encryption (or decryption) machine for a substitution cipher, how could you obtain the key?
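
As one possible illustration of the known-plaintext case for a substitution cipher: aligning the plaintext with the ciphertext reveals one key entry per distinct plaintext letter. The strings below are made up for the example, and the variable names are illustrative.

```r
# Made-up example: a known plaintext and its ciphertext under an unknown substitution
known  <- "attackatdawn"
cipher <- "dwwdfndwgdzq"   # produced by a shift of 3, but the recovery does not assume that

p  <- unlist(strsplit(known, ""))
ct <- unlist(strsplit(cipher, ""))

# Each aligned pair (plaintext letter, ciphertext letter) is one entry of the key
pairs <- unique(data.frame(plain = p, cipher = ct))
pairs[order(pairs$plain), ]
```

Only the letters that occur in the known plaintext are recovered. With a chosen plaintext, one could submit the whole alphabet and read off the full key.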

The Vigenère cipher

Passphrase encryption

Some systems, like PassPack, encrypt data using a key based on a passphrase you provide. They don’t store the key/passphrase; they only store the encrypted data, so if you forget your passphrase you are out of luck.

Vigenère cipher

In 1553, Giovan Battista Bellaso published a paper describing a passphrase-based encryption method.

The system was later attributed (falsely) to Blaise de Vigenère, so that’s what everyone calls it now.

How the Vigenère cipher works

The key is a passphrase, usually consisting of a word or two. For example, mypassphrase.

To encrypt, you line up the letters of the plaintext with repeated copies of the passphrase and add (mod 26).

thisistheplaintextthatwewouldliketoencryptusingthevigenerecipher
mypassphrasemypassphrasemypassphrasemypassphrasemypassphrasemypassphrase

Here t corresponds to 19 and m corresponds to 12 (counting from a = 0), so the first letter of the ciphertext corresponds to \((19+12) \bmod 26 = 5\), i.e., f.

ffxsakiovpdeuliepliortoiimjlvdxrvtgizagyhljzznyxtckiywclieumbftr

Discuss: Think/group/share

How would you decrypt the Vigenère cipher, given the key?
Is the Vigenère cipher a substitution cipher? Why or why not?

Step 1: Take a moment to just think about the answers to these questions.

Step 2: Form groups of 2 or 3 and discuss your thoughts.

Step 3: Share what we discussed in a large group.

Vigenère cipher in math

Passphrase: a vector \(\mathbf{v} = (k_1, k_2, \ldots, k_n) \in \mathbb{Z}_{26}^n\).

Plaintext: a vector \(\mathbf{p} \in \mathbb{Z}_{26}^{N}\), where \(N\) is the number of characters in the plaintext.

For simplicity, assume that \(N=nm\) for some \(m\). We could then break the plaintext \(\mathbf{p}\) into blocks \(\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_m\), where each of these blocks \(\mathbf{p}_i\) is in \(\mathbb{Z}_{26}^n\).

Then the encryption process is represented by a function \(\mathbb{Z}_{26}^{n} \longrightarrow \mathbb{Z}_{26}^{n}\), given by \[ \mathbf{p}_i \longmapsto \mathbf{p}_i + \mathbf{v} \] where the addition is componentwise, modulo 26.
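
The blockwise description can be sketched in R by storing the blocks as the columns of a matrix. The key "key" (so \(n = 3\)) and the 12-letter plaintext are illustrative values, not taken from the lecture.

```r
# Illustrative values: key "key" (n = 3), plaintext of length N = 12
v <- utf8ToInt("key") - utf8ToInt("a")
p <- utf8ToInt("attackatdawn") - utf8ToInt("a")

# Each column of P is one block p_i in Z_26^n
P <- matrix(p, nrow = length(v))

# R recycles v down each column, so every block becomes p_i + v (mod 26)
C <- (P + v) %% 26

# Flatten the blocks back into a string
blockCipherText <- intToUtf8(as.vector(C) + utf8ToInt("a"))
blockCipherText   # "kxrkgikxbkal"
```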

Terminology

A cipher is called symmetric if it uses the same key to encrypt as to decrypt.

All of the ciphers we have seen so far are symmetric.

A block cipher is a cipher algorithm that operates on fixed-length sections, or blocks, of plaintext.

The Vigenère cipher is an example of a symmetric block cipher. So are DES and AES.

Vigenère cipher in code

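# map a lowercase string to a vector in Z_26 (a = 0, ..., z = 25), and back
stringToMod26 <- function(x) {utf8ToInt(x)-utf8ToInt("a")}
mod26ToString <- function(x) {intToUtf8(x+utf8ToInt("a"))}

vigenere <- function(txt, keyVector)
{
  pt <- stringToMod26(txt)
  # repeat the key to the length of the plaintext, then add componentwise mod 26
  key <- rep_len(keyVector, length(pt))
  ct <- (pt + key) %% 26
  return(mod26ToString(ct))
}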

plaintext <- "thisistheplaintextthatwewouldliketoencryptusingthevigenerecipher"
keyAsVector <- stringToMod26("mypassphrase")
vigenere(plaintext, keyAsVector)

[1] "ffxsakiovpdeuliepliortoiimjlvdxrvtgizagyhljzznyxtckiywclieumbftr"
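
Because the cipher is symmetric, the same key vector can undo the encryption by subtracting instead of adding. The following is a sketch under that assumption. The names vigenereDecrypt, toMod26, and fromMod26 are hypothetical and are defined here so the block runs on its own.

```r
# Hypothetical helpers, mirroring the conversion functions used for encryption
toMod26   <- function(x) utf8ToInt(x) - utf8ToInt("a")
fromMod26 <- function(x) intToUtf8(x + utf8ToInt("a"))

# Decrypt by subtracting the repeated key, componentwise mod 26
vigenereDecrypt <- function(txt, keyVector)
{
  ct <- toMod26(txt)
  key <- rep_len(keyVector, length(ct))
  fromMod26((ct - key) %% 26)   # %% returns a value in 0..25 even for negative input
}

recovered <- vigenereDecrypt(
  "ffxsakiovpdeuliepliortoiimjlvdxrvtgizagyhljzznyxtckiywclieumbftr",
  toMod26("mypassphrase"))
recovered
```

Applied to the ciphertext above with the key mypassphrase, this returns the original plaintext.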

Written Assignment: Due Wednesday, 11:59pm

Substitution and Vigenère Ciphers