The RSA Algorithm

September 27, 2022

Symmetry and Asymmetry

Public Key Cryptosystems

  • AES is great, but it is a symmetric key cryptosystem; it uses the same key to encrypt as to decrypt.
  • Suppose you have thousands of customers and want to collect credit card information from them securely. To use AES, you would first need a secure way of getting each of them a (different) key.
  • RSA is a public key cryptosystem. The key to encrypt is made public, but the key to decrypt is kept private.

Public key cryptosystems

Problem: Alice (e.g., a customer with a credit card) would like to send a secret message to Bob (e.g., an online store), but Alice and Bob have never met or exchanged private key information.

Physical analogy: Lockbox and key

  • Alice asks Bob to send her an unlocked lockbox.
  • Bob sends the lockbox to Alice, leaves it unlocked, and keeps the key.
  • Alice puts her credit card number in the box, locks it, and sends it to Bob.
  • Bob receives the lockbox, unlocks it, and gets the credit card number.

In transit, Eve could be in possession of the lockbox, but would not get the credit card number.

Example: https (Source: tiptopsecurity.com)

[Figure: HTTPS diagram from tiptopsecurity.com]

Prime numbers

One-way functions

In the natural numbers \(\mathbb{N}\),

  • Given large primes \(p\) and \(q\), it is very easy to compute \(pq\).
  • Given \(n = pq\) where \(p\) and \(q\) are large primes, it is very difficult to compute \(p\) and \(q\): no efficient general factoring method is known.

Idea behind RSA: If you know \(pq\), you can encrypt a message, but you have to know \(p\) and \(q\) to decrypt the message.

Group Discussion

Without using technology, compute:

\(3^{100}\) in \(\mathbb{Z}_4\)

\(3^{100}\) in \(\mathbb{Z}_5\)

\(3^{100}\) in \(\mathbb{Z}_7\)

\(3^{100}\) in \(\mathbb{Z}_8\)

\(3^{100}\) in \(\mathbb{Z}_{10}\)

What patterns do you notice?

Powers of invertible elements in \(\mathbb{Z}_n\)

Recall, \(a\) is invertible in \(\mathbb{Z}_n\) if and only if \(\gcd(a,n)=1\).

Let \(u_1, u_2, \ldots, u_k\) be the distinct invertible elements of \(\mathbb{Z}_n\).

Let \(a\) be one of the invertible elements, and consider the following sequence in \(\mathbb{Z}_n\). \[ au_1, au_2, \ldots, au_k \]

  • Every element in this sequence is invertible. (Why?)
  • There will be no repeats in this sequence. (Why?)
  • The product of this sequence will equal \(u_1u_2\cdots u_k\). (Why?)
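Granting the three bullet points, the last one can be written out and simplified. The following is a sketch of that final step, using only the symbols defined above, with \(k\) the number of invertible elements.

\[ \begin{aligned} (au_1)(au_2)\cdots(au_k) &= u_1u_2\cdots u_k \\ a^k \, (u_1u_2\cdots u_k) &= u_1u_2\cdots u_k \\ a^k &= 1 \end{aligned} \]

The last line follows by multiplying both sides by the inverse of \(u_1u_2\cdots u_k\), which exists because a product of invertible elements is invertible.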

Euler’s Theorem

Let \(\phi(n)\) be the number of invertible elements in \(\mathbb{Z}_n\). We have proved the following theorem.

Euler’s Theorem: If \(\gcd(a,n)=1\), then \(a^{\phi(n)} \equiv 1 \pmod n\).

The function \(\phi\), which counts the integers from \(1\) to \(n\) that are relatively prime to \(n\), is called the Euler Phi Function or the Euler Totient Function.
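Euler's Theorem can be checked by brute force for a small modulus. The sketch below uses base R only; the helper names gcdInt, invertibles, phi, and powMod are made up for this illustration, and the choice \(n = 9\) is arbitrary.

```r
# Greatest common divisor by the Euclidean algorithm (helper for this sketch)
gcdInt <- function(a, b) {
  while (b != 0) {
    r <- a %% b
    a <- b
    b <- r
  }
  a
}

# Invertible elements of Z_n: those a in 1..n with gcd(a, n) = 1
invertibles <- function(n) Filter(function(a) gcdInt(a, n) == 1, 1:n)

# phi(n) counts the invertible elements
phi <- function(n) length(invertibles(n))

# a^k mod n, reducing after every multiplication so values stay small
powMod <- function(a, k, n) {
  result <- 1
  for (i in seq_len(k)) result <- (result * a) %% n
  result
}

n <- 9
invertibles(n)                                           # 1 2 4 5 7 8
phi(n)                                                   # 6
sapply(invertibles(n), function(a) powMod(a, phi(n), n)) # all equal to 1
```

Changing n to any other small modulus should again give a vector of ones, as the theorem predicts.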

Corollary: Fermat’s Little Theorem

  • Euler’s Theorem: If \(\gcd(a,n)=1\), then \(a^{\phi(n)} \equiv 1 \pmod n\).

  • Fermat’s Little Theorem: For \(p\) prime and \(a \neq 0\), \(a^{p-1} = 1\) in \(\mathbb{Z}_p\).
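The corollary comes from computing \(\phi(p)\) and substituting into Euler's Theorem. As a sketch:

\[ \phi(p) = \#\{1, 2, \ldots, p-1\} = p-1 \quad\Longrightarrow\quad a^{p-1} = a^{\phi(p)} = 1 \text{ in } \mathbb{Z}_p . \]

Every nonzero element of \(\mathbb{Z}_p\) is invertible here, because a prime \(p\) shares no factor with any of \(1, 2, \ldots, p-1\).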

RSA

The RSA algorithm

  1. Bob chooses large primes \(p\) and \(q\) and computes \(n=pq\).
  2. Bob chooses exponent \(e\) relatively prime to \((p-1)(q-1)\).
  3. Bob computes \(d = e^{-1}\) modulo \((p-1)(q-1)\).
  4. Bob makes \(n\) and \(e\) public, but keeps \(p\), \(q\), and \(d\) secret.
  5. Alice codes a plaintext message as a number \(m\). She computes \(c = m^e \bmod n\) and sends \(c\) to Bob.
  6. Bob decrypts by computing \(m = c^d \bmod n\).
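Before working with large primes, the six steps can be traced on numbers small enough to check by hand. This is a minimal sketch in base R: the primes 11 and 13, the exponent 7, and the message 9 are assumptions chosen for illustration (far too small to be secure), and powMod is a hypothetical helper.

```r
# Toy parameters (assumptions for illustration; far too small to be secure)
p <- 11; q <- 13
n <- p * q                       # step 1: n = 143
phiN <- (p - 1) * (q - 1)        # 120
e <- 7                           # step 2: gcd(7, 120) = 1

# step 3: brute-force search for d with d*e = 1 mod (p-1)(q-1)
d <- which((e * (1:phiN)) %% phiN == 1)[1]

# a^k mod n, reducing after each multiplication
powMod <- function(a, k, n) {
  result <- 1
  for (i in seq_len(k)) result <- (result * a) %% n
  result
}

m <- 9                           # step 5: Alice's message, m < n
cipher <- powMod(m, e, n)        # ciphertext sent to Bob
powMod(cipher, d, n)             # step 6: Bob recovers 9
```

The brute-force search for d only works because \((p-1)(q-1)\) is tiny here; for real key sizes it has to be replaced by the extended Euclidean algorithm.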

RSA step 1: choose \(p\) and \(q\) and compute \(n\)

p <- 885320963
q <- 238855417
n <- p*q
print(n)
[1] 2.114637e+17

Not good. We need to see more digits.

options(digits=20)
print(n)
[1] 211463707796206560

Still not good. (Why?)
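One way to investigate the question is to look at how R stores ordinary numbers. The lines below are a sketch in base R and assume the usual IEEE double-precision representation.

```r
# Doubles carry a 53-bit significand, so integers above 2^53 are not all representable
.Machine$double.digits        # 53
2^53 + 1 == 2^53              # TRUE: the + 1 is lost to rounding
log2(885320963 * 238855417)   # about 57.6, so n needs more than 53 bits
```

A quick sanity check points the same way: the product of two odd numbers must be odd, yet the printed value ends in 0.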

RSA step 1: choose \(p\) and \(q\) and use bigz

library(gmp) # for big integers
p <- as.bigz("885320963")
q <- as.bigz("238855417")
n <- p*q
print(n)
Big Integer ('bigz') :
[1] 211463707796206571

RSA step 2: choose an exponent \(e\)

e <- 9007

We should check that \(e\) is relatively prime to \((p-1)(q-1)\).

gcd(e, (p-1)*(q-1))
[1] 1

RSA step 3: Compute \(d = e^{-1}\).

gcdex(e,(p-1)*(q-1)) # given in gmp package
Big Integer ('bigz') object of length 3:
[1] 1                  -95061235518491201 4049              

So we know that \(1 = -95061235518491201e + 4049(p-1)(q-1)\). Therefore set \(d = -95061235518491201 \bmod (p-1)(q-1)\) and check that \(de \equiv 1 \pmod {(p-1)(q-1)}\).

d <- mod.bigz(gcdex(e,(p-1)*(q-1))[2], (p-1)*(q-1))
cat(paste0("d = ", d, ", de mod (p-1)(q-1) = ", mod.bigz(d*e, (p-1)*(q-1)))) 
d = 116402471153538991, de mod (p-1)(q-1) = 1
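To see what gcdex is computing, here is a sketch of the extended Euclidean algorithm in base R, run on toy values. The function name extGcd and the inputs 7 and 120 are assumptions for illustration; this is not the gmp implementation.

```r
# Extended Euclidean algorithm: returns g = gcd(a, b) and x, y with g = x*a + y*b
extGcd <- function(a, b) {
  # Invariants: old_r = old_s*a + old_t*b  and  r = s*a + t*b
  old_r <- a; r <- b
  old_s <- 1; s <- 0
  old_t <- 0; t <- 1
  while (r != 0) {
    qt <- old_r %/% r
    tmp <- r; r <- old_r - qt * r; old_r <- tmp
    tmp <- s; s <- old_s - qt * s; old_s <- tmp
    tmp <- t; t <- old_t - qt * t; old_t <- tmp
  }
  c(g = old_r, x = old_s, y = old_t)
}

res <- extGcd(7, 120)
res                        # g = 1, x = -17, y = 1, i.e. 1 = -17*7 + 1*120
d <- res[["x"]] %% 120     # reduce the coefficient of 7 into 0..119
d                          # 103
(7 * d) %% 120             # 1
```

As in the big-integer computation, the coefficient of \(e\) comes out negative and is reduced modulo \((p-1)(q-1)\) to obtain \(d\).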

RSA step 3: Compute \(d = e^{-1}\). (easy way)

inv.bigz(e,(p-1)*(q-1)) # given in gmp package
Big Integer ('bigz') :
[1] 116402471153538991

RSA step 4

Bob makes \(n\) and \(e\) public:

n = 211463707796206571
e = 9007

but keeps \(p\), \(q\), and \(d\) secret.

p = 885320963
q = 238855417
d = 116402471153538991

RSA step 5: code \(m\) and send \(c = m^e \bmod n\)

We would like to send the message “Cubs!”. Any text message can be coded into a single (big) integer. For example,

#' Convert string to big integer
#'
#' Converts a string to a bigz integer. First the characters of the string are converted to raw, then the
#' raw (hexadecimal) vector is converted to an integer, where the place values of this vector are assigned
#' from right to left.
stringToBigz <- function(txt) {
  nraw <- charToRaw(txt)
  l <- length(nraw)
  return(sum(as.bigz(256)^(l-(1:l))*as.numeric(nraw)))
}
stringToBigz("Cubs!")
Big Integer ('bigz') :
[1] 289732195105
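The value returned for “Cubs!” can be checked by hand. Assuming ASCII byte values for the five characters, the base-256 expansion is:

\[ \begin{aligned} \text{Cubs!} &\mapsto (67,\ 117,\ 98,\ 115,\ 33) \\ m &= 67\cdot 256^4 + 117\cdot 256^3 + 98\cdot 256^2 + 115\cdot 256 + 33 \\ &= 289732195105 \end{aligned} \]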

RSA step 5: code \(m\) and send \(c = m^e \bmod n\)

m <- stringToBigz("Cubs!")
m
Big Integer ('bigz') :
[1] 289732195105

Note that the message \(m\) must be smaller than \(n\). If it is not, we would split the message into blocks and encrypt each block separately.
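Splitting into blocks could look like the following sketch in base R. The helpers blockBytes and splitIntoBlocks are hypothetical, the sample sentence is made up, and the code assumes plain ASCII text so that each character is one byte.

```r
# Largest k with 256^k <= 2^(bits - 1) <= n, for an n that is `bits` bits long
blockBytes <- function(bits) (bits - 1) %/% 8

# Split an ASCII string into pieces of at most k characters (one byte each)
splitIntoBlocks <- function(txt, k) {
  starts <- seq(1, nchar(txt), by = k)
  substring(txt, starts, pmin(starts + k - 1, nchar(txt)))
}

k <- blockBytes(58)      # the n used here is a 58-bit number, so k = 7
blocks <- splitIntoBlocks("Meet me at Wrigley Field at noon.", k)
blocks
```

Each block of at most 7 bytes codes to an integer below \(256^7 = 2^{56}\), which is smaller than our \(n\), so every block can be encrypted on its own.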

Alice, who knows \(n\) and \(e\), which are public, sends \(c = m^e \bmod n\).

c <- mod.bigz(pow.bigz(m,e),n) # Warning: don't do it this way; it builds the full power m^e before reducing
c
Big Integer ('bigz') :
[1] 108464973573671852

RSA step 6: compute \(c^d \bmod n\)

Bob decrypts by computing \(m = c^d \bmod n\).

mod.bigz(pow.bigz(c,d),n) # danger!

Boom! R dies. (Why?)

Fortunately, package gmp has a function for computing powers mod \(n\).

powm(c,d,n)
Big Integer ('bigz') :
[1] 289732195105
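A rough size estimate suggests why the naive power fails while a modular power succeeds. The sketch below uses ordinary doubles, which is accurate enough for an order-of-magnitude count; the variable names are made up for this illustration.

```r
# Values copied from the steps above, stored as doubles (fine for a rough estimate)
cApprox <- 108464973573671852
dApprox <- 116402471153538991

bitsNeeded <- dApprox * log2(cApprox)   # c^d has roughly d * log2(c) bits
bitsNeeded / 8 / 1e15                   # size in petabytes: several hundred

# Reducing mod n after every multiplication keeps intermediates below n^2
2 * log2(211463707796206571)            # about 115 bits
```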

Convert integer back to text

#' Convert (some) big integers to strings
#'
#' This function is intended to serve as an inverse to the stringToBigz function.
#' If the raw representation of n contains 00's, this function will produce an
#' `embedded nul in string` error. Thus it is not suitable for all integers.
bigzToString <- function(n) {
  numbytes <- ceiling(log2.bigz(n)/8)
  nnumeric <- numeric(numbytes)
  for(i in 0:(numbytes-1)) {
    b <- as.numeric(mod.bigz(n, 256))
    n <- divq.bigz(n, 256)
    nnumeric[numbytes-i] <- b
  }
  return(rawToChar(as.raw(nnumeric)))
}
bigzToString(powm(c,d,n))
[1] "Cubs!"

Proof that RSA works: why we use \((p-1)(q-1)\)

Euler’s Theorem: If \(\gcd(a,n)=1\), then \(a^{\phi(n)} \equiv 1 \pmod n\).

In our case, \(n=pq\). How many numbers are relatively prime to \(pq\)?

  • \(p, 2p, 3p, \ldots, (q-1)p\) are not.
  • \(q, 2q, 3q, \ldots, (p-1)q\) are not.
  • Every other positive integer less than \(pq\) is relatively prime to \(pq\).

The two lists have no element in common (for \(p \neq q\)), so the count is \(pq-1 - (q-1) - (p -1) = pq - q - p + 1 = (p-1)(q-1)\).

So \(\phi(n)=(p-1)(q-1)\).
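The count can be confirmed directly for small primes. This base R sketch uses the toy primes 11 and 13, chosen only for illustration.

```r
p <- 11; q <- 13            # toy primes, chosen only for illustration
n <- p * q
k <- 1:(n - 1)

multiplesOfP <- sum(k %% p == 0)            # q - 1 = 12 of these
multiplesOfQ <- sum(k %% q == 0)            # p - 1 = 10 of these
coprime <- sum(k %% p != 0 & k %% q != 0)   # everything else

c(multiplesOfP, multiplesOfQ, coprime, (p - 1) * (q - 1))
```

The last two entries agree, matching \(\phi(n) = (p-1)(q-1)\).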

Proof that RSA works: how to recover \(m\)

When Bob decrypts, he computes:

\[ \begin{aligned} c^d \bmod n &= (m^e)^d \bmod n \\ &= m^{de} \bmod n \\ &= m^{1 + k(p-1)(q-1)} \bmod n\\ &= m \cdot \left(m^{(p-1)(q-1)}\right)^k \bmod n \\ & = m \cdot 1^k \\ &= m \end{aligned} \]
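The step that replaces \(m^{(p-1)(q-1)}\) by \(1\) is Euler's Theorem, so as written the derivation assumes \(\gcd(m,n)=1\). A brute-force check on a toy key suggests the conclusion holds for every \(m\) below \(n\) anyway. The key values and the helper powMod are assumptions for illustration.

```r
n <- 143; e <- 7; d <- 103   # toy key: n = 11 * 13 and 7 * 103 = 721 = 1 + 6 * 120

# a^k mod n, reducing after each multiplication
powMod <- function(a, k, n) {
  result <- 1
  for (i in seq_len(k)) result <- (result * a) %% n
  result
}

m <- 0:(n - 1)
recovered <- sapply(m, function(x) powMod(powMod(x, e, n), d, n))
all(recovered == m)          # TRUE, including the m that share a factor with n
```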

Practice on a “toy” example (groups)

  • Public: modulus: 77, encryption exponent: 13
  • Encrypt the plaintext 20.
  • Determine the decryption exponent.
  • Check that the decryption exponent works.
  • Would this example work if the encryption exponent were 3 instead of 13? Try it.

Some practical considerations

Computational cost of RSA?

  1. Bob chooses large primes \(p\) and \(q\) and computes \(n=pq\).
  2. Bob chooses exponent \(e\) relatively prime to \((p-1)(q-1)\).
  3. Bob computes \(d = e^{-1}\) modulo \((p-1)(q-1)\).
  4. Bob makes \(n\) and \(e\) public, but keeps \(p\), \(q\), and \(d\) secret.
  5. Alice codes a plaintext message as a number \(m\). She computes \(c = m^e \bmod n\) and sends \(c\) to Bob.
  6. Bob decrypts by computing \(m = c^d \bmod n\).

Fast exponentiation

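# Naive recursion: n multiplications to compute x^n
rPower <- function(x, n)
{
  if(n==0) return(1)
  x*rPower(x, n-1)
}

# Repeated squaring: only about log2(n) levels of recursion
qPower <- function(x, n)
{
  if(n==0) return(1)
  half <- qPower(x, n %/% 2)   # x^(n %/% 2), computed once
  if(n %% 2 == 0) half^2 else x*half^2
}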

For big integers, powm is some version of the latter that also reduces modulo \(n\) as it goes, so intermediate values stay small.
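A modular variant of the squaring idea might look like the sketch below. The name qPowerMod is made up, the toy values come from the key \(n = 143\), \(e = 7\), \(d = 103\), and this is not a claim about how gmp implements powm.

```r
# x^k mod n by repeated squaring, reducing after every multiplication
qPowerMod <- function(x, k, n) {
  if (k == 0) return(1 %% n)
  half <- qPowerMod(x, k %/% 2, n)    # x^(k %/% 2) mod n
  sq <- (half * half) %% n
  if (k %% 2 == 0) sq else (x * sq) %% n
}

qPowerMod(9, 7, 143)      # 48: encrypt m = 9 with e = 7
qPowerMod(48, 103, 143)   # 9: decrypt with d = 103
```

Because every product is reduced immediately, no intermediate value exceeds \(n^2\), and the recursion depth is about \(\log_2 k\) rather than \(k\).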

Why does RSA use prime numbers?

Which step of the following derivation relies on \(p\) and \(q\) being prime?

\[ \begin{aligned} c^d \bmod n &= (m^e)^d \bmod n \\ &= m^{de} \bmod n \\ &= m^{1 + k(p-1)(q-1)} \bmod n\\ &= m \cdot \left(m^{(p-1)(q-1)}\right)^k \bmod n \\ & = m \cdot 1^k \\ &= m \end{aligned} \]

Basic prime number facts

  • A natural number \(n > 1\) is prime if it has no positive divisors other than \(n\) and \(1\).
  • If a natural number greater than \(1\) is not prime, it is called composite.
  • If \(n > 1\) and all of the prime numbers up to \(\sqrt{n}\) fail to divide \(n\), then \(n\) is prime.
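The last fact gives a simple primality test by trial division. The sketch below is base R with a hypothetical function name; for simplicity it tries every odd number up to \(\sqrt{n}\) rather than only the primes, which does not change the result. It is practical only for small \(n\).

```r
# Trial division: TRUE if n is prime, testing divisors up to sqrt(n)
isPrime <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)            # 2 and 3
  if (n %% 2 == 0) return(FALSE)
  f <- 3
  while (f * f <= n) {
    if (n %% f == 0) return(FALSE)
    f <- f + 2
  }
  TRUE
}

Filter(isPrime, 1:30)    # 2 3 5 7 11 13 17 19 23 29
isPrime(77)              # FALSE, since 77 = 7 * 11
```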

Next time we’ll focus on how to deal with big prime numbers.

How big do the primes need to be?

  • Long enough so \(n = pq\) can’t be factored in practice.
  • Long enough so the blocks aren’t annoyingly small.

The 18-digit modulus from our example is far too small:

factorize("211463707796206571")
Big Integer ('bigz') object of length 2:
[1] 885320963 238855417

Factorization records?

During 2007-2009, a team of 13 researchers managed to factor one of the RSA challenge numbers (RSA-768):

rsaChallenge <- as.bigz("12301866845301177551304949583849627207728535695953347921973
                22452151726400507263657518745202199786469389956474942774063845925192
                55732630345373154826850791702612214291346167042921431160222124047927
                4737794080665351419597459856902143413")
log2(rsaChallenge)
[1] 767.66426718447132771
log10(rsaChallenge)
[1] 231.08997102193467299

Try factorize(rsaChallenge) and see what happens.

Message/block length can’t exceed modulus

m <- stringToBigz("I would like to send a message that could correspond 
                   to a reasonably complicated sentence.")
m
Big Integer ('bigz') :
[1] 2302676461579766653347587556562582796534277707538449316277229049517056349153249630994876086462097148341908250666470767397642251869455114162920307300064833324959498287098278560001109106614010440426795099511825790257024037759203639827225379811316485065533305670296878
log10(m) ## m has floor(log10(m)) + 1 = 265 decimal digits
[1] 264.36223292154153341
log2(m) ## m has floor(log2(m)) + 1 = 879 bits
[1] 878.19232876922569631

OpenSSL currently recommends 2048-bit moduli

library(openssl) # for rsa_keygen
key <- rsa_keygen()
rsaModulus <- as.list(key)$data$n
rsaModulus
[b] 22470653269527720927767058120048920177452679013001680983256294664369199576730430575564625037609439565195225316346146343302668466989061542725898271489790556354473720746879295642315251427129882850518845362950591998154701084609059045273802088855444740803090721174930235384316601355233845173601737114829839289560512712135213014843669446113051935271193245735165534751825528815722545956353430779189764213849162663991817290180843209446731029920220617643476233681552052387881579815247386501777643973830374374058141107189359787716925701063873548478083203377347393062830850659619868062866318392571287014629378365563820643650859
str(rsaModulus)
 'bignum' raw [1:257] 00 b2 00 79 ...

(See console.)