| letter | b | l | a | c | k | h | w | s |
|---|---|---|---|---|---|---|---|---|
| codeword | 01 | 11 | 0010 | 0001 | 0011 | 0000 | 010 | 011 |
Since 01, the codeword for b, is also a prefix of the codewords for w (010) and s (011), this is not a prefix code.
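The check above can be automated. The following is a minimal sketch, assuming the code is stored as a Python dictionary from letters to bit strings; `prefix_conflicts` is a hypothetical helper name, not something from the notes.

```python
def prefix_conflicts(code):
    """Return (short, long) letter pairs where short's codeword is a prefix of long's."""
    conflicts = []
    for a, word_a in code.items():
        for b, word_b in code.items():
            if a != b and word_b.startswith(word_a):
                conflicts.append((a, b))
    return conflicts

# The code from the table above.
code = {"b": "01", "l": "11", "a": "0010", "c": "0001",
        "k": "0011", "h": "0000", "w": "010", "s": "011"}
print(prefix_conflicts(code))  # [('b', 'w'), ('b', 's')]
```

An empty result would mean that no codeword is a prefix of another, which is exactly the prefix-code condition.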
          #
        0/ \1
        #   e
      0/ \1
      a   #
        0/ \1
        #   g
      0/ \1
      m   n
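Reading a codeword off a tree like this means following the path from the root to the letter, writing 0 for each left branch and 1 for each right branch. The sketch below does that recursively; it assumes the tree is written as nested Python pairs, and the name `codewords` is only illustrative.

```python
def codewords(tree, prefix=""):
    """Map each leaf letter to its root-to-leaf path (0 = left, 1 = right)."""
    if isinstance(tree, str):      # a leaf is a single letter
        return {tree: prefix}
    left, right = tree             # an internal node is a (left, right) pair
    table = codewords(left, prefix + "0")
    table.update(codewords(right, prefix + "1"))
    return table

# The tree drawn above, written as nested pairs.
tree = (("a", (("m", "n"), "g")), "e")
print(codewords(tree))
# {'a': '00', 'm': '0100', 'n': '0101', 'g': '011', 'e': '1'}
```

Because every letter sits at a leaf, no codeword produced this way can be a prefix of another.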
Message: eggman

The letter g is used the most in this message, but it does not have the shortest codeword: g costs 3 bits, while e costs only 1. With g and e switched in the binary tree, the message would have used fewer bits (15 instead of 17).

Notice that the tree-building process is not uniquely determined, because sometimes there are multiple options with the same frequencies.
    a:100  b:20  c:30  d:20  e:150  f:10  g:20  h:40  i:110

    a:100  b:20  c:30  e:150  #:30  g:20  h:40  i:110
                             / \
                            d   f

    a:100  c:30  e:150  #:30   #:40  h:40  i:110
                       / \    / \
                      d   f  b   g

    a:100  e:150  #:60   #:40  h:40  i:110
                 / \    / \
                c   #  b   g
                   / \
                  d   f

    a:100  e:150  #:60   #:80  i:110
                 / \    / \
                c   #  h   #
                   / \    / \
                  d   f  b   g

    a:100  e:150  #:140  i:110
                /   \
              #       #
             / \     / \
            c   #   h   #
               / \     / \
              d   f   b   g

    e:150  #:210    #:140
          / \     /   \
         a   i  #       #
               / \     / \
              c   #   h   #
                 / \     / \
                d   f   b   g

      #:210     #:290
     / \      /   \
    a   i   e       #
                  /   \
                #       #
               / \     / \
              c   #   h   #
                 / \     / \
                d   f   b   g

           #:500
        /     \
      #         #
     / \      /   \
    a   i   e       #
                  /   \
                #       #
               / \     / \
              c   #   h   #
                 / \     / \
                d   f   b   g
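The merging steps above can be traced with a priority queue. This is a minimal sketch, assuming Python's standard `heapq` module; the function name `huffman_merges` is hypothetical. It tracks only the weights, not the tree shape, so tie-breaking among equal frequencies does not affect its output.

```python
import heapq

def huffman_merges(weights):
    """Repeatedly merge the two smallest weights; return the merged weights in order."""
    heap = list(weights)
    heapq.heapify(heap)
    merged = []
    while len(heap) > 1:
        w = heapq.heappop(heap) + heapq.heappop(heap)  # two lightest trees
        merged.append(w)
        heapq.heappush(heap, w)                        # the new tree rejoins the forest
    return merged

freq = {"a": 100, "b": 20, "c": 30, "d": 20, "e": 150,
        "f": 10, "g": 20, "h": 40, "i": 110}
merged = huffman_merges(freq.values())
print(merged)       # [30, 40, 60, 80, 140, 210, 290, 500]
print(sum(merged))  # 1350
```

The merged weights match the `#` labels in the steps above. Their sum, 1350, is the total number of bits in the encoded message, because each merge adds one bit to the codeword of every letter beneath the new node.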
You can think of internal nodes as being “combo” characters.
          aifbcdghe
         /0      1\
       ai          fbcdghe
     /0 1\          /0 1\
    a     i   fbcdgh     e
             /0     1\
          fbc         dgh
         /0 1\       /0 1\
        fb    c     dg    h
      /0 1\       /0 1\
     f     b     d     g
           24
         /    \
      21        17
     /  \      /
    9    18  16
ExtractMax, ExtractMin, and Insert are all \(O(\log n)\). Peek is \(O(1)\), because you can just read the root value without changing anything.

Building a heap from \(n\) items bottom-up takes only \(O(n)\) time, since at most \(\lceil n/2^{h+1} \rceil\) nodes have height \(h\) and fixing each one costs \(O(h)\):

\[ \sum_{h=0}^{\lfloor \log_2 n \rfloor} \left\lceil \frac{n}{2^{h+1}}\right\rceil O(h) = O\left( n \sum_{h=0}^{\lfloor \log_2 n \rfloor}\frac{h}{2^{h+1}} \right)= O\left( n \sum_{h=0}^{\infty}\frac{h}{2^{h+1}} \right) = O(n) \]
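These operations can be tried out with Python's standard `heapq` module. Note the assumption: `heapq` implements a min-heap, so the sketch below uses ExtractMin rather than ExtractMax, and the variable names are only illustrative.

```python
import heapq

# Values from the example heap; heapq keeps the smallest item at index 0.
heap = [24, 21, 17, 9, 18, 16]
heapq.heapify(heap)              # bottom-up build, linear time

smallest = heap[0]               # Peek: read the root, change nothing
heapq.heappush(heap, 5)          # Insert: sift the new item up
first = heapq.heappop(heap)      # ExtractMin: remove the root, sift down
second = heapq.heappop(heap)

print(smallest, first, second)   # 9 5 9
```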
We are given a list of characters C[1..n] and their frequencies F[1..n] in the message we want to encode.
Upshot: The first greedy choice is optimal.
Either way, a full binary tree with \(n\) leaves has \(2n-1\) nodes. (picture)
    // indices 1 .. n are the leaves (corresponding to C[1..n])
    // index 2n-1 is the root
    // P[i] = parent of node i
    // L[i] = left child of node i, R[i] = right child
    BuildHuffman(F[1..n]):
        for i <- 1 to n
            L[i] <- 0; R[i] <- 0
            Insert(i, F[i])            // put indices in priority queue, frequency = priority
        for i <- n + 1 to 2n − 1       // the book starts this loop at n, which looks like a typo
            x <- ExtractMin()          // extract smallest trees in forest
            y <- ExtractMin()          // from the priority queue
            F[i] <- F[x] + F[y]        // new internal node
            Insert(i, F[i])            // put new node into priority queue
            L[i] <- x; P[x] <- i       // update children
            R[i] <- y; P[y] <- i       // and parents
        P[2n − 1] <- 0                 // root has no parent
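The pseudocode translates fairly directly into Python. The following is a sketch under some assumptions not in the notes: `heapq` stands in for the priority queue, index 0 of every array is unused padding so that the indices match the 1-based pseudocode, and ties between equal frequencies are broken by the smaller index. The name `build_huffman` is illustrative.

```python
import heapq

def build_huffman(freq):
    """freq[1..n] holds the leaf frequencies; freq[0] is unused padding."""
    n = len(freq) - 1
    F = list(freq) + [0] * (n - 1)          # F[1..2n-1]
    P = [0] * (2 * n)                       # parent of each node, 0 = none
    L = [0] * (2 * n)                       # left child, 0 = none
    R = [0] * (2 * n)                       # right child, 0 = none
    pq = [(F[i], i) for i in range(1, n + 1)]
    heapq.heapify(pq)
    for i in range(n + 1, 2 * n):           # new internal nodes n+1 .. 2n-1
        _, x = heapq.heappop(pq)            # two smallest trees in the forest
        _, y = heapq.heappop(pq)
        F[i] = F[x] + F[y]
        heapq.heappush(pq, (F[i], i))
        L[i], P[x] = x, i
        R[i], P[y] = y, i
    return F, P, L, R                       # P[2n-1] stays 0: the root has no parent

# A hypothetical three-letter input with frequencies 5, 2, 3.
F, P, L, R = build_huffman([0, 5, 2, 3])
print(F)  # [0, 5, 2, 3, 5, 10]
print(P)  # [0, 5, 4, 4, 5, 0]
print(L)  # [0, 0, 0, 0, 2, 1]
print(R)  # [0, 0, 0, 0, 3, 4]
```

Here nodes 2 and 3 are merged first into node 4, and then node 1 and node 4 are merged into the root, node 5.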
Table #1 | Table #2 | Table #3 | Table #4 | Table #5 | Table #6 | Table #7 |
---|---|---|---|---|---|---|
James | Drake | Kevin | Kristen | Logan | Nathan | Talia |
Andrew | Isaac | Graham | Trevor | Jordan | Claire | Grace |
Blake | Ethan | Levi | Josiah | Jack | Bri | John |
Find the running time of this implementation of Huffman’s Algorithm.
Construct the arrays F, P, L and R for the following table of frequencies. In other words, F[1..6] = [50, 80, 30, 30, 20, 20].
character | A | E | I | O | U | Y |
---|---|---|---|---|---|---|
frequency | 50 | 80 | 30 | 30 | 20 | 20 |