Hamming Code Simulator

30 October 2018 / Side Project

Hamming code was developed in 1950 by Richard W. Hamming, an American mathematician and computer scientist. Hamming was irritated by the inability of punch card readers to correct errors, so he spent several years developing error-correcting algorithms. The result was a family of codes known as Hamming code. To this day, Hamming code is still in use in DRAM, RAID storage, and other media that require simple error correction.

In this article, I'll explain how Hamming code works, describe how to implement it programmatically, and simulate the transmission of an image over a noisy channel. We'll be able to see the effectiveness and limitations of Hamming(12, 8), where 8 bits of input data are encoded into a 12-bit output before being transmitted. Noise will be simulated by flipping bits according to a uniform random distribution.

Understanding Hamming Code

Hamming code encodes a binary string (e.g. 1010) in such a way that errors introduced later can be detected and corrected. It accomplishes this by extending the binary string with additional bits called parity bits.

A parity bit records whether the number of 1-bits in a subset of a binary string is even or odd. We set a parity bit to 1 if the count is odd, or to 0 if the count is even. In other words, a parity bit is the bitwise XOR of a subset of the digits.

For example, suppose we have the binary string 101, and we want a parity bit that checks positions 1 and 3. The first and last bits are both 1, so the parity bit is $1 \oplus 1 = 0$.

In Hamming code, the positions that a parity bit checks follow a fixed pattern. A parity bit sits at position $2^r$, where $r$ indexes the parity bits starting at $r = 0$, and beginning at its own position it alternately checks $2^r$ bits and skips $2^r$ bits until it reaches the end of the string. For example, the first parity bit ($r = 0$), at position $2^0 = 1$, will (a) check 1 bit, (b) skip 1 bit, and then repeat (a) and (b) until it has iterated through the entire binary string. The second parity bit ($r = 1$), at position $2^1 = 2$, performs the same (a) and (b), but checking and skipping 2 bits at a time, and so on. The short sketch after the table below generates these sequences.

Here is the rule in tabular format:

Parity Bit   Position   Positions Checked
1            1          1, 3, 5, 7, 9, 11, 13, 15, 17, ...
2            2          2, 3, 6, 7, 10, 11, 14, 15, 18, ...
3            4          4, 5, 6, 7, 12, 13, 14, 15, 20, ...
4            8          8, 9, 10, 11, 12, 13, 14, 15, 24, ...
...          ...        ...
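
To make the pattern concrete, here is a minimal sketch that generates the rows of this table for a 12-bit codeword. It is separate from the simulator source; CoveredPositions is a helper invented for illustration.

#include <iostream>
#include <vector>

// List the 1-based positions covered by the parity bit at position 2^r:
// starting at 2^r, alternately take 2^r positions and skip 2^r positions
std::vector<int> CoveredPositions(int r, int length) {
    std::vector<int> positions;
    int step = 1 << r;
    for (int start = step; start <= length; start += 2 * step)
        for (int pos = start; pos < start + step && pos <= length; pos++)
            positions.push_back(pos);
    return positions;
}

int main() {
    // Prints: 1 3 5 7 9 11 / 2 3 6 7 10 11 / 4 5 6 7 12 / 8 9 10 11 12
    for (int r = 0; r < 4; r++) {
        for (int pos : CoveredPositions(r, 12))
            std::cout << pos << " ";
        std::cout << "\n";
    }
}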

Example Problem

Let's encode an 8-bit binary string step by step using Hamming(12, 8). First, we add 4 parity bits to an 8-bit string to create a 12-bit codeword. Suppose we have the 8-bit string:

0 1 1 0 1 0 1 1

The first parity bit ($r = 0$) will be in position 1 ($2^0 = 1$). The second parity bit will be in position 2, the third in position 4, and the fourth in position 8. I will denote the positions of the parity bits with underscores:

_ _ 0 _ 1 1 0 _ 1 0 1 1

Notice that the positions follow our rule of $2^r$, where $0 \leq r \leq n - 1$ and $n$ is the number of parity bits we're adding (in our case, 4).

Recall that the first parity bit, starting from its own position, checks 1 bit and skips 1 bit. Therefore, it checks positions 1, 3, 5, 7, 9, and 11:

_ _ 0 _ 1 1 0 _ 1 0 1 1
^   ^   ^   ^   ^   ^

We see that the number of 1-bits among the checked positions is 3 (odd). Therefore, we set the first parity bit to 1:

1 _ 0 _ 1 1 0 _ 1 0 1 1
^-- Parity bit

We still have three more parity bits to calculate. You can use the table to help you find the check positions.

1 _ 0 _ 1 1 0 _ 1 0 1 1   Second parity bit (position 2) - even
  ^ ^     ^ ^     ^ ^
1 0 0 _ 1 1 0 _ 1 0 1 1

1 0 0 _ 1 1 0 _ 1 0 1 1   Third parity bit (position 4) - odd
      ^ ^ ^ ^         ^
1 0 0 1 1 1 0 _ 1 0 1 1

1 0 0 1 1 1 0 _ 1 0 1 1   Fourth parity bit (position 8) - odd
              ^ ^ ^ ^ ^
1 0 0 1 1 1 0 1 1 0 1 1

The complete codeword is 1 0 0 1 1 1 0 1 1 0 1 1. We can verify that there are no errors by evaluating each parity check. Let's introduce an error at position 5 and see what the parity checks tell us.

1 0 0 1 0 1 0 1 1 0 1 1
        ^-- bit flipped

The first parity check evaluates positions 1, 3, 5, 7, 9, and 11. The data bits XOR to 0, but the stored parity bit is 1, so the check fails and we note a 1.

1 0 0 1 0 1 0 1 1 0 1 1
^   ^   ^   ^   ^   ^

Parity checks: 1

An error has been detected. We will now evaluate the second, third, and fourth parity checks to detect the position of the error:

1 0 0 1 0 1 0 1 1 0 1 1     Second parity check
  ^ ^     ^ ^     ^ ^

Since the second parity check passes, we add a 0 to the left:

Parity checks: 0 1

Checking the third parity bit:

1 0 0 1 0 1 0 1 1 0 1 1     Third parity check
      ^ ^ ^ ^         ^

The third parity check fails, so we add a 1 to the left of 0 1:

Parity checks: 1 0 1

Finally, we see that the fourth parity check passes, so we add a 0 to the left of 1 0 1:

1 0 0 1 0 1 0 1 1 0 1 1     Fourth parity check
              ^ ^ ^ ^ ^

Parity checks: 0 1 0 1

It's not a coincidence that 0 1 0 1 is the binary representation of 5, the position where we introduced the error. We then flip the bit at position 5 to correct the error and obtain our original binary string.
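
If you want to double-check the arithmetic, the following standalone sketch (separate from the simulator code later in the article) reproduces the example. Check $r$ covers exactly the positions whose binary representation has bit $r$ set, and each failing check contributes $2^r$ to the error position.

#include <cstdio>

int main() {
    // The codeword from the example; position i is stored at index i - 1
    int c[12] = {1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1};
    c[5 - 1] ^= 1;                          // flip the bit at position 5

    int pos = 0;
    for (int r = 0; r < 4; r++) {
        int parity = 0;
        for (int i = 1; i <= 12; i++)
            if (i & (1 << r))               // check r covers position i
                parity ^= c[i - 1];
        if (parity)                         // failed check adds 2^r
            pos += 1 << r;
    }
    printf("error at position %d\n", pos);  // prints: error at position 5
}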

Diving Deeper: Coding Theory

Hamming code is actually one of the first linear codes. Linear codes are an application of information theory, which provides the mathematical underpinnings of data transmission. A bit of linear algebra and coding theory reduces our implementation to a series of matrix operations. We create two matrices: a generator matrix, used to encode input, and a parity check matrix, used to decode received codewords. Understanding how these two matrices are constructed is not necessary to implement the code, but it gives you greater awareness of the mathematics behind Hamming code.

Creating the Parity Check Matrix

The parity check matrix is essentially the matrix representation of the parity check equations. For Hamming(12, 8), we have 4 parity checks. Therefore, we have 4 parity equations.

For a binary string of length 8, we use 4 parity bits to create a 12-bit codeword $c$.

I will use $a_i$ to denote the bit at position $i$ of the codeword, with the parity bits occupying positions 1, 2, 4, and 8. Notice that the minimum Hamming distance ($d = 3$) is satisfied because every data bit is covered by at least two of the four parity equations. A minimum distance of 3 allows us to correct up to 1 error.

The parity check equations, with addition taken mod 2, are:

$$ a_1 + a_3 + a_5 + a_7 + a_9 + a_{11} = 0 \\
a_2 + a_3 + a_6 + a_7 + a_{10} + a_{11} = 0 \\
a_4 + a_5 + a_6 + a_7 + a_{12} = 0 \\
a_8 + a_9 + a_{10} + a_{11} + a_{12} = 0 $$

Looking at the coefficients of these equations, we can deduce the parity check matrix $\mathbf{H}$:

$$\mathbf{H} = \begin{bmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\
\end{bmatrix}
$$

Notice how column $i$ of the parity check matrix is simply the binary representation of $i$, for $i$ from 1 to 12, written least significant bit first.
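
A consequence of this observation is that $\mathbf{H}$ never has to be typed in by hand. Here is a small sketch, separate from the simulator source, that builds and prints the matrix from the column rule:

#include <iostream>

int main() {
    // Column i (1-based) of H is the binary representation of i,
    // least significant bit in the first row
    int H[4][12];
    for (int i = 1; i <= 12; i++)
        for (int r = 0; r < 4; r++)
            H[r][i - 1] = (i >> r) & 1;

    // Print H row by row; the output matches the matrix above
    for (int r = 0; r < 4; r++) {
        for (int i = 0; i < 12; i++)
            std::cout << H[r][i] << " ";
        std::cout << "\n";
    }
}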

Now that we have the parity check matrix $\mathbf{H}$, we can focus on the generator matrix. The generator matrix $\mathbf{G}$ has dimensions $8 \times 12$: it maps an 8-bit message to a 12-bit codeword, and each of its rows is the encoding of one of the 8 standard basis vectors of $(F_2)^8$.

We rely on the binary field $F_2$, a finite field of size 2, also denoted $GF(q)$ with $q = 2$. Binary finite fields are given by the integers mod $p$, where $p = 2$ is prime. In our case, we're interested in $(F_2)^k$ with $k = 8$; a standard basis vector in this space is 8 digits long (e.g. $(0,0,0,0,0,0,0,1)$).

Creating the Generator Matrix

Recall that each parity check equation must sum to 0 over $F_2$; addition mod 2 is simply the XOR operation.

Let's calculate $\mathbf{u_1}$. We use the first standard basis vector $e_1$ of $(F_2)^8$; $e_i$ is an 8-bit string with a 1 in position $i$ and a 0 in every other position:

$$ e_1 = 10000000 $$

This string is encoded to find the first row $\mathbf{u_1}$ of the generator matrix. There are eight basis vectors to encode, but I'll only walk through the first.

We can encode it by (1) following the same Hamming code rule for checking data bits that we discussed earlier, (2) performing a binary expansion of each position number, or (3) reading the Tanner graph.

Tanner Graph for Hamming(12, 8)

Let's begin by adding four parity bits in positions 1, 2, 4, and 8 of our 8-bit string, and focus on parity bit 1:

__1_000_0000

a3 = 1
a5 = 0
a7 = 0
a9 = 0
a11 = 0

Starting with parity bit 1, we see that if $a_3 = 1$, $a_5 = 0$, $a_7 = 0$, $a_9 = 0$, and $a_{11} = 0$, then $a_1$ must equal 1 in order for $a_1 \oplus a_3 \oplus a_5 \oplus a_7 \oplus a_9 \oplus a_{11} = 0$ to be true. Therefore, we place a 1 in position 1 of our codeword:

1_1_000_0000

Moving on to parity bit 2, using the same process, we see that if $a_3 = 1$, $a_6 = 0$, $a_7 = 0$, $a_{10} = 0$, and $a_{11} = 0$, then $a_2$ must equal 1 for $a_2 \oplus a_3 \oplus a_6 \oplus a_7 \oplus a_{10} \oplus a_{11} = 0$:

111_000_0000

The last two parity bits are easy to deduce in this case, since the data bits they check are all 0:

111000000000

Congratulations! You just calculated your first basis vector $\mathbf{u_1}$. You can calculate the remaining 7 basis vectors using the same procedure. For example, you will use $e_2 = 01000000$ for $\mathbf{u_2}$, $e_3 = 00100000$ for $\mathbf{u_3}$, ..., $e_8 = 00000001$ for $\mathbf{u_8}$.
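
If you'd rather not repeat the procedure by hand eight times, it is easy to automate. The sketch below (EncodeBasis is a helper invented for illustration, not part of the simulator) places a lone data bit and solves each parity equation for its parity bit:

#include <iostream>

// Compute row k of G by encoding the standard basis vector e_{k+1}
void EncodeBasis(int k, int row[12]) {
    const int dataPos[8] = {3, 5, 6, 7, 9, 10, 11, 12};
    for (int i = 0; i < 12; i++)
        row[i] = 0;
    row[dataPos[k] - 1] = 1;                      // the lone data bit
    for (int r = 0; r < 4; r++) {
        int parity = 0;
        for (int i = 1; i <= 12; i++)
            if ((i & (1 << r)) && i != (1 << r))  // data positions of check r
                parity ^= row[i - 1];
        row[(1 << r) - 1] = parity;               // parity bit at position 2^r
    }
}

int main() {
    int row[12];
    for (int k = 0; k < 8; k++) {                 // prints the rows of G below
        EncodeBasis(k, row);
        for (int i = 0; i < 12; i++)
            std::cout << row[i] << " ";
        std::cout << "\n";
    }
}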

Once we have all 8 encoded basis vectors, we arrange them row by row to produce:

$$\mathbf{G} = \begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
\end{bmatrix}$$

This is the second key ingredient of our simulator. With the generator matrix, we can encode any 8-bit message into a Hamming(12, 8) codeword; combined with the parity check matrix, we can detect and correct a single-bit error.

We use syndrome decoding with the parity check matrix and the received codeword. The syndrome is the product of the parity check matrix and the codeword over $F_2$. Just as there are 4 parity bits in Hamming(12, 8), there are 4 syndrome bits. In effect, the syndrome behaves just like the parity checks, telling you whether there's an error and, if so, its location.
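
In matrix form, with all arithmetic over $F_2$, encoding and syndrome computation reduce to:

$$ \mathbf{c} = \mathbf{m}\mathbf{G}, \qquad \mathbf{s} = \mathbf{H}\mathbf{c}^\top $$

where $\mathbf{m}$ is the 8-bit message row vector, $\mathbf{c}$ is the 12-bit codeword, and $\mathbf{s}$ is the 4-bit syndrome.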

Simulation: From Theory to Practice

We can now apply our mathematical insights and begin implementing the encoding and decoding functions and the simulator.

Our simulator is divided into three main parts: the encoder function, pseudorandom noise, and the decoder function. I will break down each part and explain how it works. You can find the full source code ready for use on GitHub.

Encoder Function

The encoding function is as follows:

void EncodeHamming(bool *message, bool (&codeword)[12]) {
    // c = mG: each codeword bit is the mod-2 dot product of the 8-bit
    // message with the corresponding column of the generator matrix
    for (int i = 0; i < 12; i++) {
        int result = 0;
        for (int j = 0; j < 8; j++) {
            result ^= gMatrix[j][i] * message[j];
        }
        codeword[i] = result;
    }
}

The bool array message holds the 8 bits of the input byte, which are encoded and output to codeword. gMatrix holds the generator matrix $\mathbf{G}$ we derived earlier.
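
For example, encoding the byte from our worked example (assuming gMatrix has been initialized with the matrix $\mathbf{G}$ above) reproduces the codeword we computed by hand:

bool message[8] = {0, 1, 1, 0, 1, 0, 1, 1};
bool codeword[12];
EncodeHamming(message, codeword);   // codeword = 1 0 0 1 1 1 0 1 1 0 1 1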

Pseudorandom Noise

To simulate noise, we draw from a uniform random distribution for every bit of each 12-bit codeword. I use the mt19937 generator, seeded with the time at the program's execution, and a low bit-flip probability of 0.01 to stay within Hamming code's single-error tolerance.

#include <ctime>
#include <random>

// Bit error probability
const double PROBABILITY = 0.01;

// Mersenne Twister 19937 generator seeded with the current time
std::mt19937 mt(time(0));
std::uniform_real_distribution<double> dist(0, 1);

Introducing errors to the binary string is quite simple. We loop through each bit and call dist(mt) to obtain a uniformly distributed random real number rand_t between 0 and 1. If rand_t is less than or equal to our probability constant, we flip the bit with an XOR.

// Simulate 1% bit error
for (int j = 0; j < 12; j++) {
    double rand_t = dist(mt);
    if (rand_t <= PROBABILITY)
        b12_BinaryString[j] = b12_BinaryString[j] ^ 1;
}

Decoder Function

To decode a received codeword, we must first compute its syndrome.

int *FindSyndrome(bool *codeword) {
    // s = Hc: multiply the parity check matrix pMatrix by the received
    // codeword over F_2 (the caller owns the returned array)
    int *syndrome = new int[4];
    int result;

    for (int i = 0; i < 4; i++) {
        result = 0;
        for (int g = 0; g < 12; g++) {
            result ^= pMatrix[i][g] * codeword[g];
        }
        syndrome[i] = result;
    }
    return syndrome;
}

The syndrome bits behave just like parity bits. If all of them are 0, no error was detected. If the syndrome is non-zero, an attempt to correct the error is performed: the syndrome, read as a binary number, predicts the position of the error, and that bit is flipped. There's obviously no guarantee that the error is actually corrected; when two or more bits have flipped, the correction may even introduce another error. For a low noise rate, however, it corrects errors with reasonable accuracy.

void DecodeHamming(int *syndrome, bool *codeword, bool *decodedMessage) {
    // Read the syndrome as a binary number: parity check r contributes
    // 2^r, yielding the 1-based position of the (assumed single) error
    int pos = 0;
    int j = 1;
    for (int i = 0; i < 4; i++) {
        if (syndrome[i])
            pos += j;
        j = j << 1;
    }

    // Apply error correction; pos == 0 means no error was detected, and
    // pos > 12 means the syndrome cannot come from a single-bit error
    if (pos >= 1 && pos <= 12)
        codeword[pos - 1] = codeword[pos - 1] ^ 1;

    // Extract the data bits from positions 3, 5, 6, 7, 9, 10, 11, 12
    decodedMessage[0] = codeword[2];
    decodedMessage[1] = codeword[4];
    decodedMessage[2] = codeword[5];
    decodedMessage[3] = codeword[6];
    decodedMessage[4] = codeword[8];
    decodedMessage[5] = codeword[9];
    decodedMessage[6] = codeword[10];
    decodedMessage[7] = codeword[11];
}

Visualizing the Simulation

An image in PGM format serves as the data being transmitted in our simulator. PGM encodes each pixel as one byte, so a corrupted bit damages only a single pixel; this makes the format far more resilient to bit errors than compressed formats such as JPG or PNG.
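
For reference, a binary PGM (P5) file is just a short text header followed by one raw byte per pixel, which is why bit errors stay localized. The dimensions below are illustrative:

P5
640 480
255
<640 x 480 raw bytes, one per pixel>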

Let me illustrate the effectiveness of Hamming code with a photo of young Paul Allen and Bill Gates. This image has been passed through the pseudorandom noise simulation with a 1% error rate:

Simulated 1% Noise - Without Hamming Code

Now, let's see what happens when we encode the image before simulating noise:

Simulated 1% Noise - With Hamming Code

As you can see, there's a significant reduction in image noise as a result of Hamming encoding!

And that concludes our simulation of transmitting Hamming encoded data over a noisy medium.

Manuel Figueroa

Full Stack Developer

Thank you for reading my blog. Please feel free to contact me or read my other content.
