HUFFMAN CODING

      








Huffman coding is a popular algorithm used for lossless data compression. It was developed by David A. Huffman in 1952. The main idea behind Huffman coding is to assign variable-length codes to input symbols, such as characters in a text document, based on their frequencies. Symbols that occur more frequently are assigned shorter codes, while symbols that occur less frequently are assigned longer codes. This results in a more efficient representation of the data, where frequently occurring symbols are represented by fewer bits, reducing the overall size of the encoded data.


**Forms of Huffman Coding:**


1. **Static Huffman Coding:** In static Huffman coding, the frequency distribution of symbols is known in advance, and the Huffman tree is constructed based on this fixed frequency distribution. It requires two passes over the data: one to calculate symbol frequencies and construct the Huffman tree, and another to perform the actual encoding. The same Huffman tree is used for both encoding and decoding.


2. **Dynamic Huffman Coding:** In dynamic Huffman coding (also known as adaptive Huffman coding), the frequency distribution of symbols is updated dynamically as the data is being encoded. This means that the Huffman tree is continuously adjusted and updated during the encoding process. This allows for the encoding of data without needing to know the frequency distribution in advance, making it suitable for streaming data or situations where the frequency distribution may change over time.


**Huffman Coding Example:**


Let's consider a simple example to demonstrate how Huffman coding works. Suppose we have the following string with its respective character frequencies:


```

Input string: "ABBCCCDDDDEEEEE"

Character frequencies: 

A: 1

B: 2

C: 3

D: 4

E: 5

```


**Step 1: Building the Huffman Tree**


1. Create a leaf node for each character and its frequency.

2. Create a priority queue (min-heap) and insert all the leaf nodes into it.

3. Repeat the following steps until there is only one node left in the priority queue:

   a. Extract the two nodes with the minimum frequency from the priority queue.

   b. Create a new internal node with a frequency equal to the sum of the two extracted nodes' frequencies.

   c. Make the first extracted node as the left child and the second extracted node as the right child of this new node.

   d. Insert this new node back into the priority queue.


The resulting Huffman tree for the given example will look like this:


```

        (15)

       /    \

    (6)      E(5)

   /   \

 A(1)   (5)

       /   \

     B(2)  (3)

          /   \

        C(3)  D(4)

```


**Step 2: Assign Huffman Codes**


Now, traverse the Huffman tree from the root to each leaf node and assign a binary code to each character. Going left represents a binary 0, and going right represents a binary 1.


```

A: 00

B: 01

C: 10

D: 110

E: 111

```


**Step 3: Encoding the Input String**


Using the assigned Huffman codes, we can encode the input string "ABBCCCDDDDEEEEE":


```

Input string:  A  B  B  C  C  C  D  D  D  D  E  E  E  E  E

Huffman code: 00 01 01 10 10 10 110 110 110 110 111 111 111 111 111


Encoded data: 00010101101010101101101101101101101111111111111111111

```


The encoded data is much shorter than the original string, providing compression benefits, especially for longer inputs or inputs with varying symbol frequencies. During decoding, the Huffman tree is used to uniquely decode the binary code back to the original symbols.

 Here's an implementation of Huffman coding in C:


```c

#include <stdio.h>

#include <stdlib.h>

#include <string.h>


struct HuffmanNode {

    char symbol;

    int frequency;

    struct HuffmanNode* left;

    struct HuffmanNode* right;

};


struct MinHeap {

    int size;

    int capacity;

    struct HuffmanNode** array;

};


struct HuffmanNode* create_node(char symbol, int frequency) {

    struct HuffmanNode* node = (struct HuffmanNode*)malloc(sizeof(struct HuffmanNode));

    node->symbol = symbol;

    node->frequency = frequency;

    node->left = NULL;

    node->right = NULL;

    return node;

}


struct MinHeap* create_min_heap(int capacity) {

    struct MinHeap* heap = (struct MinHeap*)malloc(sizeof(struct MinHeap));

    heap->size = 0;

    heap->capacity = capacity;

    heap->array = (struct HuffmanNode**)malloc(capacity * sizeof(struct HuffmanNode*));

    return heap;

}


void swap_nodes(struct HuffmanNode** a, struct HuffmanNode** b) {

    struct HuffmanNode* temp = *a;

    *a = *b;

    *b = temp;

}


void min_heapify(struct MinHeap* heap, int index) {

    int smallest = index;

    int left_child = 2 * index + 1;

    int right_child = 2 * index + 2;


    if (left_child < heap->size && heap->array[left_child]->frequency < heap->array[smallest]->frequency)

        smallest = left_child;


    if (right_child < heap->size && heap->array[right_child]->frequency < heap->array[smallest]->frequency)

        smallest = right_child;


    if (smallest != index) {

        swap_nodes(&heap->array[index], &heap->array[smallest]);

        min_heapify(heap, smallest);

    }

}


int is_heap_size_one(struct MinHeap* heap) {

    return (heap->size == 1);

}


struct HuffmanNode* extract_min(struct MinHeap* heap) {

    struct HuffmanNode* min_node = heap->array[0];

    heap->array[0] = heap->array[heap->size - 1];

    heap->size--;

    min_heapify(heap, 0);

    return min_node;

}


void insert_min_heap(struct MinHeap* heap, struct HuffmanNode* node) {

    heap->size++;

    int i = heap->size - 1;

    while (i > 0 && node->frequency < heap->array[(i - 1) / 2]->frequency) {

        heap->array[i] = heap->array[(i - 1) / 2];

        i = (i - 1) / 2;

    }

    heap->array[i] = node;

}


struct MinHeap* build_min_heap(char symbols[], int frequencies[], int size) {

    struct MinHeap* heap = create_min_heap(size);

    for (int i = 0; i < size; i++) {

        heap->array[i] = create_node(symbols[i], frequencies[i]);

    }

    heap->size = size;


    for (int i = size / 2 - 1; i >= 0; i--) {

        min_heapify(heap, i);

    }

    return heap;

}


struct HuffmanNode* build_huffman_tree(char symbols[], int frequencies[], int size) {

    struct HuffmanNode *left, *right, *top;

    struct MinHeap* heap = build_min_heap(symbols, frequencies, size);


    while (!is_heap_size_one(heap)) {

        left = extract_min(heap);

        right = extract_min(heap);

        top = create_node('$', left->frequency + right->frequency);

        top->left = left;

        top->right = right;

        insert_min_heap(heap, top);

    }


    struct HuffmanNode* root = extract_min(heap);

    free(heap);

    return root;

}


void print_codes(struct HuffmanNode* root, int arr[], int top) {

    if (root->left) {

        arr[top] = 0;

        print_codes(root->left, arr, top + 1);

    }


    if (root->right) {

        arr[top] = 1;

        print_codes(root->right, arr, top + 1);

    }


    if (root->left == NULL && root->right == NULL) {

        printf("%c: ", root->symbol);

        for (int i = 0; i < top; i++) {

            printf("%d", arr[i]);

        }

        printf("\n");

    }

}


void huffman_codes(char symbols[], int frequencies[], int size) {

    struct HuffmanNode* root = build_huffman_tree(symbols, frequencies, size);

    int arr[100], top = 0;

    print_codes(root, arr, top);

}


int main() {

    char symbols[] = {'A', 'B', 'C', 'D', 'E'};

    int frequencies[] = {1, 2, 3, 4, 5};

    int size = sizeof(symbols) / sizeof(symbols[0]);


    huffman_codes(symbols, frequencies, size);


    return 0;

}

```


output:

C: 00

A: 010

B: 011

D: 10

E: 11


Process returned 0 (0x0)   execution time : 0.050 s

Press any key to continue.




This C implementation builds the Huffman tree based on the given symbol frequencies and then prints the Huffman codes for each symbol. Note that this example uses a fixed set of symbols and their respective frequencies. In a real use-case, you would read the input data to determine the frequencies dynamically.

Post a Comment

0 Comments