Compression & Information Theory Lab
Encode text with Run-Length Encoding and Huffman coding, then compare their compression ratios. Build Huffman trees and visualize variable-length codes. Calculate Shannon entropy and see why it is the theoretical lower bound for lossless compression.
Guided Experiment: Entropy as Compression Lower Bound
Can any lossless compression algorithm compress a message below its Shannon entropy (in bits per symbol)? How does Huffman's average code length compare to the entropy?
Write your hypothesis in the Lab Report panel, then click Next.
Controls
RLE Analysis
| Character | Count | Encoded |
|---|---|---|
| 'A' | 3 | 3A |
| 'B' | 3 | 3B |
| 'C' | 4 | 4C |
| 'D' | 2 | 2D |
| 'E' | 2 | 2E |
Data Table
(0 rows)

| # | Input | Algorithm | Original Bits | Compressed Bits | Ratio | Entropy (bits) |
|---|---|---|---|---|---|---|
Reference Guide
Run-Length Encoding
RLE replaces consecutive identical characters with a count and the character. It works best on data with long runs.
For text without runs (like English prose), RLE can actually increase the size because each character needs a count prefix.
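The count-then-character scheme shown in the RLE Analysis table can be sketched in a few lines of Python; `rle_encode` is a name chosen here for illustration, not part of the lab itself.

```python
def rle_encode(text):
    """Run-length encode: each run of identical characters becomes count + character."""
    if not text:
        return ""
    out = []
    count = 1
    for prev, cur in zip(text, text[1:]):
        if cur == prev:
            count += 1
        else:
            out.append(f"{count}{prev}")
            count = 1
    out.append(f"{count}{text[-1]}")  # flush the final run
    return "".join(out)

print(rle_encode("AAABBBCCCCDDEE"))  # 3A3B4C2D2E -- matches the table above
print(rle_encode("ABC"))            # 1A1B1C -- longer than the input!
```

The second call shows the failure mode described above: run-free text grows, since every single character still pays for a count prefix.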
Huffman Coding
Huffman coding assigns shorter bit codes to more frequent characters, creating an optimal prefix-free code.
The Huffman tree is built bottom-up by repeatedly combining the two lowest-frequency nodes.
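The bottom-up merge can be sketched with a min-heap; the sample string matches the frequencies in the RLE table above, and `huffman_codes` is a name chosen here for illustration.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Assign bit codes by repeatedly merging the two lowest-frequency nodes."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate case: a single symbol still needs one bit
        return {next(iter(freq)): "0"}
    # heap entries: (frequency, tiebreaker, {symbol: code-so-far})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # merging prepends a bit: 0 for the left subtree, 1 for the right
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes("AAABBBCCCCDDEE")
# frequent symbols (A, B, C) get 2-bit codes; rare ones (D, E) get 3 bits
```

Because no code is a prefix of another, the bit stream decodes unambiguously without separators.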
Shannon Entropy
Shannon entropy measures the average information content per symbol. It is the theoretical minimum number of bits per symbol that any lossless encoding can achieve on average.
Maximum entropy occurs when all symbols are equally likely. Low entropy means the data is predictable and compresses well.
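The formula H = Σ p·log₂(1/p), summed over the symbol probabilities p, can be computed directly from character counts; this sketch uses the same sample string as the RLE table, and `shannon_entropy` is a name chosen here for illustration.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Average bits of information per symbol: H = sum of p * log2(1/p)."""
    freq = Counter(text)
    n = len(text)
    return sum((c / n) * math.log2(n / c) for c in freq.values())

print(round(shannon_entropy("AAABBBCCCCDDEE"), 3))  # 2.271
print(shannon_entropy("AAAA"))                      # 0.0 -- fully predictable
```

The second call illustrates the low-entropy case: a string of one repeated symbol carries no information per symbol and compresses maximally well.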
Compression Ratio
The compression ratio is the original size divided by the compressed size; a ratio above 1 means the compressed data is smaller than the original.
Huffman coding approaches the entropy limit. No lossless compression can consistently beat entropy.