Data Compression & Image Encoding Explorer

Type text to see Run-Length Encoding and Huffman coding in action, or paint pixels on a grid to explore how images are stored and compressed. Compare lossless and lossy methods side by side.

Input text: AAABBBCCDDDDEE (14 characters)

Run-Length Encoding (RLE)

RLE scans the text from left to right and replaces each run of consecutive identical characters with the run length followed by the character.

Step-by-step encoding

AAA → 3A
BBB → 3B
CC → 2C
DDDD → 4D
EE → 2E

Encoded output

3A3B2C4D2E
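
The step-by-step encoding above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own implementation; the function name `rle_encode` is an assumption made for this example.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Encode each run of identical characters as <count><character>."""
    # groupby yields one group per run of consecutive equal characters.
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


print(rle_encode("AAABBBCCDDDDEE"))  # 3A3B2C4D2E
```

Note that this count-then-character format is ambiguous if the input itself contains digits; the sketch assumes it does not.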

Original vs Encoded

Original (14 chars, 112 bits)

AAABBBCCDDDDEE

Encoded (10 chars, 80 bits)

3A3B2C4D2E

Original Size: 112 bits (14 bytes)

Compressed Size: 80 bits (10 bytes)

Compression Ratio: 1.40x smaller

Space Savings: 28.6% saved
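
The ratio and savings figures follow directly from the two sizes: the ratio is original size divided by compressed size, and the savings is the fraction of the original that was removed. A small sketch of that arithmetic (the function name `compression_stats` is hypothetical):

```python
def compression_stats(original_bytes: int, compressed_bytes: int) -> tuple[float, float]:
    """Return (compression ratio, space savings in percent)."""
    ratio = original_bytes / compressed_bytes
    savings = (1 - compressed_bytes / original_bytes) * 100
    return ratio, savings


ratio, savings = compression_stats(14, 10)
print(f"{ratio:.2f}x smaller, {savings:.1f}% saved")  # 1.40x smaller, 28.6% saved
```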

Compression Comparison

Method                 Original   Compressed   Ratio   Savings   Type
Uncompressed (ASCII)   14 B       14 B         1.00x   0.0%      Lossless
Run-Length Encoding    14 B       10 B         1.40x   28.6%     Lossless
Huffman Coding         14 B       4 B          3.50x   71.4%     Lossless

Type key:
Lossless: no data loss, perfectly reversible
Lossy: some data permanently discarded

Reference Guide

Run-Length Encoding

RLE is one of the simplest compression algorithms. It replaces consecutive identical symbols with a count and the symbol. For example, "AAABBBCC" becomes "3A3B2C".

RLE works best on data with long runs of repeated values, like simple graphics, fax transmissions, and binary data. It performs poorly on text with few repeats because every run, even a single character, carries a count: "ABC" becomes "1A1B1C", twice the original length.

RLE is lossless, meaning the original data can be perfectly reconstructed from the compressed output.
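
To illustrate the lossless round trip, here is a hedged sketch of an encoder and a matching decoder. Both function names are assumptions for this example, and the decoder assumes the original text contains no digits, since a digit in the input would be indistinguishable from a count.

```python
import re
from itertools import groupby


def rle_encode(text: str) -> str:
    """Encode each run of identical characters as <count><character>."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Expand every <count><character> token back into a run."""
    # Each token is a decimal count followed by exactly one non-digit symbol.
    tokens = re.findall(r"(\d+)(\D)", encoded)
    return "".join(char * int(count) for count, char in tokens)


original = "AAABBBCCDDDDEE"
assert rle_decode(rle_encode(original)) == original  # perfectly reconstructed

# Text without repeats grows instead of shrinking.
print(rle_encode("ABCDE"))  # 1A1B1C1D1E
```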

Huffman Coding

Huffman coding assigns variable-length binary codes based on character frequency. Characters that appear more often get shorter codes. The algorithm builds a binary tree from the bottom up by repeatedly merging the two lowest-frequency nodes.
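
The bottom-up merge can be sketched with a priority queue. This is one possible implementation under stated assumptions, not necessarily how the tool builds its tree: the name `huffman_codes` is hypothetical, ties between equal frequencies are broken by insertion order, and prepending "0" for the left branch and "1" for the right is a convention chosen here.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman code table {symbol: bit string} for the given text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}

    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    # Each heap entry: (frequency, tiebreak, {symbol: code built so far})
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        # Merge the two lowest-frequency nodes into one parent node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))

    return heap[0][2]


text = "AAABBBCCDDDDEE"
codes = huffman_codes(text)
total_bits = sum(len(codes[ch]) for ch in text)
print(total_bits)  # 32
```

The 32-bit total corresponds to 4 bytes for the 14-character example. It counts only the encoded symbols, not the cost of storing the code table alongside them.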

The resulting code is prefix-free, meaning no code is the beginning of another code. This makes decoding unambiguous. ASCII uses 8 bits per character regardless of frequency, while Huffman codes adapt to the input data.
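
A short sketch shows why the prefix-free property makes decoding unambiguous: the decoder can read bits one at a time and emit a symbol the moment the buffer matches a code. The code table below is an assumed example (one valid Huffman table for "AAABBBCCDDDDEE"), and the function names are hypothetical.

```python
# Assumed example table: shorter codes for the more frequent symbols.
CODES = {"A": "00", "B": "01", "D": "10", "C": "110", "E": "111"}


def is_prefix_free(codes: dict[str, str]) -> bool:
    """True if no code is the beginning of a different code."""
    values = list(codes.values())
    return not any(a != b and b.startswith(a) for a in values for b in values)


def encode(text: str, codes: dict[str, str]) -> str:
    return "".join(codes[ch] for ch in text)


def decode(bits: str, codes: dict[str, str]) -> str:
    lookup = {code: sym for sym, code in codes.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        # Prefix-free: the first match is the only possible match.
        if buffer in lookup:
            out.append(lookup[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


text = "AAABBBCCDDDDEE"
bits = encode(text, CODES)
assert is_prefix_free(CODES)
assert decode(bits, CODES) == text
print(len(bits), "bits vs", len(text) * 8, "bits in 8-bit ASCII")  # 32 bits vs 112 bits in 8-bit ASCII
```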

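Huffman coding is used as the entropy-coding stage inside many formats: DEFLATE (the method behind ZIP and GZIP) combines it with LZ77 matching, and JPEG and MP3 apply it to their quantized data.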

Lossless vs Lossy

Lossless compression preserves every bit of the original data. You can decompress and get back exactly what you started with. RLE and Huffman coding are lossless methods, and the ZIP and PNG formats use lossless compression.

Lossy compression discards some data to achieve smaller file sizes. The original cannot be perfectly recovered. JPEG, MP3, and video codecs like H.264 use lossy compression.
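
As a toy illustration of discarding data, the sketch below quantizes 8-bit sample values down to 4 bits and then tries to restore them. This is only meant to show why the loss is permanent; it is not how JPEG, MP3, or H.264 actually work, and the function names are assumptions for this example.

```python
def quantize(value: int, bits: int) -> int:
    """Keep only the top `bits` bits of an 8-bit value (0-255)."""
    return value >> (8 - bits)


def dequantize(level: int, bits: int) -> int:
    """Map a quantized level back to the middle of its 8-bit bucket."""
    shift = 8 - bits
    midpoint = (1 << shift) >> 1
    # The dropped low bits are gone, so the best guess is the bucket midpoint.
    return (level << shift) + midpoint


samples = [200, 201, 203, 207]
levels = [quantize(v, 4) for v in samples]
restored = [dequantize(q, 4) for q in levels]

print(levels)    # [12, 12, 12, 12]
print(restored)  # [200, 200, 200, 200]
```

Four distinct inputs collapse into the same stored level, so no decoder can tell them apart afterwards.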

The tradeoff is file size vs quality. Lossy compression achieves much higher compression ratios but introduces artifacts that become more noticeable at higher compression levels.

Image Color Depth

Color depth determines how many bits represent each pixel. True color uses 24 bits (8 per channel) for 16.7 million possible colors. Reducing to 8 bits gives only 256 colors, 4 bits gives 16 colors, and 1 bit gives black and white.
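
The relationship between bit depth, color count, and raw storage is simple arithmetic: n bits give 2^n colors, and an uncompressed image needs width × height × n bits. A minimal sketch, where the 16x16 grid size is an assumption chosen for illustration and the function names are hypothetical:

```python
def colors_for_depth(bits_per_pixel: int) -> int:
    """Number of distinct colors representable at a given bit depth."""
    return 2 ** bits_per_pixel


def raw_image_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes, rounded up to a whole byte."""
    return (width * height * bits_per_pixel + 7) // 8


for depth in (24, 8, 4, 1):
    print(
        f"{depth}-bit: {colors_for_depth(depth):,} colors, "
        f"{raw_image_bytes(16, 16, depth)} bytes for a 16x16 grid"
    )
```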

Indexed color takes a different approach by building a palette of unique colors used in the image, then storing each pixel as an index into that palette. If an image uses only 4 unique colors, each pixel needs just 2 bits instead of 24.
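
The palette idea can be sketched as follows. The pixel data and the name `to_indexed` are made up for this example; colors are (R, G, B) tuples, and the palette lists them in first-seen order.

```python
def to_indexed(pixels: list[tuple[int, int, int]]):
    """Return (palette, per-pixel indices, bits needed per index)."""
    palette = list(dict.fromkeys(pixels))  # unique colors, first-seen order
    index_of = {color: i for i, color in enumerate(palette)}
    indices = [index_of[p] for p in pixels]
    # Smallest bit width that can hold the largest index (at least 1 bit).
    bits_per_pixel = max(1, (len(palette) - 1).bit_length())
    return palette, indices, bits_per_pixel


RED, GREEN, BLUE, WHITE = (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)
pixels = [RED, RED, GREEN, BLUE, WHITE, WHITE, RED, BLUE]

palette, indices, bpp = to_indexed(pixels)
print(indices)  # [0, 0, 1, 2, 3, 3, 0, 2]
print(bpp)      # 2

# The original pixels are fully recoverable from palette + indices.
assert [palette[i] for i in indices] == pixels
```

The palette itself still has to be stored (24 bits per entry in this sketch), so the saving only pays off once the image has many more pixels than palette entries.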

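PNG supports an indexed-color mode with a palette of up to 256 colors, which encoders can use when an image has few enough distinct colors. GIF is always indexed, with a maximum of 256 colors per palette.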