HUFFMAN.COMPRESSOR

GOAL: Compress and decompress a file
SYNOPSIS: Individual project | Spring 2018 | Java
KEY SKILLS: Trees, Maps, Priority Queues | File Input & Output | Implementing Huffman’s Alg*
VALIDATION: War & Peace (587,000 words), 3.2 MB -> 1.8 MB in 140 nanoseconds

Key Process:

  1. Read the text file and generated a frequency hash map of the file’s characters
  2. Generated a prefix-free code for each character: first created a single-node tree per character, then repeatedly combined trees into one binary tree in which every character is a leaf and every edge is labeled 0 or 1. Used a priority queue to always merge the two lowest-frequency trees first, so the highest-frequency characters end up closest to the root and therefore get the shortest codes. Traversed the finished tree just once with recursion, storing each character's code in a map.
  3. To compress, used the map to look up the code for each character in the text file and wrote the corresponding bits to a new file. Used instructor-provided code to convert the binary string into actual bits.
  4. To decompress, first converted the bits back into a binary string. Then traversed the binary tree, going left for 0 and right for 1. On reaching a leaf, wrote that character to a new file and restarted the traversal from the root, repeating until no binary digits remained.
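The four steps above can be sketched in Java roughly as follows. This is a minimal illustration, not the project's actual source: it works on in-memory strings rather than files, skips the bit-packing helper, and the class and method names are my own invention.

```java
import java.util.*;

public class HuffmanSketch {
    // Tree node: a leaf holds a character; an internal node holds a combined frequency.
    static class Node implements Comparable<Node> {
        final char ch; final int freq; final Node left, right;
        Node(char ch, int freq) { this(ch, freq, null, null); }
        Node(char ch, int freq, Node l, Node r) { this.ch = ch; this.freq = freq; left = l; right = r; }
        boolean isLeaf() { return left == null && right == null; }
        public int compareTo(Node o) { return Integer.compare(freq, o.freq); }
    }

    // Step 1: frequency hash map of the file's characters.
    static Map<Character, Integer> frequencies(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) freq.merge(c, 1, Integer::sum);
        return freq;
    }

    // Step 2a: combine trees via a min-priority queue; the two lowest-frequency
    // trees merge first, so the most frequent characters end up nearest the root.
    static Node buildTree(Map<Character, Integer> freq) {
        PriorityQueue<Node> pq = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> e : freq.entrySet())
            pq.add(new Node(e.getKey(), e.getValue()));
        while (pq.size() > 1) {
            Node a = pq.poll(), b = pq.poll();
            pq.add(new Node('\0', a.freq + b.freq, a, b));
        }
        return pq.poll();
    }

    // Step 2b: one recursive traversal, storing each leaf's code in a map.
    static void assignCodes(Node n, String prefix, Map<Character, String> codes) {
        if (n.isLeaf()) { codes.put(n.ch, prefix.isEmpty() ? "0" : prefix); return; }
        assignCodes(n.left, prefix + "0", codes);
        assignCodes(n.right, prefix + "1", codes);
    }

    // Step 3: compress by looking up each character's code word.
    static String encode(String text, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) bits.append(codes.get(c));
        return bits.toString();
    }

    // Step 4: decompress by walking the tree: left on '0', right on '1',
    // emit the leaf's character, restart at the root.
    static String decode(String bits, Node root) {
        StringBuilder out = new StringBuilder();
        Node n = root;
        for (char b : bits.toCharArray()) {
            n = (b == '0') ? n.left : n.right;
            if (n.isLeaf()) { out.append(n.ch); n = root; }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "go go gophers";
        Node root = buildTree(frequencies(text));
        Map<Character, String> codes = new HashMap<>();
        assignCodes(root, "", codes);
        String bits = encode(text, codes);
        System.out.println(decode(bits, root).equals(text)); // prints "true"
    }
}
```

Because the codes are prefix-free, the decoder never needs look-ahead: the moment a leaf is reached, exactly one character has been decoded.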

*Huffman coding: a variable-length character encoding in which frequently occurring characters in a file are assigned shorter code words than infrequently occurring ones. (A code word is a binary sequence.) This differs from ASCII, which assigns the same 7 bits to every character regardless of frequency.
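To illustrate the footnote with a toy case: in the string "aaaab" a Huffman code could assign 'a' the 1-bit code word 0 and 'b' the code word 1 (a hypothetical two-symbol code of my own choosing), so the whole string needs 5 bits instead of the 35 bits a fixed 7-bit ASCII encoding would use.

```java
public class CodeLengthDemo {
    public static void main(String[] args) {
        String text = "aaaab";
        // Hypothetical prefix-free code for this two-symbol alphabet:
        // 'a' -> "0", 'b' -> "1"; every code word is 1 bit long.
        int huffmanBits = text.length() * 1;
        int asciiBits = text.length() * 7; // fixed 7 bits per character
        System.out.println(huffmanBits + " vs " + asciiBits); // prints "5 vs 35"
    }
}
```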