In this lab, students will implement Huffman coding in C++ to compress and decompress text files. They will learn how to construct a Huffman tree, generate Huffman codes, encode a text file into a compressed binary format, and then decode it back to its original form. Additionally, students will analyze the efficiency of Huffman compression by comparing the file sizes before and after compression. By the end of this lab, students will be able to:
string filename;
cout << "Enter text file name: ";
cin >> filename;
// Read input file
ifstream inputFile(filename);
if (!inputFile) {
cout << "Error opening file.\n";
return 1;
}
string text;
char ch;
//read characters from a file one by one until the end of the file is reached
while (inputFile.get(ch)) {
_____________ // Append each character to the string
}
inputFile.close();
unordered_map<char, int> freqMap;
for (char ch : text) {
_______________ // Increment the frequency of each character in freqMap
}
// Comparison Function to sort nodes based on frequency
bool compare(Node* a, Node* b) {
return a->freq < b->freq; // Sort in ascending order of frequency
}
// Build Huffman Tree with vector
Node* buildHuffmanTree(unordered_map<char, int>& freqMap) {
vector<Node*> nodes; // Store nodes in a vector
// Step 1: Convert frequency map into nodes
for (auto pair: freqMap) {
char character = pair.first; // Get the character (e.g., 'a')
int frequency = pair.second; // Get the frequency (e.g., 15)
//Create a node using newNode function with character and its frequency
______________________________________
// Use push_back to add the newly created node to the nodes vector.
_______________________________________
}
// Step 2: Sort nodes by frequency
// This loop continues merging nodes until only one node remains, which becomes the root of the Huffman Tree.
while (nodes.size() > 1) {
sort(nodes.begin(), nodes.end(), compare); //sort the nodes in ascending order of frequency
// Take two smallest frequency nodes
Node* left = _________; //Hint: The node at index 0 has the smallest frequency.
Node* right = _________; //Hint: The node at index 1 has the second smallest frequency.
// Create a new node combining both
// '\0' is the most common placeholder for internal nodes.
Node* merged = newNode('\0', ___________________); //Hint: The frequency of the merged node is the sum of left and right nodes' frequencies.
merged->left = left;
merged->right = right;
// Remove the first two nodes from the list
nodes.erase(nodes.begin(), nodes.begin() + 2);
___________________//Hint: push_back() adds the merged node to the end of the nodes vector.
}
// Ensure the vector is not empty before accessing index 0
if (nodes.empty()) return nullptr;
return nodes[0]; // The last remaining node is the root
}
void generateHuffmanCodes(Node* root, string code, unordered_map<char, string>& huffmanCode) {
if (root == nullptr) return; // Base case: Stop recursion if tree is empty
//// If the node is a leaf node (contains a character)
if (root->left == nullptr && root->right == nullptr) {
huffmanCode[root->ch] = code; // Store Huffman code for this character
return;
}
//If the node has a left child, assigns 0 for a left branch
generateHuffmanCodes(_________, code + "0", huffmanCode);
//If the node has a right child, assigns 1 for a right branch
generateHuffmanCodes(_________, code + "1", huffmanCode);
}
// This should be placed in the main function after building the Huffman tree.
unordered_map<char, string> huffmanCode;
generateHuffmanCodes(root, "", huffmanCode);
// Encode the text using Huffman Codes
string encodeText(string text, unordered_map<char, string>& huffmanCode) {
string encodedText;
for (char ch : text) {
_______________________ //Append the Huffman code of each character to the encoded text.
}
return encodedText;
}
ofstream encodedFile;
encodedFile.open("compressed.bin");
if (!encodedFile) {
cerr << "Error: Unable to open compressed.txt for writing!" << endl;
return 1;
}
encodedFile << ________________; // Write encoded text
encodedFile.close();
// Decode Huffman encoded text
string decodeText(string encodedText, Node* root) {
string decodedText;
Node* current = root; //Start at the root of the Huffman tree
//Loop through each bit in encodedText
for (char bit : encodedText) {
// If the bit is '0', move to the left child; if the bit is '1', move to the right child.
current = (bit == '0') ? ______________ : ______________; //Hint: Use the left child when the bit is '0' and the right child when the bit is '1'.
// Check if we reach a leaf node (character node)
if (________________________________________________) {
decodedText += current->ch; // Append the decoded character to the result
current = root; // Reset to the root for the next character
}
}
return decodedText; // Return the fully decoded string
}
The original text file size is obtained by counting the number of characters, which gives the total number of bytes,
as each character typically occupies one byte. Original text file size: text.size() bytes
The compressed file size is estimated by taking the total number of bits in the encoded text (since each character in 'encodedText' represents one bit)
and divide by 8 because there are 8 bits in one byte. This calculation is approximate since the total bit count may not be a perfect multiple of 8.
Compressed file size: encodedText.size() / 8 bytes (approximately)
What is the size of decompressed.txt?
Enter text file name: input.txt
Huffman Codes:
p : 1111101
f : 111111
I : 11111001
v : 11111000
g : 111101
m : 111100
i : 11101
c : 11100
: 110
l : 10111
q : 1011011
T : 10110100
u : 101100
s : 000
d : 0010
n : 0011
a : 1000
H : 10110101
e : 010
r : 0111
t : 0110
b : 10010110
o : 1010
z : 1001010
y : 10010111
. : 100100
h : 10011
Encoded Text (first 100 bits): 1011010110110011111111111111110010000011110111001010001011101001111110111011101000110100011010111101...
Original file size: 212 bytes
Compressed file size: 110 bytes (approx)
Compressed output saved to 'compressed.bin'
Decompressed file size: 212 bytes
Decompressed output saved to 'decompressed.txt'
Deleting nodes: s d n e t r a . z b y h o u T H q l c i m g v I p f
| Category | Criteria | Points |
|---|---|---|
| Huffman Tree Node (5 points) |
The Huffman Tree Node is defined correctly. | 5 |
| New Node Function (5 points) |
The function to create a new node is defined correctly. | 5 |
| Huffman tree (17 points) |
The Huffman tree is constructed correctly. | 17 |
| Huffman encoding (15 points) |
Huffman encoding has been successfully implemented | 15 |
| Huffman decoding (15 points) |
Huffman decoding has been successfully implemented | 15 |
| File I/O operations (15 points) |
File I/O operations work correctly | 15 |
| Main Function Implementation (10 points) |
All functions are correctly invoked in the main function | 10 |
| Analyze compression efficiency (5 points) |
Accurately compare the sizes of input.txt, compressed.bin, and decompressed.txt | 5 |
| Code Readability & Comments (5 points) |
Maintain consistent indentation and spacing throughout the code. | 2 |
| Include comments that explain the purpose of key sections and logic. | 3 | |
| Test Cases & Output Accuracy (5 points) |
The output for different test cases is correct and consistent. | 5 |
| AI disclaimer (3 points) |
Please put an AI disclaimer at the top of your source code as comments. | 3 |
| Errors |
Your program has syntax errors, compiling errors, running errors, or infinite loops. | -50 points |
/*
Author: Your Name
Date: Month/Day/Year
Lab Purpose:
A.I. Disclaimer: please put your A.I. disclaimer here
*/
Please submit your Lab 4 source code file: lab4.cpp in D2L!