summaryrefslogtreecommitdiff
path: root/Userland/Libraries/LibCompress/Deflate.h
diff options
context:
space:
mode:
authorTimothy Flynn <trflynn89@pm.me>2023-03-28 14:45:20 -0400
committerAndreas Kling <kling@serenityos.org>2023-03-29 07:19:14 +0200
commit5aaefe4e62a7793c100c85a8b5662eded03132fb (patch)
treebcf1be5ebcddc5f5c54a922a8168e2b0c088e764 /Userland/Libraries/LibCompress/Deflate.h
parent8e834d4bb2f8a217013142658fe7203c5a5c3170 (diff)
downloadserenity-5aaefe4e62a7793c100c85a8b5662eded03132fb.zip
LibCompress: Use prefix tables to decode Huffman codes up to 8 bits long
Huffman codes have a useful property in that they are prefix codes. That is, a set of bits representing a Huffman-coded symbol is never a prefix of another symbol. This allows us to create a table, where each index in the table are integers whose prefix is the entry's corresponding Huffman code. With Deflate, we can have codes up to 16 bits in length, thus creating a prefix table with 2^16 entries. So instead of creating a table fit all possible codes, we use a cutoff of 8-bit codes. Codes larger than 8 bits fall back to the binary search method. Using the "enwik8" file as a test (100MB uncompressed, commonly used in benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), decompression time decreases from 3.527s to 2.585s on Linux.
Diffstat (limited to 'Userland/Libraries/LibCompress/Deflate.h')
-rw-r--r--Userland/Libraries/LibCompress/Deflate.h10
1 files changed, 10 insertions, 0 deletions
diff --git a/Userland/Libraries/LibCompress/Deflate.h b/Userland/Libraries/LibCompress/Deflate.h
index feb6e088ec..4767032061 100644
--- a/Userland/Libraries/LibCompress/Deflate.h
+++ b/Userland/Libraries/LibCompress/Deflate.h
@@ -30,10 +30,20 @@ public:
static Optional<CanonicalCode> from_bytes(ReadonlyBytes);
private:
+ static constexpr size_t max_allowed_prefixed_code_length = 8;
+
+ struct PrefixTableEntry {
+ u16 symbol_value { 0 };
+ u16 code_length { 0 };
+ };
+
// Decompression - indexed by code
Vector<u16> m_symbol_codes;
Vector<u16> m_symbol_values;
+ Array<PrefixTableEntry, 1 << max_allowed_prefixed_code_length> m_prefix_table {};
+ size_t m_max_prefixed_code_length { 0 };
+
// Compression - indexed by symbol
Array<u16, 288> m_bit_codes {}; // deflate uses a maximum of 288 symbols (maximum of 32 for distances)
Array<u16, 288> m_bit_code_lengths {};