diff options
author | Timothy Flynn <trflynn89@pm.me> | 2023-03-28 14:45:20 -0400 |
---|---|---|
committer | Andreas Kling <kling@serenityos.org> | 2023-03-29 07:19:14 +0200 |
commit | 5aaefe4e62a7793c100c85a8b5662eded03132fb (patch) | |
tree | bcf1be5ebcddc5f5c54a922a8168e2b0c088e764 /Userland/Libraries/LibCompress/Deflate.h | |
parent | 8e834d4bb2f8a217013142658fe7203c5a5c3170 (diff) | |
download | serenity-5aaefe4e62a7793c100c85a8b5662eded03132fb.zip |
LibCompress: Use prefix tables to decode Huffman codes up to 8 bits long
Huffman codes have a useful property in that they are prefix codes. That
is, a set of bits representing a Huffman-coded symbol is never a prefix
of another symbol. This allows us to create a table, where each index in
the table are integers whose prefix is the entry's corresponding Huffman
code.
With Deflate, we can have codes up to 16 bits in length, thus creating a
prefix table with 2^16 entries. So instead of creating a table fit all
possible codes, we use a cutoff of 8-bit codes. Codes larger than 8 bits
fall back to the binary search method.
Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), decompression
time decreases from 3.527s to 2.585s on Linux.
Diffstat (limited to 'Userland/Libraries/LibCompress/Deflate.h')
-rw-r--r-- | Userland/Libraries/LibCompress/Deflate.h | 10 |
1 files changed, 10 insertions, 0 deletions
diff --git a/Userland/Libraries/LibCompress/Deflate.h b/Userland/Libraries/LibCompress/Deflate.h index feb6e088ec..4767032061 100644 --- a/Userland/Libraries/LibCompress/Deflate.h +++ b/Userland/Libraries/LibCompress/Deflate.h @@ -30,10 +30,20 @@ public: static Optional<CanonicalCode> from_bytes(ReadonlyBytes); private: + static constexpr size_t max_allowed_prefixed_code_length = 8; + + struct PrefixTableEntry { + u16 symbol_value { 0 }; + u16 code_length { 0 }; + }; + // Decompression - indexed by code Vector<u16> m_symbol_codes; Vector<u16> m_symbol_values; + Array<PrefixTableEntry, 1 << max_allowed_prefixed_code_length> m_prefix_table {}; + size_t m_max_prefixed_code_length { 0 }; + // Compression - indexed by symbol Array<u16, 288> m_bit_codes {}; // deflate uses a maximum of 288 symbols (maximum of 32 for distances) Array<u16, 288> m_bit_code_lengths {}; |