serenity - The Serenity Operating System 🐞

Age	Commit message	Author
2023-05-19	LibCompress: Move two shared LZMA magic numbers into a common place	Tim Schumacher

2023-05-19	LibCompress: Handle arbitrarily long FF-chains in the LZMA encoder	Tim Schumacher

2023-05-19	LibCompress: Add debug logging for handling LZMA direct bits	Tim Schumacher

2023-05-17	LibCompress: Add a lot of debug logging to LZMA	Tim Schumacher

2023-05-17	LibCompress: Add an LZMA encoder	Tim Schumacher

2023-05-17	LibCompress: Use the variable for LZMA "normalized to real distance"	Tim Schumacher
	The variable already existed, but I forgot to use it earlier.
2023-05-17	LibCompress: Decode the LZMA match type in a separate function	Tim Schumacher
	This should keep the `read_some` function a bit flatter and shorter, and make it easier to match the match type decoding process with the specification.
2023-05-17	LibCompress: Make LzmaHeader a POD-like type	Tim Schumacher
	This allows us to initialize the struct using an aggregate initializer.
2023-05-17	LibCompress: Extract the LZMA state to a separate class	Tim Schumacher
	We will also need this in the compressor, as it needs to do the exact same calculations in reverse.
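For context, the LZMA coder state is a small state machine (values 0..11) that records which kinds of packets (literal, match, rep, short rep) came most recently. The encoder and decoder must update it identically, which is why sharing it makes sense. Below is a minimal sketch of the update rules, assuming the transition table of the reference LZMA SDK. `LzmaStateSketch` and its methods are hypothetical names, not LibCompress's class:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration of the LZMA coder state (values 0..11), using the
// update rules from the reference LZMA SDK. States 0..6 mean that the most
// recent packet was a literal; states 7..11 mean it was some kind of match.
struct LzmaStateSketch {
    uint8_t state = 0;

    bool previous_was_literal() const { return state < 7; }

    // After a literal, step back toward the "all literals" state 0.
    void update_after_literal()
    {
        assert(state < 12); // valid states are 0..11
        if (state < 4)
            state = 0;
        else if (state < 10)
            state -= 3;
        else
            state -= 6;
    }

    // After a match with an explicitly coded distance.
    void update_after_match() { state = state < 7 ? 7 : 10; }

    // After a match that reuses one of the remembered distances.
    void update_after_rep() { state = state < 7 ? 8 : 11; }

    // After a single-byte "short rep" using the most recent distance.
    void update_after_short_rep() { state = state < 7 ? 9 : 11; }
};
```

Starting from state 0, a match moves the state to 7, and two following literals step it back down through 4 to 1.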
2023-05-09	AK: Add the `Input` word to input-only buffered streams	Lucas CHOLLET
	This concerns both `BufferedSeekable` and `BufferedFile`.
2023-05-04	LibCompress: Remove special casing for looping DEFLATE seekbacks	Tim Schumacher
	The `copy_from_seekback` method already handles this exactly as DEFLATE expects, and does so slightly more efficiently.
2023-04-12	LibCompress: Error on truncated uncompressed DEFLATE blocks	Tim Schumacher

2023-04-12	LibCompress: Replace usages of the Endian bytes accessor	Tim Schumacher

2023-04-08	LibCompress: Mark some XZ-related variables and functions as const	Tim Schumacher

2023-04-08	LibCompress: Move loading XZ blocks into its own function	Tim Schumacher

2023-04-08	LibCompress: Move finishing the current XZ stream into its own function	Tim Schumacher

2023-04-08	LibCompress: Move finishing the current XZ block into its own function	Tim Schumacher

2023-04-08	LibCompress: Move loading XZ stream headers into its own function	Tim Schumacher

2023-04-07	LibCompress: Tolerate more than 288 entries in CanonicalCode	Nico Weber
	Webp lossless can have up to 2328 symbols. This code assumed the deflate max of 288, leading to crashes for webp lossless files using more than 288 symbols (such as Tests/LibGfx/test-inputs/simple-vp8l.webp). Nothing writes webp files at this point, so the m_bit_codes and m_bit_code_lengths arrays aren't ever used in practice with more than 288 entries.
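The 2328 figure follows from the webp lossless bitstream specification. There, the green/length prefix code covers 256 literal values, 24 length prefix codes, and up to 2^11 color cache entries (color cache bits are at most 11). A compile-time check of that arithmetic is below; the constant names are illustrative and do not come from LibGfx or LibCompress:

```cpp
#include <cstddef>

// Illustrative constants (hypothetical names), from the webp lossless spec.
constexpr size_t webp_literal_symbols = 256;      // green channel literal values
constexpr size_t webp_length_prefix_codes = 24;    // backward-reference length prefixes
constexpr size_t webp_max_color_cache_bits = 11;   // color cache size is 1 << bits, bits <= 11
constexpr size_t webp_max_alphabet_size = webp_literal_symbols
    + webp_length_prefix_codes
    + (size_t(1) << webp_max_color_cache_bits);

// DEFLATE's literal/length alphabet has 288 symbols (RFC 1951, section 3.2.6).
constexpr size_t deflate_literal_length_symbols = 288;

static_assert(webp_max_alphabet_size == 2328);
static_assert(webp_max_alphabet_size > deflate_literal_length_symbols);
```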
2023-04-05	LibCompress: Copy LZMA repetitions from the buffer in sequence	Tim Schumacher
	This improves the decompression time of `clang-15.0.7.src.tar.xz` from 5.2 seconds down to about 2.7 seconds.
2023-04-05	AK+LibCompress: Break when seekback copying to a full CircularBuffer	Tim Schumacher
	Otherwise, we just end up infinitely looping while waiting for more space in the destination.
2023-04-05	LibGfx: Add some support for decoding lossless webp files	Nico Weber
	Missing:
	* Transform support (used by virtually all lossless webp files)
	* Meta prefix / entropy image support
	Working:
	* Decoding of regular image streams
	* Color cache
	This happens to be enough to be able to decode Tests/LibGfx/test-inputs/extended-lossless.webp
	The canonical prefix code is very similar to deflate's, enough so that this can use Compress::CanonicalCode (and take advantage of all the recent performance improvements there).
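Both deflate and webp lossless transmit only a code length per symbol and rebuild the actual codes canonically. Below is a sketch of that reconstruction, following the algorithm in RFC 1951, section 3.2.2. `assign_canonical_codes` is a hypothetical helper, not part of the `Compress::CanonicalCode` API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assigns canonical prefix codes from per-symbol code lengths, following
// RFC 1951, section 3.2.2. A length of 0 means the symbol is unused.
// Shorter codes get numerically smaller values, and codes of the same
// length are handed out in increasing symbol order.
// Hypothetical helper; not the Compress::CanonicalCode API.
std::vector<uint32_t> assign_canonical_codes(std::vector<uint8_t> const& lengths)
{
    constexpr unsigned max_bits = 15; // DEFLATE caps code lengths at 15 bits

    // Step 1: count the number of codes of each length.
    std::vector<uint32_t> bl_count(max_bits + 1, 0);
    for (uint8_t length : lengths) {
        assert(length <= max_bits);
        if (length != 0)
            ++bl_count[length];
    }

    // Step 2: find the numerically smallest code for each length.
    std::vector<uint32_t> next_code(max_bits + 1, 0);
    uint32_t code = 0;
    for (unsigned bits = 1; bits <= max_bits; ++bits) {
        code = (code + bl_count[bits - 1]) << 1;
        next_code[bits] = code;
    }

    // Step 3: give consecutive codes to the symbols of each length.
    std::vector<uint32_t> codes(lengths.size(), 0);
    for (size_t symbol = 0; symbol < lengths.size(); ++symbol) {
        uint8_t length = lengths[symbol];
        if (length != 0)
            codes[symbol] = next_code[length]++;
    }
    return codes;
}
```

With the RFC's example lengths (3, 3, 3, 3, 3, 2, 4, 4), this yields the codes 2, 3, 4, 5, 6, 0, 14, 15, i.e. 010, 011, 100, 101, 110, 00, 1110, 1111.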
2023-04-04	LibCompress: Order branches in Deflate's decode_codes() numerically	Nico Weber
	deflate_special_code_length_copy has value 16, so it should come before the two zero-filling branches for codes 17 and 18. Also, the initial if refers to deflate_special_code_length_copy as well, so if it's repeated right in the next else, one has to keep it on the mental stack for a shorter time when reading this code. No behavior change.
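For reference, the three special symbols that `decode_codes()` branches on come from DEFLATE's code length alphabet (RFC 1951, section 3.2.7). Below is a sketch of their semantics, in the numeric order the commit argues for. `apply_code_length_symbol` is a hypothetical function, and reading the extra bits is left to the caller:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Expands one decoded symbol of DEFLATE's code length alphabet
// (RFC 1951, section 3.2.7) into code lengths. `extra` is the value of the
// symbol's extra bits, which the caller has already read from the stream.
// Hypothetical helper, not LibCompress's decode_codes().
void apply_code_length_symbol(std::vector<uint8_t>& lengths, unsigned symbol, unsigned extra)
{
    if (symbol <= 15) {
        // 0..15: a code length, taken literally.
        lengths.push_back(static_cast<uint8_t>(symbol));
    } else if (symbol == 16) {
        // 16: copy the previous code length 3..6 times (2 extra bits).
        assert(extra < 4);
        if (lengths.empty())
            throw std::runtime_error("code length 16 with no previous length");
        uint8_t previous = lengths.back();
        lengths.insert(lengths.end(), 3 + extra, previous);
    } else if (symbol == 17) {
        // 17: repeat a zero code length 3..10 times (3 extra bits).
        assert(extra < 8);
        lengths.insert(lengths.end(), 3 + extra, uint8_t { 0 });
    } else if (symbol == 18) {
        // 18: repeat a zero code length 11..138 times (7 extra bits).
        assert(extra < 128);
        lengths.insert(lengths.end(), 11 + extra, uint8_t { 0 });
    } else {
        throw std::runtime_error("invalid code length symbol");
    }
}
```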
2023-04-04	LibCompress: Remove a few no-op continue statements in Deflate	Nico Weber
	Alternatively, we could remove the else after the continue, but all branches here should be equally prominent, so this seems a bit nicer. No behavior change.
2023-04-02	AK: Increase LittleEndianOutputBitStream's buffer size and remove loops	Timothy Flynn
	This is very similar to the LittleEndianInputBitStream bit buffer change from 8e834d4bb2f8a217013142658fe7203c5a5c3170.
	We currently buffer one byte of data for the underlying stream. And when we put bits onto that buffer, we do so 1 bit at a time. This replaces the u8 buffer with a u64. And instead of looping at all, we perform bitwise operations to write the desired number of bits.
	Using the "enwik8" file as a test (100MB uncompressed, commonly used in benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), compression time decreases from:
	* 13.62s to 10.9s on Serenity (cold)
	* 13.62s to 9.22s on Serenity (warm)
	* 2.93s to 2.32s on Linux
	One caveat is that this requires explicitly flushing any leftover bits when the caller is done with the stream. The byte buffer implementation implicitly flushed its data every time the buffer was byte-aligned, as doing so would always fill the byte. This is no longer the case. But for now, this should be fine as the one user of this class, DEFLATE, already has a "flush everything now that we're done" finalizer.
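Below is a sketch of the u64-buffered approach, assuming LSB-first bit order as in DEFLATE. `BitWriterSketch` is a hypothetical class, not AK's `LittleEndianOutputBitStream`. It still loops over the whole bytes it emits, but never over individual bits:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical LSB-first bit writer that buffers up to 63 pending bits in a
// u64 and only emits bytes when the buffer fills up or on flush().
class BitWriterSketch {
public:
    explicit BitWriterSketch(std::vector<uint8_t>& out)
        : m_out(out)
    {
    }

    // Appends the low `count` bits of `value`, least significant bit first.
    void write_bits(uint32_t value, unsigned count)
    {
        assert(count <= 32);
        uint64_t bits = uint64_t(value) & ((uint64_t(1) << count) - 1);

        if (m_bit_count + count < 64) {
            // Common case: the new bits fit above the pending ones.
            m_buffer |= bits << m_bit_count;
            m_bit_count += count;
            return;
        }

        // The buffer fills up: top it off, emit all 64 bits, keep the rest.
        unsigned bits_that_fit = 64 - m_bit_count;
        m_buffer |= bits << m_bit_count;
        write_buffer_bytes(8);
        m_buffer = bits >> bits_that_fit;
        m_bit_count = count - bits_that_fit;
    }

    // Pending bits are NOT written implicitly; the caller must flush once
    // done. The final partial byte is padded with zero bits.
    void flush()
    {
        write_buffer_bytes((m_bit_count + 7) / 8);
        m_buffer = 0;
        m_bit_count = 0;
    }

private:
    void write_buffer_bytes(unsigned byte_count)
    {
        for (unsigned i = 0; i < byte_count; ++i)
            m_out.push_back(static_cast<uint8_t>(m_buffer >> (8 * i)));
    }

    std::vector<uint8_t>& m_out;
    uint64_t m_buffer = 0;
    unsigned m_bit_count = 0;
};
```

As the commit notes, the explicit `flush()` is mandatory here: bits that have not yet completed the 64-bit buffer stay in memory until it is called.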
2023-04-02	LibCompress: Make CanonicalCode::from_bytes() return ErrorOr<>	Nico Weber
	No intended behavior change.
2023-04-01	LibCompress: Implement block size validation for XZ streams	Tim Schumacher

2023-04-01	LibCompress: Factor out the list of XZ check sizes	Tim Schumacher

2023-04-01	LibCompress: Reduce indentation in CompressedBlock::try_read_more()	Nico Weber
	...by removing `else` after `return`. No behavior change.
2023-04-01	LibCompress: Add a utility to GZIP compress an entire file	Timothy Flynn
	This is copy-pasted from the gzip utility, along with its existing TODO. This is currently only needed by that utility, but this gives us API symmetry with GzipDecompressor, and helps ensure we won't end up in a situation where only one utility receives optimizations that should be received by all interested parties.
2023-04-01	gunzip+LibCompress: Move utility to decompress files to GzipDecompressor	Timothy Flynn
	This is to allow re-using this method (and any optimization it receives) by other utilities, like gzip.
2023-03-31	LibCompress: Remove two needless heap allocations	Nico Weber

2023-03-31	AK+LibCompress: Remove the Deflate back-reference intermediate buffer	Timothy Flynn
	Instead of reading bytes from the output stream into a buffer, just to immediately write them back out, we can skip the middle-man and copy the bytes directly into the output buffer.
2023-03-31	gunzip+LibCompress: Increase buffer sizes used by Deflate and gunzip	Timothy Flynn
	Co-authored-by: Andreas Kling <kling@serenityos.org>
2023-03-30	LibCompress: Use LZMA context from preexisting dictionary	Tim Schumacher

2023-03-30	LibCompress: Avoid overflowing the size of uncompressed LZMA2 chunks	Tim Schumacher

2023-03-30	LibCompress: Use the correct LZMA repetition offset in all cases	Tim Schumacher

2023-03-30	LibCompress: Only require new LZMA2 properties after dictionary reset	Tim Schumacher

2023-03-30	LibCompress: Reduce repeated code in the LZMA decompressor	Tim Schumacher

2023-03-30	LibCompress: Implement support for multiple concatenated XZ streams	Tim Schumacher

2023-03-30	LibCompress: Move XZ header validation into the read function	Tim Schumacher
	The constructor is now only concerned with creating the required streams, which means that it no longer fails for XZ streams with invalid headers. Instead, everything is parsed and validated during the first read, preparing us for files with multiple streams.
2023-03-30	LibCompress: Implement proper handling of LZMA end-of-stream markers	Tim Schumacher

2023-03-30	LibCompress: Move common LZMA end-of-file checks into helper functions	Tim Schumacher

2023-03-29	LibCompress: Decode non-self-referencing back-references in one shot	Timothy Flynn
	We currently decode back-references one byte at a time, while writing that byte back out to the output buffer. This is only necessary when the back-reference refers to itself, i.e. when the back-reference distance is less than its length. In other cases, we can read the entire back-reference block in one shot.
	Using the "enwik8" file as a test (100MB uncompressed, commonly used in benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), decompression time decreases from:
	* 5.8s to 4.89s on Serenity (cold)
	* 2.3s to 1.72s on Serenity (warm)
	* 1.6s to 1.06s on Linux
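The distinction can be sketched as follows. This version uses a plain `std::vector` as the output window instead of LibCompress's `CircularBuffer`, and `copy_back_reference` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Appends `length` bytes starting `distance` bytes back from the end of
// `output`, as an LZ77/DEFLATE back-reference does. Hypothetical helper.
void copy_back_reference(std::vector<uint8_t>& output, size_t distance, size_t length)
{
    assert(distance >= 1 && distance <= output.size());
    size_t start = output.size() - distance;

    if (distance >= length) {
        // Non-self-referencing: the whole source range already exists and
        // does not overlap the destination, so copy it in one shot.
        output.resize(output.size() + length);
        std::memcpy(output.data() + output.size() - length, output.data() + start, length);
        return;
    }

    // Self-referencing (distance < length): later bytes are read from bytes
    // this same copy has just written, so go one byte at a time.
    for (size_t i = 0; i < length; ++i) {
        uint8_t byte = output[start + i];
        output.push_back(byte);
    }
}
```

For example, with output "abc", distance 3 and length 2 append "ab" in a single copy. A following reference with distance 2 and length 5 then repeats the trailing "ab" to append "ababa".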
2023-03-29	LibCompress: Use prefix tables to decode Huffman codes up to 8 bits long	Timothy Flynn
	Huffman codes have a useful property in that they are prefix codes. That is, the set of bits representing a Huffman-coded symbol is never a prefix of another symbol's code. This allows us to create a table where each index is an integer whose leading bits are the entry's corresponding Huffman code. With Deflate, we can have codes up to 16 bits in length, which would require a prefix table with 2^16 entries. So instead of creating a table to fit all possible codes, we use a cutoff of 8-bit codes. Codes longer than 8 bits fall back to the binary search method. Using the "enwik8" file as a test (100MB uncompressed, commonly used in benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), decompression time decreases from 3.527s to 2.585s on Linux.
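Below is a simplified sketch of the prefix-table idea with an 8-bit cutoff. It assumes bits are handed over one per element in code order (most significant code bit first); a real DEFLATE decoder also has to deal with the stream's LSB-first packing. `PrefixCode` and `PrefixTableDecoder` are hypothetical, and the fallback is a linear scan rather than the binary search the commit mentions:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// One entry of a prefix code: `code` holds `length` bits, most significant first.
struct PrefixCode {
    uint16_t symbol;
    uint32_t code;
    uint8_t length;
};

// Hypothetical decoder: codes up to 8 bits are resolved by a single lookup in
// a 256-entry table; longer codes fall back to a slow linear scan.
class PrefixTableDecoder {
public:
    static constexpr unsigned table_bits = 8;

    explicit PrefixTableDecoder(std::vector<PrefixCode> codes)
        : m_codes(std::move(codes))
        , m_table(size_t(1) << table_bits)
    {
        for (PrefixCode const& entry : m_codes) {
            assert(entry.length >= 1 && entry.length <= 15);
            if (entry.length > table_bits)
                continue;
            // Every 8-bit index whose leading bits equal this code maps to it.
            unsigned free_bits = table_bits - entry.length;
            size_t first = size_t(entry.code) << free_bits;
            for (size_t i = 0; i < (size_t(1) << free_bits); ++i)
                m_table[first + i] = entry;
        }
    }

    // Decodes one symbol from `bits` (one 0/1 value per element) at `pos`
    // and advances `pos`. Returns nullopt on invalid or truncated input.
    std::optional<uint16_t> decode(std::vector<uint8_t> const& bits, size_t& pos) const
    {
        // Peek the next 8 bits, padding with zeros past the end of the input.
        size_t index = 0;
        for (unsigned i = 0; i < table_bits; ++i)
            index = (index << 1) | (pos + i < bits.size() ? bits[pos + i] : 0u);

        if (auto const& hit = m_table[index]) {
            if (pos + hit->length > bits.size())
                return std::nullopt;
            pos += hit->length;
            return hit->symbol;
        }

        // Slow path for codes longer than 8 bits.
        for (PrefixCode const& entry : m_codes) {
            if (entry.length <= table_bits || pos + entry.length > bits.size())
                continue;
            uint32_t value = 0;
            for (unsigned i = 0; i < entry.length; ++i)
                value = (value << 1) | bits[pos + i];
            if (value == entry.code) {
                pos += entry.length;
                return entry.symbol;
            }
        }
        return std::nullopt;
    }

private:
    std::vector<PrefixCode> m_codes;
    std::vector<std::optional<PrefixCode>> m_table;
};
```

Each short code fills 2^(8 - length) consecutive table slots, so the table costs a fixed 256 entries regardless of how many long codes exist.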
2023-03-29	LibCompress: Use a bit stream for the entire GZIP decompression process	Timothy Flynn
	We currently mix normal and bit streams during GZIP decompression, where the latter is a wrapper around the former. This isn't causing issues now as the underlying bit stream buffer is a byte, so the normal stream can pick up where the bit stream left off. In order to increase the size of that buffer though, the normal stream will not be able to assume it can resume reading after the bit stream. The buffer can easily contain more bits than it was meant to read, so when the normal stream resumes, there may be N bits leftover in the bit stream that the normal stream was meant to read. To avoid weird behavior when mixing streams, this changes the GZIP decompressor to always read from a bit stream.
2023-03-24	LibCompress: Speed up deflate decompression by ~11%	Andreas Kling
	...simply by using LittleEndianInputBitStream::read_bit() instead of read_bits(1). This puts us on the fast path for single-bit reads. There's still lots of money on the table for bigger optimizations to claim here, just picking an embarrassingly low-hanging fruit. :^)
2023-03-21	LibCompress: Add support for XZ	Tim Schumacher

2023-03-21	LibCompress: Add support for LZMA2	Tim Schumacher

2023-03-21	LibCompress: Allow providing an external dictionary for LZMA	Tim Schumacher
	While at it, rename the former "output buffer" to "dictionary", since that's its primary function.