Age | Commit message | Author |
|
Rather than the very C-like API we currently have, accepting a void* and
a length, let's take a Bytes object instead. In almost all existing
cases, the compiler figures out the length.
|
|
This helper constructor exists on the unspecialized Span<T> class also,
and is convenient for e.g. creating Bytes from:
u8 buffer[64];
Bytes bytes { buffer };
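A minimal sketch of how such an array constructor can deduce the length at compile time. `SimpleSpan` and `SimpleBytes` are hypothetical stand-ins for illustration, not the real Span<T> or Bytes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified span type used only for illustration.
template<typename T>
class SimpleSpan {
public:
    SimpleSpan(T* data, std::size_t size)
        : m_data(data)
        , m_size(size)
    {
    }

    // Array constructor: N is deduced from the argument, so the caller
    // never has to spell out the length.
    template<std::size_t N>
    SimpleSpan(T (&array)[N])
        : m_data(array)
        , m_size(N)
    {
    }

    T* data() const { return m_data; }
    std::size_t size() const { return m_size; }

    T& operator[](std::size_t index) const
    {
        assert(index < m_size);
        return m_data[index];
    }

private:
    T* m_data;
    std::size_t m_size;
};

using SimpleBytes = SimpleSpan<std::uint8_t>;
```

A function taking `SimpleBytes` instead of a void* and a length can then be called with a plain array, and the size travels with the pointer.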
|
|
From the getentropy() man page, "The maximum permitted value for the
length argument is 256". Several of our tests were passing lengths of
several thousand bytes, causing getentropy() to fail with EIO, which we
were completely ignoring. This caused these tests to only test long
sequences of 0x00.
We now loop over the provided buffer to fill it 256 bytes at a time. If
getentropy() fails for any reason, we fall back to the default method of
filling it with one random byte at a time.
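A rough sketch of the chunked loop described above. The helper name `fill_with_entropy` is hypothetical, and the std::random_device fallback is an assumption standing in for the "default method" (which is not shown here). It also assumes getentropy() is declared in <unistd.h>, as on glibc; other systems may declare it in <sys/random.h>.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <unistd.h>

// Hypothetical helper: fills `size` bytes at `data` with random data.
inline void fill_with_entropy(std::uint8_t* data, std::size_t size)
{
    assert(data != nullptr || size == 0);

    // getentropy() rejects requests larger than 256 bytes.
    constexpr std::size_t max_chunk_size = 256;

    std::size_t offset = 0;
    while (offset < size) {
        std::size_t chunk_size = std::min(max_chunk_size, size - offset);
        if (getentropy(data + offset, chunk_size) != 0)
            break; // Do not ignore the failure; fall back below.
        offset += chunk_size;
    }

    // Assumed fallback: fill whatever is left one byte at a time.
    if (offset < size) {
        std::random_device device;
        for (; offset < size; ++offset)
            data[offset] = static_cast<std::uint8_t>(device());
    }
}
```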
|
|
This is very similar to the LittleEndianInputBitStream bit buffer change
from 8e834d4bb2f8a217013142658fe7203c5a5c3170.
We currently buffer one byte of data for the underlying stream. And when
we put bits onto that buffer, we do so 1 bit at a time.
This replaces the u8 buffer with a u64. And instead of looping at all,
we perform bitwise operations to write the desired number of bits.
Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), compression time
decreases from:
13.62s to 10.9s on Serenity (cold)
13.62s to 9.22s on Serenity (warm)
2.93s to 2.32s on Linux
One caveat is that this requires explicitly flushing any leftover bits
when the caller is done with the stream. The byte buffer implementation
implicitly flushed its data every time the buffer was byte-aligned, as
doing so would always fill the byte. This is no longer the case. But for
now, this should be fine as the one user of this class, DEFLATE, already
has a "flush everything now that we're done" finalizer.
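To illustrate the idea, here is a minimal sketch of a little-endian bit writer with a 64-bit buffer. `SketchBitWriter` is a hypothetical class writing into a std::vector, not the real stream class, and unlike the change described above it drains whole bytes eagerly after each write. It does show the masking and shifting, and why leftover bits need an explicit flush.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical little-endian bit writer, for illustration only.
class SketchBitWriter {
public:
    explicit SketchBitWriter(std::vector<std::uint8_t>& output)
        : m_output(output)
    {
    }

    // Appends the low `count` bits of `value` in a single bitwise step.
    void write_bits(std::uint32_t value, std::size_t count)
    {
        assert(count <= 32);
        if (count == 0)
            return;

        std::uint64_t mask = (std::uint64_t { 1 } << count) - 1;
        m_buffer |= (static_cast<std::uint64_t>(value) & mask) << m_bit_count;
        m_bit_count += count;

        // Emit complete bytes; fewer than 8 bits may remain buffered.
        while (m_bit_count >= 8) {
            m_output.push_back(static_cast<std::uint8_t>(m_buffer & 0xff));
            m_buffer >>= 8;
            m_bit_count -= 8;
        }
    }

    // Writes any leftover bits, padded with zeros to a full byte.
    void flush()
    {
        if (m_bit_count == 0)
            return;
        m_output.push_back(static_cast<std::uint8_t>(m_buffer & 0xff));
        m_buffer = 0;
        m_bit_count = 0;
    }

    std::size_t pending_bits() const { return m_bit_count; }

private:
    std::vector<std::uint8_t>& m_output;
    std::uint64_t m_buffer { 0 };
    std::size_t m_bit_count { 0 };
};
```

If the caller forgets flush(), up to 7 trailing bits never reach the output, which is the caveat noted above.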
|
|
Instead of reading bytes from the output stream into a buffer, just to
immediately write them back out, we can skip the middle-man and copy the
bytes directly into the output buffer.
|
|
The issue was that the buffer would only be filled if it was empty.
|
|
We currently only fill a buffer when it is empty. So if it has 1 byte
and 16 KB was requested, only that 1 byte would be returned. Instead,
attempt to refill the buffer when its size is less than the requested
size.
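The refill condition can be sketched as follows. `SketchBufferedReader` is a hypothetical stand-in that reads from an in-memory vector and simulates short reads of at most `chunk` bytes; it is not the real buffered stream.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical buffered reader, for illustration only.
class SketchBufferedReader {
public:
    SketchBufferedReader(std::vector<std::uint8_t> source, std::size_t capacity, std::size_t chunk)
        : m_source(std::move(source))
        , m_capacity(capacity)
        , m_chunk(chunk)
    {
        assert(capacity > 0 && chunk > 0);
    }

    std::size_t read(std::uint8_t* out, std::size_t requested)
    {
        // Refill whenever fewer bytes are buffered than were requested,
        // not only when the buffer is completely empty.
        while (m_buffer.size() < requested && refill_once()) {
        }

        std::size_t count = std::min(requested, m_buffer.size());
        auto end = m_buffer.begin() + static_cast<std::ptrdiff_t>(count);
        std::copy(m_buffer.begin(), end, out);
        m_buffer.erase(m_buffer.begin(), end);
        return count;
    }

private:
    // Simulates one (possibly short) read from the underlying stream.
    bool refill_once()
    {
        std::size_t space = m_capacity - m_buffer.size();
        std::size_t available = m_source.size() - m_position;
        std::size_t count = std::min({ m_chunk, space, available });
        if (count == 0)
            return false;

        auto begin = m_source.begin() + static_cast<std::ptrdiff_t>(m_position);
        m_buffer.insert(m_buffer.end(), begin, begin + static_cast<std::ptrdiff_t>(count));
        m_position += count;
        return true;
    }

    std::vector<std::uint8_t> m_source;
    std::size_t m_capacity;
    std::size_t m_chunk;
    std::size_t m_position { 0 };
    std::vector<std::uint8_t> m_buffer;
};
```

With the old "only when empty" condition, a read following a 1-byte read would return just the bytes left over from the previous refill.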
|
|
When reading, we currently only fill a BufferedStream's buffer when it
is empty, and only with 1 KB of data. This means that while the buffer
defaults to a size of 16 KB, at least 15 KB is always unused.
|
|
The current implementation of `Array<T, 0>` has a zero-length C array as
its storage type. While this is accepted as a GNU extension, when
compiling with Clang 16, a UBSan error is raised every time an object
is accessed whose only field is a zero-length array.
This is likely a bug in Clang 16's implementation of UBSan, which has
been reported here: https://github.com/llvm/llvm-project/issues/61775
|
|
Else:
AK/BitStream.h:218:24:
error: inline function '...::lsb_mask<unsigned char>' is not
defined [-Werror,-Wundefined-inline]
static constexpr T lsb_mask(T bits)
^
|
|
We currently buffer one byte of data from the underlying stream. And when
we pull bits off that buffer, we do so 1 or 8 bits at a time (depending
on whether the buffer is byte aligned). The 1-bit-at-a-time loop is by
far the most common during e.g. GZIP decompression.
This replaces the u8 buffer with a u64. And instead of looping at all,
we perform bitwise operations to extract the desired number of bits.
Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), decompression
time decreases from:
242s to 35s on Serenity
11.125s to 3.527s on Linux
Note that BigEndianInputBitStream can also use the same techniques,
and some of the methods here may make sense to live in an endianness-
agnostic base class. The focus is GZIP right now though, which only
uses the little endian stream.
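A minimal sketch of the extraction technique, assuming an in-memory source. `SketchBitReader` is a hypothetical class, not the real LittleEndianInputBitStream; the refill step still walks bytes, but pulling bits out is a single mask and shift.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical little-endian bit reader, for illustration only.
class SketchBitReader {
public:
    SketchBitReader(std::uint8_t const* data, std::size_t size)
        : m_data(data)
        , m_size(size)
    {
    }

    // Reads `count` bits into `result`; returns false if too few remain.
    bool read_bits(std::size_t count, std::uint32_t& result)
    {
        assert(count <= 32);
        refill();
        if (m_bit_count < count)
            return false;

        // No per-bit loop: mask off the low bits, then shift them out.
        std::uint64_t mask = (std::uint64_t { 1 } << count) - 1;
        result = static_cast<std::uint32_t>(m_buffer & mask);
        m_buffer >>= count;
        m_bit_count -= count;
        return true;
    }

private:
    // Tops up the 64-bit buffer with as many whole bytes as fit.
    void refill()
    {
        while (m_bit_count <= 56 && m_offset < m_size) {
            m_buffer |= static_cast<std::uint64_t>(m_data[m_offset++]) << m_bit_count;
            m_bit_count += 8;
        }
    }

    std::uint8_t const* m_data;
    std::size_t m_size;
    std::size_t m_offset { 0 };
    std::uint64_t m_buffer { 0 };
    std::size_t m_bit_count { 0 };
};
```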
|
|
Analogous to std::numeric_limits<T>::digits.
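For reference, a short sketch of what the standard member reports. `value_bits` is a hypothetical wrapper added only for illustration:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical wrapper around the standard member, for illustration.
template<typename T>
constexpr int value_bits()
{
    // For integer types: the number of value bits, excluding any sign bit.
    return std::numeric_limits<T>::digits;
}

static_assert(value_bits<std::uint8_t>() == 8, "unsigned types use all bits");
static_assert(value_bits<std::int8_t>() == 7, "signed types exclude the sign bit");
static_assert(value_bits<std::uint64_t>() == 64, "64 value bits");
```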
|
|
This mirrors String::from_utf8(StringView).
Jakt will use this to construct strings instead of just assuming the
allocation will succeed, lowering the API difference between
Jakt::String and AK::String by one API :^)
|
|
|
|
|
|
|
|
|
|
|
|
This patch parses enough of GPOS tables to be able to support the
kerning information embedded in Inter.
Since that specific font only applies positioning offsets to the first
glyph in each pair, I was able to get away with not changing our API.
Once we start adding support for more sophisticated positioning, we'll
need to be able to communicate more than a simple "kerning offset" to
the clients of this code.
|
|
With Clang, the previous/next pointers in buckets of an
`OrderedHashTable` are not cleared when a bucket is being shifted up as
a result of a removed bucket. As a result, an unfortunate pointer mixup
could lead to an infinite loop in the `HashTable` iterator, which was
exposed in `HashMap::keys()`.
Co-authored-by: Luke Wilde <lukew@serenityos.org>
|
|
Now it is called `CaseInsensitiveASCIIStringViewTraits`, so we can be
more specific about which data structure it operates on. ;)
|
|
We already had head(), so let's also have tail().
|
|
|
|
|
|
No functional changes.
|
|
No functional changes.
|
|
Similar to POSIX read, the basic read and write functions of AK::Stream
do not have a lower limit of how much data they read or write (apart
from "none at all").
Rename the functions to "read some [data]" and "write some [data]" (with
"data" being omitted, since everything here is reading and writing data)
to make them sufficiently distinct from the functions that ensure the
entire buffer is used (which should be the go-to functions for most
usages).
No functional changes, just a lot of new FIXMEs.
|
|
We don't need to decode the entire code point to know its length. This
reduces the runtime of decoding a string containing 5 million instances
of U+10FFFF from over 4 seconds to 0.9 seconds.
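As an illustration of why full decoding is unnecessary: in UTF-8 the leading byte alone determines the sequence length. The function names below are hypothetical, and this sketch does not validate continuation bytes; it counts an invalid leading byte as one unit, which is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: length of a UTF-8 sequence from its leading byte.
// Returns 0 for continuation bytes and other invalid leading bytes.
inline std::size_t utf8_sequence_length(std::uint8_t lead)
{
    if ((lead & 0x80) == 0x00)
        return 1; // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0)
        return 2; // 110xxxxx
    if ((lead & 0xF0) == 0xE0)
        return 3; // 1110xxxx
    if ((lead & 0xF8) == 0xF0)
        return 4; // 11110xxx
    return 0;
}

// Counts code points by skipping over sequences without decoding them.
inline std::size_t count_code_points(std::uint8_t const* data, std::size_t size)
{
    assert(data != nullptr || size == 0);

    std::size_t count = 0;
    std::size_t offset = 0;
    while (offset < size) {
        std::size_t length = utf8_sequence_length(data[offset]);
        if (length == 0)
            length = 1; // Assumption: treat an invalid byte as one unit.
        offset += length;
        ++count;
    }
    return count;
}
```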
|
|
Let's add FlyString::from_deprecated_fly_string() so we can use it
instead of FlyString::from_utf8(). This will make it easier to detect
potential unnecessary allocations as we transfer to FlyString.
|
|
And alphabetically sort the list while I'm at it.
|
|
Let's make it clear that these functions deal with ASCII case only.
|
|
|
|
This avoids rehashing the string every time.
|
|
|
|
Also drop the try_ prefix from the fallible function, as it is no longer
needed to distinguish the two.
|
|
Name it StringBuilder::try_to_byte_buffer accordingly :^)
|
|
We currently fully casefold the left- and right-hand sides to compare
two strings with case-insensitivity. Now, we casefold one code point at
a time, storing the result in a view for comparison, until we exhaust
both strings.
|
|
This is for convenience, and matches our other UTF-N views.
|
|
We currently only accept a char, instead of a full code point.
|
|
We currently pass the code point to StringView::{starts,ends}_with,
which actually accepts a single char, thus cannot handle non-ASCII
code points.
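One way to handle non-ASCII code points, sketched under assumptions: encode the code point as UTF-8 first, then compare bytes. The function names are hypothetical, std::string_view stands in for StringView, and surrogate code points are not rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper: encodes `code_point` as UTF-8, returns its length.
inline std::size_t encode_utf8(std::uint32_t code_point, char out[4])
{
    assert(code_point <= 0x10FFFF);
    if (code_point < 0x80) {
        out[0] = static_cast<char>(code_point);
        return 1;
    }
    if (code_point < 0x800) {
        out[0] = static_cast<char>(0xC0 | (code_point >> 6));
        out[1] = static_cast<char>(0x80 | (code_point & 0x3F));
        return 2;
    }
    if (code_point < 0x10000) {
        out[0] = static_cast<char>(0xE0 | (code_point >> 12));
        out[1] = static_cast<char>(0x80 | ((code_point >> 6) & 0x3F));
        out[2] = static_cast<char>(0x80 | (code_point & 0x3F));
        return 3;
    }
    out[0] = static_cast<char>(0xF0 | (code_point >> 18));
    out[1] = static_cast<char>(0x80 | ((code_point >> 12) & 0x3F));
    out[2] = static_cast<char>(0x80 | ((code_point >> 6) & 0x3F));
    out[3] = static_cast<char>(0x80 | (code_point & 0x3F));
    return 4;
}

inline bool starts_with_code_point(std::string_view haystack, std::uint32_t code_point)
{
    char encoded[4];
    std::size_t length = encode_utf8(code_point, encoded);
    return haystack.substr(0, length) == std::string_view(encoded, length);
}

inline bool ends_with_code_point(std::string_view haystack, std::uint32_t code_point)
{
    char encoded[4];
    std::size_t length = encode_utf8(code_point, encoded);
    if (haystack.size() < length)
        return false;
    return haystack.substr(haystack.size() - length) == std::string_view(encoded, length);
}
```

Truncating the code point to a single char would compare only one byte, so U+00E9 could never match its two-byte encoding.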
|
|
This is similar to equals_ignoring_case() but only cares about ASCII
case insensitivity.
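A minimal sketch of an ASCII-only comparison, using std::string_view as a stand-in for the real string types. The function names are hypothetical; only 'A' to 'Z' are folded, so non-ASCII bytes must match exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Folds only the ASCII letters 'A'..'Z'; every other byte is unchanged.
constexpr char to_ascii_lowercase_sketch(char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c + ('a' - 'A')) : c;
}

// Hypothetical helper: byte-wise comparison ignoring ASCII case only.
inline bool equals_ignoring_ascii_case_sketch(std::string_view a, std::string_view b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (to_ascii_lowercase_sketch(a[i]) != to_ascii_lowercase_sketch(b[i]))
            return false;
    }
    return true;
}
```

Because no Unicode casefolding is involved, the lengths must match up front and no allocation is needed.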
|
|
|
|
|
|
This adds the conversion function to_deprecated_fly_string() to enable
conversion from new FlyString to DeprecatedFlyString.
|
|
|
|
|
|
|
|
Now UFixedBigInt exposes API to do wide multiplications of this kind
efficiently.
|
|
|
|
|