Age | Commit message (Collapse) | Author |
|
Also drop the try_ prefix from the fallible function, as it is no longer
needed to distinguish the two.
|
|
Name it StringBuilder::try_to_byte_buffer accordingly :^)
|
|
|
|
Having an alias function that only wraps another one is silly, and
keeping the more obvious name should flush out more uses of deprecated
strings.
No behavior change.
|
|
It currently uses the non-fallible `append` method to append each UTF-8
encoded byte of the code point.
|
|
These instances were detected by searching for files that include
AK/StdLibExtras.h, but don't match the regex:
\\b(abs|AK_REPLACED_STD_NAMESPACE|array_size|ceil_div|clamp|exchange|for
ward|is_constant_evaluated|is_power_of_two|max|min|mix|move|_RawPtr|RawP
tr|round_up_to_power_of_two|swap|to_underlying)\\b
(Without the linebreaks.)
This regex is pessimistic, so there might be more files that don't
actually use any "extra stdlib" functions.
In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
|
|
This is nice, and is also used by the Jakt runtime.
|
|
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.
Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.
Problems in DeprecatedString:
- It assumes string allocation never fails. This makes it impossible
to use in allocation-sensitive contexts, and is the reason we had to
ban DeprecatedString from the kernel entirely.
- The awkward null state. DeprecatedString can be null. It's different
from the empty state, although null strings are considered empty.
All code is immediately nicer when using Optional<DeprecatedString>
but DeprecatedString came before Optional, which is how we ended up
like this.
- The encoding of the underlying data is ambiguous. For the most part,
we use it as if it's always UTF-8, but there have been cases where
we pass around strings in other encodings (e.g ISO8859-1)
- operator[] and length() are used to iterate over DeprecatedString one
byte at a time. This is done all over the codebase, and will *not*
give the right results unless the string is all ASCII.
How we solve these issues in the new String:
- Functions that may allocate now return ErrorOr<String> so that ENOMEM
errors can be passed to the caller.
- String has no null state. Use Optional<String> when needed.
- String is always UTF-8. This is validated when constructing a String.
We may need to add a bypass for this in the future, for cases where
you have a known-good string, but for now: validate all the things!
- There is no operator[] or length(). You can get the underlying data
with bytes(), but for iterating over code points, you should be using
an UTF-8 iterator.
Furthermore, it has two nifty new features:
- String implements a small string optimization (SSO) for strings that
can fit entirely within a pointer. This means up to 3 bytes on 32-bit
platforms, and 7 bytes on 64-bit platforms. Such small strings will
not be heap-allocated.
- String can create substrings without making a deep copy of the
substring. Instead, the superstring gets +1 refcount from the
substring, and it acts like a view into the superstring. To make
substrings like this, use the substring_with_shared_superstring() API.
One caveat:
- String does not guarantee that the underlying data is null-terminated
like DeprecatedString does today. While this was nifty in a handful of
places where we were calling C functions, it did stand in the way of
shared-superstring substrings.
|
|
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
|
|
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
|
|
This two methods add the character as many times as specified by the
second parameter.
|
|
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
|
|
This will allow us to make a fallible version of the JSON serializers.
|
|
These APIs are only used by userland, and String is OOM-infallible,
so let's just ifdef it out of the Kernel.
|
|
|
|
Apologies for the enormous commit, but I don't see a way to split this
up nicely. In the vast majority of cases it's a simple change. A few
extra places can use TRY instead of manual error checking though. :^)
|
|
|
|
These will allow us to start using TRY() with StringBuilder operations.
|
|
|
|
Same as Vector, ByteBuffer now also signals allocation failure by
returning an ENOMEM Error instead of a bool, allowing us to use the
TRY() and MUST() patterns.
|
|
|
|
Otherwise it could produce invalid JSON.
|
|
|
|
If we can easily communicate failure, let's avoid asserting and report
failure instead.
|
|
Currently, to append a UTF-16 view to a StringBuilder, callers must
first convert the view to UTF-8 and then append the copy. Add a UTF-16
overload so callers do not need to hold an entire copy in memory.
|
|
|
|
Instead we can just use ByteBuffer::size() which already keeps track
of the buffer's size.
|
|
Found by OSS Fuzz:
Related commit: 3908a49661a00e15621748dcb2b0424f29433571
Co-authored-by: Ben Wiederhake <BenWiederhake.GitHub@gmx.de>
|
|
Previously the StringBuilder class would use memcpy() to write
directly into the ByteBuffer's buffer. Instead we should use the
append() method which ensures we don't overrun the buffer.
|
|
Previously ByteBuffer::grow() behaved like Vector<T>::resize().
However the function name was somewhat ambiguous - and so this patch
updates ByteBuffer to behave more like Vector<T> by replacing grow()
with resize() and adding an ensure_capacity() method.
This also lets the user change the buffer's capacity without affecting
the size which was not previously possible.
Additionally this patch makes the capacity() method public (again).
|
|
This reverts commit 2d011961c94ac81700c366537f52208a4c55db92.
|
|
Found by OSS Fuzz:
#34451 (old bug)
Related commit: 3908a49661a00e15621748dcb2b0424f29433571
|
|
Previously StringBuilder would start allocating an external buffer
once the caller has used up more than half of the inline buffer's
capacity. Instead we should prefer to use the inline buffer until
it is full and only then start to allocate an external buffer.
|
|
This patch adds a convenience method to AK::StringBuilder which converts
ASCII uppercase characters to lowercase before appending them.
|
|
This was removed as part of the ByteBuffer changes but the allocation
optimization is still necessary at least for non-SerenityOS targets
where malloc_good_size() isn't supported or returns a small value and
causes a whole bunch of unnecessary reallocations.
|
|
Previously ByteBuffer would internally hold a RefPtr to the byte
buffer and would behave like a reference type, i.e. copying a
ByteBuffer would not create a duplicate byte buffer, but rather
two objects which refer to the same internal buffer.
This also changes ByteBuffer so that it has some internal capacity
much like the Vector<T> type. Unlike Vector<T> however a byte
buffer's data may be uninitialized.
With this commit ByteBuffer makes use of the kmalloc_good_size()
API to pick an optimal allocation size for its internal buffer.
|
|
All users have been converted to using AK::Format via appendff().
|
|
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
|
|
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)
Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.
We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
|
|
This patch adds a 128-byte inline buffer that we use before switching
to using a dynamically growing ByteBuffer.
This allows us to avoid heap allocations in many cases, and totally
incidentally also speeds up @nico's favorite test, "disasm /bin/id"
more than 2x. :^)
|
|
All of these files were getting ByteBuffer.h from someone else and then
using it. Let's include it explicitly.
|
|
Grab the escaping logic from JSON string value serialization and use
it for serializing all keys and values.
Fixes #3917.
|
|
This time, without trailing 's'. Ran:
git grep -l 'codepoint' | xargs sed -ie 's/codepoint/code_point/g
|
|
This reverts commit ea9ac3155d1774f13ac4e9a96605c0e85a8f299e.
It replaced "codepoint" with "code_points", not "code_point".
|
|
Unicode calls them "code points" so let's follow their style.
|
|
Also, implement append(Utf32View) using it.
|
|
This encodes the incoming UTF-32 sequence as UTF-8.
|
|
With 0 initial capacity, we don't allocate an underlying ByteBuffer
for the StringBuilder, which would then lead to a null String() being
returned from to_string().
This patch makes sure we always build a valid String.
|
|
String.h no longer pulls in StringView.h. We do this by moving a bunch
of String functions out-of-line.
|
|
|