Age | Commit message (Collapse) | Author |
|
|
|
This changes URL parser to use the 0xFFFFFFFF constant instead of 0 to
indicate end of file. This fixes a bug where inputs containing null
bytes would terminate the parser early, because they were interpreted
as end of file.
|
|
This patch adds a state_name method to URLParser to convert a state to a
string. With this, the debugging statements now display the state names.
Furthermore, this fixes a bug where non-ASCII code points were
formatted as characters, which fails an assertion in the formatting
system.
|
|
Because non-ASCII code points have negative byte values, trimming away
control characters requires checking for negative bytes values.
This also adds a test case with a URL containing non-ASCII code points.
|
|
|
|
A Function object should not be set to a different functor while the
original functor is currently executing.
|
|
Ideally a Function should not be clear()ed while it's running.
Unfortunately a number of callers do just that and identifying all of
them would require inspecting all call sites for operator() and clear().
Instead this adds support for deferring clear() until after the
function has finished executing.
|
|
Previously this was generating a crazy number of symbols, and it was
also pretty-damn-slow as it was defined recursively, which made the
compiler incapable of inlining it (due to the many many layers of
recursion before it terminated).
This commit replaces the recursion with a pack expansion and marks it
always-inline.
|
|
StringView::lines() supports line-separators β\nβ, β\rβ, and β\r\nβ.
The method will drop an entire line if it is surrounded by β\rβ
and β\nβ separators on the left and right sides respectively.
|
|
The previous behavior was to always VERIFY that the UTF-8 bytes were
valid when iterating over the code points of an UTF8View. This change
makes it so we instead output the 0xFFFD 'REPLACEMENT CHARACTER'
code point when encountering invalid bytes, and keep iterating the
view after skipping one byte.
Leaving the decision to the consumer would break symmetry with the
UTF32View API, which would in turn require heavy refactoring and/or
code duplication in generic code such as the one found in
Gfx::Painter and the Shell.
To make it easier for the consumers to detect the original bytes, we
provide a new method on the iterator that returns a Span over the
data that has been decoded. This method is immediately used in the
TextNode::compute_text_for_rendering method, which previously did
this in a ad-hoc waay.
This also add tests for the new behavior in TestUtf8.cpp, as well
as reinforcements to the existing tests to check if the underlying
bytes match up with their expected values.
|
|
This replaces ctype.h with CharacterType.h everywhere I could find
issues with narrowing conversions. While using it will probably make
sense almost everywhere in the future, the most critical places should
have been addressed.
|
|
This patch introduces CharacterTypes.h, which aims to be a replacement
for most usages of ctype.h. In contrast to that implementation, this
header makes use of exclusively constexpr functions and support the full
Unicode code point set without any narrowing-conversion issues.
|
|
In order for IntrusiveList to be capable of replacing InlineLinkedList,
it needs to support reverse iteration. InlineLinkedList currently
supports manual reverse iteration by calling list->last() followed by
node->prev().
|
|
This removes code that isn't used anywhere.
|
|
|
|
Previously we'd incur the costs for a function call via the PLT even
for the most trivial ref-count actions like increasing/decreasing the
reference count.
By moving the code to the header file we allow the compiler to inline
this code into the caller's function.
|
|
Checking for this (and get()'ing it) is always invalid, so let's just
disallow it.
This also finds two bugs where the code is checking for types that can
never actually be in the variant (which was actually a refactor
artifact).
|
|
|
|
|
|
We have to stop scanning once we hit a non-strippable character.
Add some tests to cover this.
|
|
|
|
This replaces URL::to_string_encoded() with to_string() and removes the
former, since they are now equivalent.
|
|
This changes the URL class to use the correct constness for getters,
setters and other methods. It also changes the entire class to use east
const style.
|
|
This patch introduces a new operator== to compare an Optional to its
contained type directly. If the Optional does not contain a value, the
comparison will always return false.
This also adds a test case for the new behavior as well as comparison
between Optional objects themselves.
|
|
|
|
|
|
|
|
This adds a hostname parameter as the third parameter to
URL::create_with_file_scheme(). If the hostname is "localhost", it will
be ignored (as per the URL specification).
This can for example be used by ls(1) to create more conforming file
URLs.
|
|
This rewrites the URL validation check to be more specific, so it can
more accurately detect if a user of URL class constructs invalid URLs by
hand.
|
|
The m_path member variable has been superseded by m_paths. Thus, it has
been removed. The path() getter will continue to exist as a convenience
method for getting the path joined together as a string.
|
|
|
|
This replaces the old URL::parse() and URL::complete_url() parsing
mechanisms with the new spec-compliant URLParser::parse().
|
|
This adds URL serialization methods which are more in line with the
specification.
The serialize_for_display() method should be used e.g. in the browser
address bar, and as per the spec should not display username and
password. Furthermore, it could decode most percent-encoded code points,
although that is not implemented yet.
|
|
This adds a new URL parser, which aims to be compliant with the URL
specification (https://url.spec.whatwg.org/). It also contains a
rudimentary data URL parser.
|
|
This adds a few helper functions and a private constructor to
instantiate a data URL to the URL class. These will be needed by the
upcoming URL parser.
|
|
This adds the m_username, m_password, m_paths and m_cannot_be_a_base_url
member variables to the URL class. These are necessary for the upcoming
new URL parser.
The deprecated m_path variable shadows the m_paths variable if it is
non-null. This behavior will be removed once the old URL parser has been
removed.
|
|
This removes URLParser, because its two exposed functions, urlencode()
and urldecode(), have been superseded by URL::percent_encode() and
URL::percent_decode(). This is in preparation for the introduction of a
new URL parser.
|
|
This replaces all occurrences of those functions with the newly
implemented functions URL::percent_encode() and URL::percent_decode().
The old functions will be removed in a further commit.
|
|
This adds a few new functions to percent encode/decode strings according
to the URL specification. The functions allow specifying a
PercentEncodeSet, which is defined by the specification. It will be used
to replace the current urlencode() and urldecode() functions in a
further commit.
This commit adds a few duplicate helper functions in the URL class, such
as is_digit() and is_ascii_digit(). This will be cleaned up as soon as
the upcoming new URL parser will replace the current one.
|
|
The methods added make it possible to use the trim mechanism with
specified characters, unlike trim_whitespace(), which uses predefined
characters.
|
|
This adds a peek method for Utf8CodepointIterator, which enables it to
be used in some parsing cases where peeking is necessary.
peek(0) is equivalent to operator*, expect that peek() does not contain
any assertions and will just return an empty Optional<u32>.
This also implements a test case for iterating UTF-8.
|
|
This renames all references to protocol to scheme, which is the name
used by the URL standard (https://url.spec.whatwg.org/). Externally, all
methods referencing "protocol" were duplicated with "scheme". The old
methods still exist as compatibility.
|
|
This patch removes unnecessary function parameter names in declarations
of the URL class. It also changes parameter types from String to
StringView where applicable.
|
|
|
|
For non-x86 targets, it's not very nice to define inline functions in
AK/Memory.h with asm volatile implementations. Guard this inline
assembly with ARCH(I386) and provide portable alternatives. Since we
always compile with optimizations, the hand-vectorized memset and
memcpy seem to be of dubious value, but we'll keep them here until
proven one way or another.
This should fix the Lagom build on native M1 macOS that was reported
on Discord the other day.
|
|
This doesn't really _fix_ anything, it just gets rid of the API and
instead makes the users explicitly use `adopt_own_if_non_null()`.
|
|
Instead we can just use ByteBuffer::size() which already keeps track
of the buffer's size.
|
|
Found by OSS Fuzz:
Related commit: 3908a49661a00e15621748dcb2b0424f29433571
Co-authored-by: Ben Wiederhake <BenWiederhake.GitHub@gmx.de>
|
|
Previously the StringBuilder class would use memcpy() to write
directly into the ByteBuffer's buffer. Instead we should use the
append() method which ensures we don't overrun the buffer.
|
|
This allows us to mark the slow part (i.e. where we copy the buffer) as
NEVER_INLINE because this should almost never get called and therefore
should also not get inlined into callers.
|