Age | Commit message (Collapse) | Author |
|
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
|
|
|
|
Previously we were passing raw UTF-8 bytes as code points, which caused
CSS content properties to display incorrect characters.
This makes bullet separators in Wikipedia templates display correctly.
|
|
This ripples down to LibWeb's HTML and XHR decoders, which therefore
become less allocation heavy.
|
|
|
|
It's a pretty simple charset: the bottom 128 bytes (0x00-0x7F) are
standard ASCII, while the top 128 bytes (0x80-0xFF) are mapped to a
portion of the Unicode Private Use Area, specifically 0xF780-0xF7FF.
This is used by Google Maps for certain blobs.
|
|
This functions takes a user-provided decoder and will only use it if no
BOM is in the input.
If there is a BOM, it will ignore the given decoder and instead decode
the input with the appropriate Unicode decoder for the detected BOM.
This is only to be used where it's specifically needed, for example XHR
uses this for compatibility with deployed content. As such, it has an
obnoxious name to discourage usage.
|
|
This takes the input and sniffs it for a BOM. If it has the UTF-8 or
UTF-16BE BOM, it will return their respective decoder. Currently we
don't have a UTF-16LE decoder, so it will assert TODO if it detects
a UTF-16LE BOM. If there is no recognisable BOM, it will return no
decoder.
|
|
These objects contain no data members, so there is no point in creating
1-byte heap allocations for them. We don't need to have them as static
local variables, as they are trivially constructible, so they can simply
be global variables.
|
|
Fixes #6840.
|
|
|
|
Before, this was getting included as part of the output text, which was
confusing the HTML parser. Nobody needs the BOM after we have identified
the codec, so now we remove it when converting to UTF-8.
|
|
This commit adds a new process method to all Decoder subclasses which
do what to_utf8 used to do, and allows callers to customize the handling
of individiual UTF-8 code points through a callback. Decoder::to_utf8
now uses this API to generate a string via StringBuilder, preserving the
original behavior.
|
|
|
|
|
|
|
|
This patch changes get_standardized_encoding to use an Optional<String>
return type instead of just returning the null string when unable to
match the provided encoding to one of the canonical encoding names.
This is part of an effort to move away from using null strings towards
explicitly using Optional<String> to indicate that the String may not
have a value.
|
|
This encoding (a superset of ascii that adds in the cyrillic alphabet)
is currently the third most used encoding on the web, and because
cyrillic glyphs were added by Dmitrii Trifonov recently, we can now
support it as well :^)
|
|
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
|
|
This is a superset of ascii that adds in the hebrew alphabet.
(Google currently assumes we are running windows due to not
recognizing Serenity as the OS in the user agent, resulting
in this encoding instead of UTF8 in google search results)
|
|
This flag warns on classes which have `virtual` functions but do not
have a `virtual` destructor.
This patch adds both the flag and missing destructors. The access level
of the destructors was determined by a two rules of thumb:
1. A destructor should have a similar or lower access level to that of a
constructor.
2. Having a `private` destructor implicitly deletes the default
constructor, which is probably undesirable for "interface" types
(classes with only virtual functions and no data).
In short, most of the added destructors are `protected`, unless the
compiler complained about access.
|
|
Reading up to the end of the input string of odd length results in
an out-of-bounds read
|
|
|
|
|
|
|
|
These changes are arbitrarily divided into multiple commits to make it
easier to find potentially introduced bugs with git bisect.
|
|
|