serenity - The Serenity Operating System 🐞

Age	Commit message	Author
2023-05-12	LibTextCodec: Change UTF-8's decoder to replace invalid code points	Timothy Flynn
	The UTF-8 decoder will currently crash if it is provided invalid UTF-8 input. Instead, change its behavior to match that of all other decoders to replace invalid code points with U+FFFD. This is required by the web.
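The replacement behaviour described above could look roughly like the following. This is a minimal sketch, not LibTextCodec's actual code: `decode_utf8_lossy` is a hypothetical name, and the number of U+FFFD code points emitted for a given malformed run is a simplification that may differ from what the WHATWG Encoding Standard prescribes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical helper (not the LibTextCodec API): decodes UTF-8 into code
// points and emits U+FFFD for each malformed sequence instead of aborting.
std::vector<uint32_t> decode_utf8_lossy(std::string_view input)
{
    constexpr uint32_t replacement = 0xFFFD;
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static constexpr uint32_t minimum[] = { 0, 0, 0x80, 0x800, 0x10000 };

    std::vector<uint32_t> out;
    size_t i = 0;
    while (i < input.size()) {
        auto lead = static_cast<uint8_t>(input[i]);
        size_t length = 0;
        uint32_t code_point = 0;
        if (lead < 0x80) {
            length = 1;
            code_point = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            length = 2;
            code_point = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) {
            length = 3;
            code_point = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) {
            length = 4;
            code_point = lead & 0x07;
        } else {
            // Stray continuation byte or a byte that can never start a sequence.
            out.push_back(replacement);
            ++i;
            continue;
        }

        size_t consumed = 1;
        bool ok = true;
        while (consumed < length) {
            if (i + consumed >= input.size()) {
                ok = false; // truncated sequence
                break;
            }
            auto next = static_cast<uint8_t>(input[i + consumed]);
            if ((next & 0xC0) != 0x80) {
                ok = false; // not a continuation byte; resume decoding at it
                break;
            }
            code_point = (code_point << 6) | (next & 0x3F);
            ++consumed;
        }

        // Reject overlong forms, surrogates, and values beyond U+10FFFF.
        bool is_surrogate = code_point >= 0xD800 && code_point <= 0xDFFF;
        if (ok && (code_point < minimum[length] || code_point > 0x10FFFF || is_surrogate))
            ok = false;

        out.push_back(ok ? code_point : replacement);
        i += consumed;
    }
    return out;
}
```

With this sketch, the input `A`, `0xFF`, `z` decodes to U+0041, U+FFFD, U+007A rather than stopping at the invalid byte.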
2023-03-10	Everywhere: Rename equals_ignoring_case => equals_ignoring_ascii_case	Andreas Kling
	Let's make it clear that these functions deal with ASCII case only.
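To illustrate what "ASCII case only" means, here is a small sketch that uses `std::string_view` as a stand-in for AK's string types. The implementation is an assumption for illustration and is not taken from AK.

```cpp
#include <cstddef>
#include <string_view>

// Lowercases only 'A'..'Z'; every other byte, including non-ASCII, is untouched.
constexpr char to_ascii_lowercase(char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Hypothetical stand-in for a function named equals_ignoring_ascii_case().
constexpr bool equals_ignoring_ascii_case(std::string_view a, std::string_view b)
{
    if (a.size() != b.size())
        return false;
    for (size_t i = 0; i < a.size(); ++i) {
        if (to_ascii_lowercase(a[i]) != to_ascii_lowercase(b[i]))
            return false;
    }
    return true;
}
```

Under this definition `"UTF-8"` matches `"utf-8"`, while the UTF-8 encodings of `Ä` and `ä` do not match each other, because their bytes fall outside the ASCII range.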
2023-02-28	LibTextCodec/Latin1: Iterate over input string with u8 instead of char	Luke Wilde
	Using char causes bytes equal to or greater than 0x80 to be treated as negative values, which produces incorrect results when they are implicitly cast to u32. For example, `atob` in LibWeb uses this decoder to convert non-ASCII values to UTF-8, but non-ASCII values are >= 0x80 and thus produce incorrect results in such cases: `Uint8Array.from(atob("u660"), c => c.charCodeAt(0));` This used to produce [253, 253, 253] instead of [187, 174, 180]. Required by Cloudflare's IUAM challenges.
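The sign-extension problem can be reproduced in isolation. The sketch below uses `signed char` explicitly, because the signedness of plain `char` is implementation-defined; the function names are hypothetical.

```cpp
#include <cstdint>

// Widening through a signed char sign-extends bytes >= 0x80.
constexpr uint32_t widen_via_signed_char(uint8_t byte)
{
    return static_cast<uint32_t>(static_cast<signed char>(byte));
}

// Widening through an unsigned byte keeps the value in 0x00..0xFF.
constexpr uint32_t widen_via_u8(uint8_t byte)
{
    return static_cast<uint32_t>(byte);
}
```

For the byte 0xBB (187, the first expected value in the example above), the unsigned path yields 0xBB, while the signed path yields 0xFFFFFFBB, which is not a valid code point. ASCII bytes are unaffected either way.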
2023-02-19	LibTextCodec+Everywhere: Port Decoders to new Strings	Sam Atkins

2023-02-19	LibTextCodec: Return Optional<Decoder&> from `bom_sniff_to_decoder()`	Sam Atkins

2023-02-19	LibTextCodec+Everywhere: Return Optional<Decoder&> from `decoder_for()`	Sam Atkins

2023-02-15	LibTextCodec+Everywhere: Make TextCodec::decoder_for() take a StringView	Sam Atkins
	We don't need a full String/DeprecatedString inside this function, so we might as well not force users to create one.
2023-01-24	LibTextCodec: Add a MacRoman decoder	Nico Weber
	Allows displaying `<meta charset="x-mac-roman">` html files. (`:set fenc=macroman`, `:w` in vim to save in that encoding.)
2023-01-24	LibTextCodec: Simplify Latin1Decoder::process() a tiny bit	Nico Weber

2023-01-22	LibTextCodec: Make utf-16be and utf-16le codecs actually work	Nico Weber
	There were two problems: (1) they didn't handle surrogates, and (2) they used signed chars, leading to e.g. 0x00e4 being treated as 0xffe4. Also add a basic test that catches both issues. There's some code duplication with Utf16CodePointIterator::operator*(), but let's get things working first.
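Both issues can be seen in a small standalone decoder. This is a sketch under assumptions, not the LibTextCodec code: `decode_utf16be` is a hypothetical name, replacing unpaired surrogates with U+FFFD is a choice made here, and a trailing odd byte is simply ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decodes big-endian UTF-16 bytes into code points.
std::vector<uint32_t> decode_utf16be(std::vector<uint8_t> const& bytes)
{
    constexpr uint32_t replacement = 0xFFFD;
    std::vector<uint32_t> out;

    // Only read complete 2-byte code units; a trailing odd byte is ignored.
    size_t const end = bytes.size() - (bytes.size() % 2);

    auto unit_at = [&](size_t i) -> uint32_t {
        // Unsigned bytes, so 0x00 0xE4 yields 0x00E4 and not a sign-extended 0xFFE4.
        return (static_cast<uint32_t>(bytes[i]) << 8) | bytes[i + 1];
    };

    for (size_t i = 0; i < end; i += 2) {
        uint32_t unit = unit_at(i);
        bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        bool is_low = unit >= 0xDC00 && unit <= 0xDFFF;

        if (is_high && i + 2 < end) {
            uint32_t next = unit_at(i + 2);
            if (next >= 0xDC00 && next <= 0xDFFF) {
                // Combine a high/low surrogate pair into one supplementary code point.
                out.push_back(0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00));
                i += 2; // skip the low surrogate that was just consumed
                continue;
            }
        }

        // Unpaired surrogates are replaced (an assumption of this sketch).
        out.push_back((is_high || is_low) ? replacement : unit);
    }
    return out;
}
```

Here the bytes `D8 3D DE 00` combine into the single code point U+1F600, and `00 E4` stays U+00E4.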
2022-12-06	Everywhere: Rename to_{string => deprecated_string}() where applicable	Linus Groh
	This will make it easier to support both string types at the same time while we convert code, and to track down remaining uses. One big exception is Value::to_string() in LibJS, where the name is dictated by the ToString AO.
2022-12-06	AK+Everywhere: Rename String to DeprecatedString	Linus Groh
	We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)
2022-11-01	Everywhere: Explicitly link all binaries against the LibC target	Tim Schumacher
	Even though the toolchain implicitly links against -lc, it does not know where it should get LibC from except for the sysroot. In the case of Clang this causes it to pick up the LibC stub instead, which might be slightly outdated and feature missing symbols. This is currently not an issue that manifests because we pass through the dependency on LibC and other libraries by accident, which causes CMake to link against the LibC target (instead of just the library), and thus points the linker at the build output directory. Since we are looking to fix that in the upcoming commits, let's make sure that everything will still be able to find the proper LibC first.
2022-07-12	Everywhere: Add sv suffix to strings relying on StringView(char const*)	sin-ack
	Each of these strings would previously rely on StringView's char const* constructor overload, which would call __builtin_strlen on the string. Since we now have operator ""sv, we can replace these with much simpler versions. This opens the door to being able to remove StringView(char const*). No functional changes.
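The difference between the two construction paths can be shown with the standard library, using `std::string_view` and its `""sv` literal as a stand-in for AK's `StringView`. That the two behave analogously is an assumption here. A literal with an embedded NUL makes the difference visible:

```cpp
#include <cstddef>
#include <string_view>

using namespace std::literals::string_view_literals;

// Built from a char const*: the length comes from scanning for the first NUL.
inline std::size_t length_via_pointer()
{
    return std::string_view("a\0b").size();
}

// Built with the sv literal operator: the length comes from the literal itself.
inline std::size_t length_via_literal()
{
    return "a\0b"sv.size();
}
```

The pointer-based view stops at the embedded NUL and reports a length of 1, while the literal-based view reports 3 without any scan at run time.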
2022-04-01	Everywhere: Run clang-format	Idan Horowitz

2022-03-29	LibTextCodec: Pass code points instead of bytes on UTF-8 string process	Karol Kosek
	Previously we were passing raw UTF-8 bytes as code points, which caused CSS content properties to display incorrect characters. This makes bullet separators in Wikipedia templates display correctly.
2022-03-21	LibTextCodec: Don't allocate Strings on encoding normalisation	Hendiadyoin1
	This ripples down to LibWeb's HTML and XHR decoders, which therefore become less allocation heavy.
2022-03-08	LibTextCodec: Add support for the UTF16-LE encoding	Jelle Raaijmakers

2022-02-12	LibTextCodec: Add x-user-defined decoder	Luke Wilde
	It's a pretty simple charset: the bottom 128 bytes (0x00-0x7F) are standard ASCII, while the top 128 bytes (0x80-0xFF) are mapped to a portion of the Unicode Private Use Area, specifically 0xF780-0xF7FF. This is used by Google Maps for certain blobs.
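The mapping described above fits in one expression. A minimal sketch, with a hypothetical function name:

```cpp
#include <cstdint>

// Maps one x-user-defined byte to a code point: ASCII passes through,
// and 0x80..0xFF lands in the Private Use Area range 0xF780..0xF7FF.
constexpr uint32_t x_user_defined_to_code_point(uint8_t byte)
{
    return byte < 0x80 ? byte : 0xF780u + (byte - 0x80u);
}
```

For example, 0x41 stays U+0041, 0x80 becomes U+F780, and 0xFF becomes U+F7FF.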
2022-02-12	LibTextCodec: Add decoder function that overrides given decoder on BOM	Luke Wilde
	This function takes a user-provided decoder and will only use it if no BOM is in the input. If there is a BOM, it will ignore the given decoder and instead decode the input with the appropriate Unicode decoder for the detected BOM. This is only to be used where it's specifically needed, for example XHR uses this for compatibility with deployed content. As such, it has an obnoxious name to discourage usage.
2022-02-12	LibTextCodec: Add BOM sniffer	Luke Wilde
	This takes the input and sniffs it for a BOM. If it has the UTF-8 or UTF-16BE BOM, it will return their respective decoder. Currently we don't have a UTF-16LE decoder, so it will assert TODO if it detects a UTF-16LE BOM. If there is no recognisable BOM, it will return no decoder.
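A BOM check of this kind can be sketched as follows. The function name and the returned labels are hypothetical. The real function returns a decoder rather than a name, and at the time of this commit it asserted on a UTF-16LE BOM, whereas this sketch reports one.

```cpp
#include <optional>
#include <string_view>

// Hypothetical sniffer: returns an encoding label if the input starts with a
// recognisable byte order mark, or no value otherwise.
std::optional<std::string_view> sniff_bom(std::string_view input)
{
    using namespace std::literals::string_view_literals;

    if (input.substr(0, 3) == "\xEF\xBB\xBF"sv)
        return "UTF-8"sv;
    if (input.substr(0, 2) == "\xFE\xFF"sv)
        return "UTF-16BE"sv;
    if (input.substr(0, 2) == "\xFF\xFE"sv)
        return "UTF-16LE"sv;
    return std::nullopt;
}
```

Returning an empty optional for "no recognisable BOM" lets the caller fall back to whatever encoding it would otherwise have used.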
2022-01-28	LibTextCodec: Do not allocate the various decoders	Daniel Bertalan
	These objects contain no data members, so there is no point in creating 1-byte heap allocations for them. We don't need to have them as static local variables, as they are trivially constructible, so they can simply be global variables.
2021-12-16	LibTextCodec: Add alternate Cyrillic (aka Koi8-r) encoding	Dmitry Petrov
	Fixes #6840.
2021-11-11	Everywhere: Pass AK::StringView by value	Andreas Kling

2021-09-15	LibTextCodec: Ignore BYTE ORDER MARK at the start of utf8/16 strings	Sam Atkins
	Before, this was getting included as part of the output text, which was confusing the HTML parser. Nobody needs the BOM after we have identified the codec, so now we remove it when converting to UTF-8.
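Dropping the mark before decoding can be sketched like this for the UTF-8 case. The helper name is hypothetical, and the real change lives inside the decoders rather than in a free function.

```cpp
#include <string_view>

// Hypothetical helper: returns the input without a leading UTF-8 BOM, if any.
inline std::string_view strip_utf8_bom(std::string_view input)
{
    std::string_view const bom = "\xEF\xBB\xBF";
    if (input.substr(0, bom.size()) == bom)
        input.remove_prefix(bom.size());
    return input;
}
```

Only a mark at the very start is removed; the same byte sequence later in the text is left alone.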
2021-08-30	LibTextCodec: Add "process" API for allocation-free code point iteration	sin-ack
	This commit adds a new process method to all Decoder subclasses which does what to_utf8 used to do, and allows callers to customize the handling of individual UTF-8 code points through a callback. Decoder::to_utf8 now uses this API to generate a string via StringBuilder, preserving the original behavior.
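The shape of such an API might look like the sketch below. It is written against the standard library instead of AK, so `std::function`, `std::string`, and the helper `append_utf8` are assumptions standing in for AK's `Function`, `StringBuilder`, and friends.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>

// Appends one code point as UTF-8 (helper assumed for this sketch).
inline void append_utf8(std::string& out, uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

class Decoder {
public:
    // Subclasses decode the input and hand each code point to the callback.
    virtual void process(std::string_view input, std::function<void(uint32_t)> const& on_code_point) = 0;

    // Built on top of process(), so callers that want a string still get one.
    std::string to_utf8(std::string_view input)
    {
        std::string out;
        process(input, [&](uint32_t cp) { append_utf8(out, cp); });
        return out;
    }

protected:
    virtual ~Decoder() = default;
};

class Latin1Decoder final : public Decoder {
public:
    void process(std::string_view input, std::function<void(uint32_t)> const& on_code_point) override
    {
        // Each Latin-1 byte is its own code point; read it as unsigned.
        for (unsigned char byte : input)
            on_code_point(byte);
    }
};
```

A caller that only needs to count or inspect code points can pass its own callback to `process` and never build an intermediate string.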
2021-08-20	LibTextCodec: Remove unused is_standardized_encoding()	Andreas Kling

2021-06-23	LibTextCodec: Add Turkish (aka ISO-8859-9, Windows-1254) encoding	Aatos Majava

2021-06-15	LibTextCodec: Add ISO-8859-15 (aka Latin-9) encoding	Aatos Majava

2021-05-18	LibTextCodec: Use Optional<String> for get_standardized_encoding	Max Wipfli
	This patch changes get_standardized_encoding to use an Optional<String> return type instead of just returning the null string when unable to match the provided encoding to one of the canonical encoding names. This is part of an effort to move away from using null strings towards explicitly using Optional<String> to indicate that the String may not have a value.
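The pattern can be illustrated with `std::optional` as a stand-in for AK's `Optional`. The tiny label table is illustrative only, and unlike the real function this sketch matches labels exactly, without trimming whitespace or ignoring case.

```cpp
#include <optional>
#include <string_view>

// Hypothetical, heavily reduced version: maps a label to a canonical name,
// or returns an empty optional when the label is unknown.
std::optional<std::string_view> get_standardized_encoding(std::string_view label)
{
    using namespace std::literals::string_view_literals;

    if (label == "utf-8"sv || label == "utf8"sv)
        return "UTF-8"sv;
    if (label == "koi8-r"sv || label == "koi8"sv)
        return "KOI8-R"sv;
    return std::nullopt;
}
```

The caller has to check `has_value()` (or use `value_or`) before using the result, which makes the "no match" case explicit instead of hiding it in a null string.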
2021-05-01	LibTextCodec: Implement a Windows-1251 decoder	Idan Horowitz
	This encoding (a superset of ASCII that adds in the Cyrillic alphabet) is currently the third most used encoding on the web, and because Cyrillic glyphs were added by Dmitrii Trifonov recently, we can now support it as well :^)
2021-04-22	Everything: Move to SPDX license identifiers in all files.	Brian Gianforcaro
	SPDX License Identifiers are a more compact / standardized way of representing file license information. See: https://spdx.dev/resources/use/#identifiers. This was done with the `ambr` search and replace tool: `ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *`
2021-04-17	LibTextCodec: Implement a Windows-1255 decoder.	Idan Horowitz
	This is a superset of ASCII that adds in the Hebrew alphabet. (Google currently assumes we are running Windows due to not recognizing Serenity as the OS in the user agent, resulting in this encoding instead of UTF-8 in Google search results.)
2021-04-15	Everything: Add `-Wnon-virtual-dtor` flag	Nicholas-Baron
	This flag warns on classes which have `virtual` functions but do not have a `virtual` destructor. This patch adds both the flag and missing destructors. The access level of the destructors was determined by two rules of thumb: (1) a destructor should have a similar or lower access level to that of a constructor; (2) having a `private` destructor implicitly deletes the default constructor, which is probably undesirable for "interface" types (classes with only virtual functions and no data). In short, most of the added destructors are `protected`, unless the compiler complained about access.
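As an illustration of the resulting pattern, an "interface" type with a `protected` virtual destructor might look like this. The class names are hypothetical and not taken from the patch.

```cpp
#include <string>

// Interface type: only virtual functions, no data.
class Describable {
public:
    virtual std::string name() const = 0;

protected:
    // Virtual, so -Wnon-virtual-dtor is satisfied; protected, so objects
    // cannot be deleted through a Describable pointer by outside code.
    virtual ~Describable() = default;
};

class Widget final : public Describable {
public:
    std::string name() const override { return "widget"; }
};

// Callers can still use the interface by reference.
inline std::string describe(Describable const& object)
{
    return object.name();
}
```

The derived class keeps an ordinary public destructor, so it can be created and destroyed normally while the interface itself stays non-deletable from the outside.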
2021-03-15	LibTextCodec: Make UTF16BEDecoder read only up to an even offset	Idan Horowitz
	Reading up to the end of an input string of odd length results in an out-of-bounds read.
2021-03-14	LibTextCodec: Fix IBM666 => IBM866 typo	Luke

2021-02-16	LibTextCodec: Add a simple UTF-16BE decoder	Andreas Kling

2021-02-01	LibTextCodec: Avoid duplicate definition of standard encodings	Ben Wiederhake

2021-01-16	Everywhere: Replace a bundle of dbg with dbgln.	asynts
	These changes are arbitrarily divided into multiple commits to make it easier to find potentially introduced bugs with git bisect.
2021-01-12	Libraries: Move to Userland/Libraries/	Andreas Kling