path: root/Userland/Libraries/LibTextCodec
Age | Commit message | Author
2021-09-15 | LibTextCodec: Ignore BYTE ORDER MARK at the start of utf8/16 strings | Sam Atkins
Before, this was getting included as part of the output text, which was confusing the HTML parser. Nobody needs the BOM after we have identified the codec, so now we remove it when converting to UTF-8.
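The BOM handling described above can be sketched roughly as follows. This is a minimal illustration in standard C++ rather than Serenity's AK types, and the helper names `strip_utf8_bom` and `strip_utf16be_bom` are hypothetical, not taken from LibTextCodec.

```cpp
#include <string_view>

// Hypothetical helper: drops a leading UTF-8 BOM (EF BB BF) if present.
std::string_view strip_utf8_bom(std::string_view input)
{
    if (input.size() >= 3
        && static_cast<unsigned char>(input[0]) == 0xEF
        && static_cast<unsigned char>(input[1]) == 0xBB
        && static_cast<unsigned char>(input[2]) == 0xBF)
        input.remove_prefix(3);
    return input;
}

// Hypothetical helper: drops a leading UTF-16BE BOM (FE FF) if present.
std::string_view strip_utf16be_bom(std::string_view input)
{
    if (input.size() >= 2
        && static_cast<unsigned char>(input[0]) == 0xFE
        && static_cast<unsigned char>(input[1]) == 0xFF)
        input.remove_prefix(2);
    return input;
}
```

Both helpers return a view into the original buffer, so stripping the BOM does not copy the input.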
2021-08-30 | LibTextCodec: Add "process" API for allocation-free code point iteration | sin-ack
This commit adds a new process method to all Decoder subclasses, which does what to_utf8 used to do and allows callers to customize the handling of individual UTF-8 code points through a callback. Decoder::to_utf8 now uses this API to generate a string via StringBuilder, preserving the original behavior.
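The shape of such a callback-based API might look like the sketch below. It uses standard C++ in place of Serenity's own types: `std::string` stands in for StringBuilder, `std::function` is used only for brevity (it may itself allocate for large captures), and the `Latin1Decoder` class and `append_utf8` helper are hypothetical names chosen for illustration.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>

// Hypothetical stand-in for a decoder base class; not LibTextCodec's actual interface.
class Decoder {
public:
    virtual ~Decoder() = default;

    // Calls on_code_point once per decoded code point, without building a string.
    virtual void process(std::string_view input,
        std::function<void(uint32_t)> const& on_code_point) = 0;

    // Builds a UTF-8 string on top of process().
    std::string to_utf8(std::string_view input)
    {
        std::string out;
        process(input, [&](uint32_t cp) { append_utf8(out, cp); });
        return out;
    }

private:
    // Encodes one code point as 1 to 4 UTF-8 bytes.
    static void append_utf8(std::string& out, uint32_t cp)
    {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
};

// Example subclass: every Latin-1 byte is its own code point.
class Latin1Decoder final : public Decoder {
public:
    void process(std::string_view input,
        std::function<void(uint32_t)> const& on_code_point) override
    {
        for (unsigned char byte : input)
            on_code_point(byte);
    }
};
```

A caller that only needs to count or inspect code points can call `process` directly and never materialize the converted string.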
2021-08-20 | LibTextCodec: Remove unused is_standardized_encoding() | Andreas Kling
2021-06-23 | LibTextCodec: Add Turkish (aka ISO-8859-9, Windows-1254) encoding | Aatos Majava
2021-06-15 | LibTextCodec: Add ISO-8859-15 (aka Latin-9) encoding | Aatos Majava
2021-05-18 | LibTextCodec: Use Optional<String> for get_standardized_encoding | Max Wipfli
This patch changes get_standardized_encoding to use an Optional<String> return type instead of just returning the null string when unable to match the provided encoding to one of the canonical encoding names. This is part of an effort to move away from using null strings towards explicitly using Optional<String> to indicate that the String may not have a value.
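The difference for callers can be illustrated with `std::optional<std::string>` standing in for Serenity's `Optional<String>`. The lookup below is a hypothetical, heavily reduced version that recognizes only a few labels; it is not the real label table.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical, reduced lookup: returns the canonical name for a known label,
// or an empty optional instead of a null string when nothing matches.
std::optional<std::string> get_standardized_encoding(std::string label)
{
    // Labels are matched case-insensitively in this sketch.
    std::transform(label.begin(), label.end(), label.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    if (label == "utf-8" || label == "utf8")
        return std::string("UTF-8");
    if (label == "utf-16be")
        return std::string("UTF-16BE");
    return std::nullopt;
}
```

With this return type the "no match" case is visible in the signature, and callers have to check `has_value()` before using the name.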
2021-05-01 | LibTextCodec: Implement a Windows-1251 decoder | Idan Horowitz
This encoding (a superset of ASCII that adds the Cyrillic alphabet) is currently the third most used encoding on the web, and because Cyrillic glyphs were added by Dmitrii Trifonov recently, we can now support it as well :^)
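As a rough illustration of how a single-byte decoder like this maps bytes to code points, the sketch below handles only the ASCII range and the contiguous block 0xC0..0xFF, which Windows-1251 assigns to U+0410..U+044F. The function name is hypothetical, and the remaining bytes in 0x80..0xBF would need a lookup table that is omitted here.

```cpp
#include <cstdint>

// Hypothetical partial mapping for Windows-1251.
// Bytes 0x80..0xBF need a lookup table (omitted); this sketch returns
// U+FFFD REPLACEMENT CHARACTER for them.
uint32_t windows1251_to_code_point(uint8_t byte)
{
    if (byte < 0x80)
        return byte; // ASCII range maps to itself
    if (byte >= 0xC0)
        return 0x0410 + (byte - 0xC0); // contiguous Cyrillic block
    return 0xFFFD;
}
```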
2021-04-22 | Everything: Move to SPDX license identifiers in all files. | Brian Gianforcaro
SPDX License Identifiers are a more compact / standardized way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool:
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-17 | LibTextCodec: Implement a Windows-1255 decoder. | Idan Horowitz
This is a superset of ASCII that adds the Hebrew alphabet. (Google currently assumes we are running Windows because it does not recognize Serenity as the OS in the user agent, so it serves this encoding instead of UTF-8 in Google search results.)
2021-04-15 | Everything: Add `-Wnon-virtual-dtor` flag | Nicholas-Baron
This flag warns on classes which have `virtual` functions but do not have a `virtual` destructor. This patch adds both the flag and the missing destructors. The access level of the destructors was determined by two rules of thumb:
1. A destructor should have a similar or lower access level to that of a constructor.
2. Having a `private` destructor implicitly deletes the default constructor, which is probably undesirable for "interface" types (classes with only virtual functions and no data).
In short, most of the added destructors are `protected`, unless the compiler complained about access.
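A minimal sketch of the pattern this describes, using hypothetical `Shape` and `Circle` classes that are not from the Serenity tree: an "interface" type with only virtual functions gets a `protected` `virtual` destructor.

```cpp
#include <string>

// Interface-style class: only virtual functions, no data.
// The explicit virtual destructor is the kind of declaration the patch adds.
class Shape {
public:
    virtual std::string name() const = 0;

protected:
    // Protected and virtual: code outside the hierarchy cannot delete through
    // a Shape*, while derived classes are still destroyed correctly.
    virtual ~Shape() = default;
};

class Circle final : public Shape {
public:
    std::string name() const override { return "circle"; }
};
```

Derived classes can still be created and destroyed normally because they have access to the protected base destructor.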
2021-03-15 | LibTextCodec: Make UTF16BEDecoder read only up to an even offset | Idan Horowitz
Reading up to the end of an odd-length input string results in an out-of-bounds read.
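The fix can be pictured with the following sketch, which rounds the input length down to an even number before reading two bytes at a time. The function name is hypothetical, the code uses standard C++ rather than AK types, and surrogate pairs are deliberately left uncombined.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical helper: decodes big-endian 16-bit units and ignores a trailing
// odd byte, so input[i + 1] is never read past the end of the buffer.
std::vector<uint16_t> decode_utf16be_units(std::string_view input)
{
    std::vector<uint16_t> units;
    size_t even_length = input.size() & ~static_cast<size_t>(1); // round down to even
    for (size_t i = 0; i < even_length; i += 2) {
        auto high = static_cast<unsigned char>(input[i]);
        auto low = static_cast<unsigned char>(input[i + 1]);
        units.push_back(static_cast<uint16_t>((high << 8) | low));
    }
    return units;
}
```

Looping to `input.size()` instead of `even_length` would read `input[i + 1]` one byte past the end whenever the length is odd, which is the bug the commit message describes.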
2021-03-14 | LibTextCodec: Fix IBM666 => IBM866 typo | Luke
2021-02-16 | LibTextCodec: Add a simple UTF-16BE decoder | Andreas Kling
2021-02-01 | LibTextCodec: Avoid duplicate definition of standard encodings | Ben Wiederhake
2021-01-16 | Everywhere: Replace a bundle of dbg with dbgln. | asynts
These changes are arbitrarily divided into multiple commits to make it easier to find potentially introduced bugs with git bisect.
2021-01-12 | Libraries: Move to Userland/Libraries/ | Andreas Kling