serenity - The Serenity Operating System 🐞

Age	Commit message (Collapse)	Author
2023-03-05	LibUnicode: Detect ZWJ sequences when filtering by emoji presentation	Timothy Flynn
	This was preventing some unqualified emoji sequences from rendering properly, such as the custom SerenityOS flag. We rendered the flag correctly when given the fully qualified sequence: U+1F3F3 U+FEOF U+200D U+1F41E But were not detecting the unqualified sequence as an emoji when also filtering for emoji-presentation sequences: U+1F3F3 U+200D U+1F41E
2023-02-25	LibUnicode: Add a unit test for Unicode grapheme and word segmentation	Timothy Flynn
	These include tests for previously broken boundary conditions.
2023-02-24	LibUnicode: Add a method to check if a code point could start an emoji	Timothy Flynn

2023-02-15	LibUnicode: Fix typos causing text segmentation on mid-word punctuation	Timothy Flynn
	For example the words "can't" and "32.3" should not have boundaries detected on the "'" and "." code points, respectively. The String test cases fixed here are because "b'ar" is now considered one word.
2023-01-18	LibUnicode: Parse and generate case folding code point data	Timothy Flynn
	Case folding rules have a similar mapping style as special casing rules, where one code point may map to zero or more case folding rules. These will be used for case-insensitive string comparisons. To see how case folding can differ from other casing rules, consider "ß" (U+00DF): >>> "ß".lower() 'ß' >>> "ß".upper() 'SS' >>> "ß".title() 'Ss' >>> "ß".casefold() 'ss'
2023-01-16	LibUnicode: Support full case folding for titlecasing a string	Timothy Flynn
	Unicode declares that to titlecase a string, the first cased code point after each word boundary should be transformed to its titlecase mapping. All other codepoints are transformed to their lowercase mapping.
2023-01-16	LibUnicode: Generate simple case folding mappings for titlecase	Timothy Flynn
	Note we already generate the special case foldings for titlecase.
2023-01-09	LibUnicode+LibJS: Propagate OOM from Unicode normalization	Timothy Flynn

2023-01-09	LibUnicode+LibJS+LibWeb: Propagate OOM from Unicode case transformations	Timothy Flynn

2022-10-07	LibUnicode: Update code point ideographic replacements for Unicode 15	Timothy Flynn

2022-10-07	LibUnicode: Fix Hangul syllable composition for specific cases	matcool
	This fixes `combine_hangul_code_points` which would try to combine a LVT syllable with a trailing consonant, resulting in a wrong character. Also added a test for this specific case.
2022-10-06	Tests: Add tests for LibUnicode's normalize	matcool

2022-09-05	LibLocale: Move locale source files to the LibLocale library	Timothy Flynn
	Everything is now setup to create the LibLocale library and link it where needed.
2022-09-05	LibLocale: Move locale test files to the LibLocale folder	Timothy Flynn

2022-09-05	LibLocale: Move locale source files to the LibLocale folder	Timothy Flynn
	These are still included in LibUnicode, but this updates their location and the include paths of other files which include them.
2022-09-05	Userland: Move files destined for LibLocale to the Locale namespace	Timothy Flynn

2022-09-05	LibUnicode+Userland: Migrate generated CLDR data to LibLocaleData	Timothy Flynn
	Currently, LibUnicodeData contains the generated UCD and CLDR data. Move the UCD data to the main LibUnicode library, and rename LibUnicodeData to LibLocaleData. This is another prepatory change to migrate to LibLocale.
2022-07-12	Everywhere: Add sv suffix to strings relying on StringView(char const*)	sin-ack
	Each of these strings would previously rely on StringView's char const* constructor overload, which would call __builtin_strlen on the string. Since we now have operator ""sv, we can replace these with much simpler versions. This opens the door to being able to remove StringView(char const*). No functional changes.
2022-02-15	Tests: Add Unicode tests for CharacterType block properties	thankyouverycool

2022-01-31	Everywhere: Update copyrights with my new serenityos.org e-mail :^)	Timothy Flynn

2022-01-19	LibJS+LibUnicode: Return the appropriate time zone name depending on DST	Timothy Flynn

2022-01-12	LibUnicode: Swap the preferred order of standard time zone display names	Timothy Flynn
	Our generator is currently preferring the DST variant of the time zone display names over the non-DST variant. LibTimeZone currently does not have DST support, and operates in a mode that basically assumes DST does not exist. Swap the display names for now just to be consistent until we have DST support. Note we will need to generate both of these variants and select the appropriate one at runtime once we have DST support.
2022-01-11	LibUnicode: Parse and generate long and short generic time zone names	Timothy Flynn
	This implements the CalendarPatternStyle::{Long,Short}Generic styles of time zone name formatting.
2022-01-11	LibUnicode: Fall back to GMT offset when a time zone name is unavailable	Timothy Flynn
	The following table in TR-35 includes a web of fall back rules when the requested time zone style is unavailable: https://unicode.org/reports/tr35/tr35-dates.html#dfst-zone Conveniently, the subset of styles supported by ECMA-402 (and therefore LibUnicode) all either fall back to GMT offset or to a style that is unsupported but itself falls back to GMT offset.
2022-01-11	LibUnicode: Implement TR-35's localized GMT offset formatting	Timothy Flynn
	This adds an API to use LibTimeZone to convert a time zone such as "America/New_York" to a GMT offset string like "GMT-5" (short form) or "GMT-05:00" (long form).
2022-01-06	LibUnicode: Do not assume time zones & meta zones have a 1-to-1 mapping	Timothy Flynn
	The generator parses metaZones.json to form a mapping of meta zones to time zones (AKA "golden zone" in TR-35). This parser errantly assumed this was a 1-to-1 mapping.
2022-01-04	Tests: Link some tests directly against LibUnicodeData	Timothy Flynn
	These were missed in 565a880ce5a14bac817c73916e91ebfa04c8b99b. This wasn't an issue because these tests don't pledge/unveil anything, so they could happily dlopen() the library at runtime. But this is now needed in order to migrate LibUnicode towards weak symbols instead.
2021-11-30	LibUnicode: Support code point names that apply to ranges of code points	Timothy Flynn
	For example, consider the following adjacent entries in UnicodeData.txt: 3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;; 4DBF;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;; Our current implementation would assign the display name "CJK Ideograph Extension A" to code points U+3400 & U+4DBF, but not to the code points in between. Not only should those code points be assigned a name, but the Unicode spec also has formatting rules on what the names should be (the names for these ranged code points are not as they appear in UnicodeData.txt). The spec also defines names for code point ranges that actually are listed individually in UnicodeData.txt. For example: 2F800;CJK COMPATIBILITY IDEOGRAPH-2F800;Lo;0;L;4E3D;;;;N;;;;; 2F801;CJK COMPATIBILITY IDEOGRAPH-2F801;Lo;0;L;4E38;;;;N;;;;; 2F802;CJK COMPATIBILITY IDEOGRAPH-2F802;Lo;0;L;4E41;;;;N;;;;; Code points are only coalesced into a range if all fields after the name are equivalent. Our parser will insert the range and its name formatting pattern when it comes across the first code point in that range, then ignore other code points in that range. This reduces the number of names we generated by nearly 2,000.
2021-11-19	LibUnicode: Support locales-without-script aliases for ECMA-402	Timothy Flynn
	As noted by ECMA-402, if a supported locale contains all of a language, script, and region subtag, then the implementation must also support the locale without the script subtag. The most complicated example of this is the zh-TW locale. The list of locales in the CLDR database does not include zh-TW or its maximized zh-Hant-TW variant. Instead, it inlcudes the zh-Hant locale. However, zh-Hant-TW is listed in the default-content locale list in the cldr-core package. This defines an alias from zh-Hant-TW to zh-Hant. We must then also support the zh-Hant-TW alias without the script subtag: zh-TW. This transitively maps zh-TW to zh-Hant, which is a case quite heavily tested by test262.
2021-11-09	LibUnicode: Parse the CLDR's defaultContent.json locale list	Timothy Flynn
	This file contains the list of locales which default to their parent locale's values. In the core CLDR dataset, these locales have their own files, but they are empty (except for identity data). For example: https://github.com/unicode-org/cldr/blob/main/common/main/en_US.xml In the JSON export, these files are excluded, so we currently are not recognizing these locales just by iterating the locale files. This is a prerequisite for upgrading to CLDR version 40. One of these default-content locales is the popular "en-US" locale, which defaults to "en" values. We were previously inferring the existence of this locale from the "en-US-POSIX" locale (many implementations, including ours, strip variants such as POSIX). However, v40 removes the "en-US-POSIX" locale entirely, meaning that without this change, we wouldn't know that "en-US" exists (we would default to "en"). For more detail on this and other v40 changes, see: https://cldr.unicode.org/index/downloads/cldr-40#h.nssoo2lq3cba
2021-09-08	LibUnicode+LibJS: Store locale keyword values as a single string	Timothy Flynn
	Previously, LibUnicode would store the values of a keyword as a Vector. For example, the locale "en-u-ca-abc-def" would have its keyword "ca" stored as {"abc, "def"}. Then, canonicalization would occur on each of the elements in that Vector. This is incorrect because, for example, the keyword value "true" should only be dropped if that is the entire value. That is, the canonical form of "en-u-kb-true" is "en-u-kb", but "en-u-kb-abc-true" does not change for canonicalization. However, we would canonicalize that locale as "en-u-kb-abc".
2021-09-06	LibUnicode: Implement locale-aware BEFORE_DOT special casing	Timothy Flynn
	Note that the algorithm in the Unicode spec is for checking that a code point precedes U+0307, but the special casing condition NotBeforeDot is interested in the inverse of this rule.
2021-09-06	LibUnicode: Implement locale-aware MORE_ABOVE special casing	Timothy Flynn

2021-09-06	LibUnicode: Implement locale-aware AFTER_SOFT_DOTTED special casing	Timothy Flynn

2021-09-06	LibUnicode: Implement locale-aware AFTER_I special casing	Timothy Flynn

2021-09-02	LibUnicode: Add lexer to test if a string matches the "type" production	Timothy Flynn

2021-09-02	Tests: Remove all file(GLOB) from CMakeLists in Tests	Andrew Kaster
	Using a file(GLOB) to find all the test files in a directory is an easy hack to get things started, but has some drawbacks. Namely, if you add a test, it won't be found again without re-running CMake. `ninja` seems to do this automatically, but it would be nice to one day stop seeing it rechecking our globbed directories.
2021-09-01	LibUnicode: Resolve the most likely territory alias when there are many	Timothy Flynn

2021-09-01	LibUnicode: Perform complex Unicode locale alias substitution	Timothy Flynn

2021-09-01	LibUnicode: Canonicalize calendar subtags	Timothy Flynn
	Calendar subtags are a bit of an odd-man-out in that we must match the variants "ethiopic-amete-alem" in that order, without any other variant in the locale. So a separate method is needed for this, and we now defer sorting the variant list until after other canonicalization is done.
2021-09-01	LibUnicode: Canonicalize timezone subtags	Timothy Flynn

2021-09-01	LibUnicode: Canonicalize the subtag "imperial" to "uksystem"	Timothy Flynn

2021-09-01	LibUnicode: Canonicalize the subtag "primary" and "tertiary" to "levelN"	Timothy Flynn

2021-09-01	LibUnicode: Canonicalize the subtag "names" to "prprname"	Timothy Flynn

2021-09-01	LibUnicode: Canonicalize the subtag "yes" to "true"	Timothy Flynn

2021-09-01	LibUnicode: Substitute Unicode locale aliases during canonicalization	Timothy Flynn
	Unicode TR35 defines how locale subtag aliases should be emplaced when converting a locale to canonical form. For most subtags, it is a simple substitution. Language subtags depend on context; for example, the language "sh" should become "sr-Latn", but if the original locale has a script subtag already ("sh-Cyrl"), then only the language subtag of the alias should be taken ("sr-Latn"). To facilitate this, we now make two passes when canonicalizing a locale. In the first pass, we convert the LocaleID structure to canonical syntax (where the conversions all happen in-place). In the second pass, we form the canonical string based on the canonical syntax.
2021-09-01	LibJS+LibUnicode: Store parsed Unicode locale data as full strings	Timothy Flynn
	Originally, it was convenient to store the parsed Unicode locale data as views into the original string being parsed. But to implement locale aliases will require mutating the data that was parsed. To prepare for that, store the parsed data as proper strings.
2021-08-30	LibUnicode: Canonicalize locale private use extensions	Timothy Flynn

2021-08-30	LibUnicode: Canonicalize locale extensions	Timothy Flynn

2021-08-30	LibUnicode: Parse locale private use extensions	Timothy Flynn