serenity - The Serenity Operating System 🐞

author	Timothy Flynn <trflynn89@pm.me>	2023-02-25 10:36:21 -0500
committer	Linus Groh <mail@linusgroh.de>	2023-02-25 22:23:39 +0100
commit	fa96811a220609f951a705ccc84e4458f0c7cf28 (patch)
tree	85e040b03805a3f3416699b4fc1aa19587f34e0d /Tests/LibUnicode/CMakeLists.txt
parent	09d40bfbb24588b5659f17e2701b5c367a447110 (diff)

LibUnicode: Skip over emoji sequences in grapheme boundary segmentation

Emoji sequences in the grapheme segmentation spec are a bit tricky:

    \p{Extended_Pictographic} Extend* ZWJ × \p{Extended_Pictographic}

Our current strategy of tracking a boolean to indicate if we are in an emoji sequence was causing us to break up emoji made of multiple sub-sequences. For example, in the "family: man, woman, girl, boy" sequence:

    U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466

We would break at indices 0 (correctly) and 6 (incorrectly).

Instead of tracking a boolean, it's quite a bit simpler to reason about emoji sequences by just skipping past them entirely. Note that in cases like the above emoji, we skip one sub-sequence at a time.
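
To illustrate the skipping strategy described above, here is a minimal sketch. It is not the LibUnicode implementation: the function name `grapheme_cluster_starts` and the three property helpers are hypothetical, the property checks are rough approximations of the real Unicode data tables, and only the emoji rule plus the "do not break before Extend or ZWJ" rule are modelled. Every other position is treated as a boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical, heavily truncated property checks. A real implementation
// would consult the generated Unicode property tables instead.
static bool is_extended_pictographic(char32_t cp)
{
    // Assumption: a rough slice of the emoji blocks, not the real property.
    return cp >= 0x1F400 && cp <= 0x1F64F;
}

static bool is_zwj(char32_t cp)
{
    return cp == 0x200D;
}

static bool is_extend(char32_t cp)
{
    // Assumption: only variation selectors and emoji skin tone modifiers.
    return (cp >= 0xFE00 && cp <= 0xFE0F) || (cp >= 0x1F3FB && cp <= 0x1F3FF);
}

// Returns the code point index at which each grapheme cluster starts.
std::vector<std::size_t> grapheme_cluster_starts(std::u32string_view text)
{
    std::vector<std::size_t> starts;
    std::size_t i = 0;

    while (i < text.size()) {
        starts.push_back(i);
        char32_t first = text[i++];

        if (is_extended_pictographic(first)) {
            // Skip one sub-sequence at a time:
            //   \p{Extended_Pictographic} Extend* ZWJ × \p{Extended_Pictographic}
            for (;;) {
                std::size_t j = i;
                while (j < text.size() && is_extend(text[j]))
                    ++j;
                bool continues = j + 1 < text.size()
                    && is_zwj(text[j])
                    && is_extended_pictographic(text[j + 1]);
                if (!continues)
                    break;
                i = j + 2; // Now positioned just past the next pictographic.
            }
        }

        // Do not break before Extend or ZWJ.
        while (i < text.size() && (is_extend(text[i]) || is_zwj(text[i])))
            ++i;
    }

    return starts;
}
```

With this sketch, the "family: man, woman, girl, boy" sequence yields a single cluster start at index 0, because each `ZWJ` followed by a pictographic is consumed as one more sub-sequence before any boundary is recorded. No boolean state needs to be carried between iterations.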

Diffstat (limited to 'Tests/LibUnicode/CMakeLists.txt')

0 files changed, 0 insertions, 0 deletions

