author | Timothy Flynn <trflynn89@pm.me> | 2023-02-25 10:36:21 -0500 |
---|---|---|
committer | Linus Groh <mail@linusgroh.de> | 2023-02-25 22:23:39 +0100 |
commit | fa96811a220609f951a705ccc84e4458f0c7cf28 (patch) | |
tree | 85e040b03805a3f3416699b4fc1aa19587f34e0d /Tests/LibUnicode/CMakeLists.txt | |
parent | 09d40bfbb24588b5659f17e2701b5c367a447110 (diff) | |
LibUnicode: Skip over emoji sequences in grapheme boundary segmentation
Emoji sequences in the grapheme segmentation spec are tricky because the
relevant rule (GB11 in UAX #29) spans an arbitrary number of code points:

    \p{Extended_Pictographic} Extend* ZWJ × \p{Extended_Pictographic}

Our current strategy of tracking a boolean to indicate whether we are in
an emoji sequence was causing us to break up emoji made of multiple
sub-sequences. For example, in the "family: man, woman, girl, boy"
sequence:

    U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466

we would break at indices 0 (correctly) and 6 (incorrectly).

Instead of tracking a boolean, it's quite a bit simpler to reason about
emoji sequences by just skipping past them entirely. Note that in cases
like the above emoji, we skip one sub-sequence at a time.
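The skipping strategy can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the LibUnicode code from this commit:

- It operates on a `std::vector<char32_t>` of code points.
- The function name `grapheme_boundaries` is hypothetical.
- The Extended_Pictographic and Extend properties are replaced with tiny hard-coded approximations. Real code would consult the Unicode Character Database.
- Only GB9 (never break before Extend or ZWJ) and GB11 are implemented, and the sketch breaks everywhere else. It ignores CR/LF, Hangul, regional indicators and the other rules.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical approximation: only VS16 and the emoji skin-tone modifiers
// are treated as Grapheme_Cluster_Break=Extend.
static bool is_extend(char32_t cp)
{
    return cp == 0xFE0F || (cp >= 0x1F3FB && cp <= 0x1F3FF);
}

// Hypothetical approximation: U+1F300..U+1FAFF (minus the skin-tone
// modifiers above) stands in for \p{Extended_Pictographic}.
static bool is_extended_pictographic(char32_t cp)
{
    return cp >= 0x1F300 && cp <= 0x1FAFF && !is_extend(cp);
}

static constexpr char32_t ZWJ = 0x200D;

// Returns the code point index at which each grapheme cluster starts,
// followed by the end index.
std::vector<std::size_t> grapheme_boundaries(std::vector<char32_t> const& cps)
{
    std::vector<std::size_t> boundaries;
    std::size_t i = 0;

    while (i < cps.size()) {
        boundaries.push_back(i);

        if (is_extended_pictographic(cps[i])) {
            // GB11: skip one "ExtPict Extend* ZWJ" sub-sequence at a time,
            // as long as the ZWJ is followed by another pictograph.
            while (true) {
                std::size_t j = i + 1;
                while (j < cps.size() && is_extend(cps[j]))
                    ++j;
                if (j + 1 < cps.size() && cps[j] == ZWJ && is_extended_pictographic(cps[j + 1])) {
                    i = j + 1; // Continue from the next pictograph.
                    continue;
                }
                i = j;
                break;
            }
        } else {
            ++i;
        }

        // GB9: never break before Extend or ZWJ.
        while (i < cps.size() && (is_extend(cps[i]) || cps[i] == ZWJ))
            ++i;
    }

    boundaries.push_back(cps.size());
    return boundaries;
}
```

The loop handles the family emoji above one `ExtPict ZWJ` sub-sequence per iteration. It reports boundaries only at code point indices 0 and 7, so the whole sequence stays a single cluster.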
Diffstat (limited to 'Tests/LibUnicode/CMakeLists.txt')
0 files changed, 0 insertions, 0 deletions