author Timothy Flynn <trflynn89@pm.me> 2023-02-25 10:36:21 -0500
committer Linus Groh <mail@linusgroh.de> 2023-02-25 22:23:39 +0100
commit fa96811a220609f951a705ccc84e4458f0c7cf28
tree 85e040b03805a3f3416699b4fc1aa19587f34e0d /Tests/LibUnicode/CMakeLists.txt
parent 09d40bfbb24588b5659f17e2701b5c367a447110
LibUnicode: Skip over emoji sequences in grapheme boundary segmentation
Emoji sequences in the grapheme segmentation spec are a bit tricky:

    \p{Extended_Pictographic} Extend* ZWJ × \p{Extended_Pictographic}

Our current strategy of tracking a boolean to indicate if we are in an
emoji sequence was causing us to break up emoji made of multiple
sub-sequences. For example, in the "family: man, woman, girl, boy"
sequence:

    U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466

We would break at indices 0 (correctly) and 6 (incorrectly).

Instead of tracking a boolean, it's quite a bit simpler to reason about
emoji sequences by just skipping past them entirely. Note that in cases
like the above emoji, we skip one sub-sequence at a time.
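
The skipping strategy described above can be sketched roughly as follows. This is a minimal illustration, not the LibUnicode implementation: the function names are hypothetical, the property checks are crude code point ranges rather than real Unicode data lookups, and only the emoji rule plus the "do not break before Extend or ZWJ" rule are handled.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

constexpr char32_t ZWJ = 0x200D;

// ASSUMPTION: tiny stand-in for the Extend property (variation selector-16
// and the emoji skin tone modifiers only).
static bool is_extend(char32_t cp)
{
    return cp == 0xFE0F || (cp >= 0x1F3FB && cp <= 0x1F3FF);
}

// ASSUMPTION: rough range approximation of \p{Extended_Pictographic};
// a real implementation would consult generated Unicode property tables.
static bool is_extended_pictographic(char32_t cp)
{
    return cp >= 0x1F300 && cp <= 0x1FAFF && !is_extend(cp);
}

// Hypothetical helper: returns grapheme boundaries as code point indices,
// including 0 and text.size() for non-empty input.
std::vector<size_t> find_grapheme_boundaries(std::u32string_view text)
{
    std::vector<size_t> boundaries;
    if (text.empty())
        return boundaries;

    boundaries.push_back(0);
    size_t index = 0;

    while (index < text.size()) {
        size_t next = index + 1;

        if (is_extended_pictographic(text[index])) {
            // Skip past "Extend* ZWJ Extended_Pictographic", one
            // sub-sequence at a time, instead of tracking a boolean.
            for (;;) {
                size_t probe = next;
                while (probe < text.size() && is_extend(text[probe]))
                    ++probe;
                bool joins_pictographic = probe + 1 < text.size()
                    && text[probe] == ZWJ
                    && is_extended_pictographic(text[probe + 1]);
                if (!joins_pictographic)
                    break;
                next = probe + 2;
            }
        }

        // Never break before trailing Extend or ZWJ code points.
        while (next < text.size() && (is_extend(text[next]) || text[next] == ZWJ))
            ++next;

        boundaries.push_back(next);
        index = next;
    }

    return boundaries;
}
```

With this sketch, the seven code point family sequence from the message yields boundaries only at 0 and 7, so no break is reported at index 6.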
Diffstat (limited to 'Tests/LibUnicode/CMakeLists.txt')
0 files changed, 0 insertions, 0 deletions