field | value | date |
---|---|---|
author | Timothy Flynn <trflynn89@pm.me> | 2021-07-20 10:46:53 -0400 |
committer | Andreas Kling <kling@serenityos.org> | 2021-07-22 09:10:44 +0200 |
commit | 0c42aece362edfbd71f3b149601c065b5c675e80 | |
tree | 7813607db2d9eed60c3f0dc0d4fa277221a5d87d /Userland/Libraries/LibJS/Runtime/Value.h | |
parent | 0e25d2393f2a7f49ded730d4a11643005ae9b468 | |
LibJS: Transcode UTF-8 strings to UTF-16 and add UTF-16 accessors
LibJS parses JavaScript as UTF-8, so when creating a string, we must
transcode it to UTF-16 to handle encoded surrogate pairs.
For example, consider the following string:
"\ud83d\ude00"
The UTF-8 encoding of the code point this surrogate pair represents (U+1F600) is:
0xf0 0x9f 0x98 0x80
However, LibJS currently stores the two surrogates individually as
UTF-8 encoded bytes rather than combining the pair:
0xed 0xa0 0xbd, 0xed 0xb8 0x80
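Both byte sequences can be reproduced with a small standalone encoder. This is a hypothetical C++ sketch that uses only the standard library; it is not LibJS code. The names `encode_utf8`, `combine_surrogates`, `encode_pair_combined`, and `encode_pair_separately` are illustrative. Like generalized UTF-8 (WTF-8), `encode_utf8` does not reject surrogate code points, so encoding U+D83D and U+DE00 on their own yields `ED A0 BD` and `ED B8 80`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes one code point as UTF-8. This sketch does not reject surrogate
// code points (U+D800..U+DFFF); it applies the ordinary 3-byte pattern.
std::vector<uint8_t> encode_utf8(uint32_t code_point)
{
    std::vector<uint8_t> out;
    if (code_point < 0x80) {
        out.push_back(static_cast<uint8_t>(code_point));
    } else if (code_point < 0x800) {
        out.push_back(static_cast<uint8_t>(0xC0 | (code_point >> 6)));
        out.push_back(static_cast<uint8_t>(0x80 | (code_point & 0x3F)));
    } else if (code_point < 0x10000) {
        out.push_back(static_cast<uint8_t>(0xE0 | (code_point >> 12)));
        out.push_back(static_cast<uint8_t>(0x80 | ((code_point >> 6) & 0x3F)));
        out.push_back(static_cast<uint8_t>(0x80 | (code_point & 0x3F)));
    } else {
        out.push_back(static_cast<uint8_t>(0xF0 | (code_point >> 18)));
        out.push_back(static_cast<uint8_t>(0x80 | ((code_point >> 12) & 0x3F)));
        out.push_back(static_cast<uint8_t>(0x80 | ((code_point >> 6) & 0x3F)));
        out.push_back(static_cast<uint8_t>(0x80 | (code_point & 0x3F)));
    }
    return out;
}

// Combines a high/low surrogate pair into the supplementary code point it encodes.
uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000 + ((static_cast<uint32_t>(high) - 0xD800) << 10) + (static_cast<uint32_t>(low) - 0xDC00);
}

// Correct behavior: combine the pair first, then encode one 4-byte sequence.
std::vector<uint8_t> encode_pair_combined(uint16_t high, uint16_t low)
{
    return encode_utf8(combine_surrogates(high, low));
}

// The behavior described above: encode each surrogate as its own 3-byte sequence.
std::vector<uint8_t> encode_pair_separately(uint16_t high, uint16_t low)
{
    std::vector<uint8_t> out = encode_utf8(high);
    std::vector<uint8_t> second = encode_utf8(low);
    out.insert(out.end(), second.begin(), second.end());
    return out;
}
```

For the pair `0xD83D, 0xDE00`, `encode_pair_combined` gives the 4 bytes of U+1F600, while `encode_pair_separately` gives 6 bytes that a strict UTF-8 decoder would reject as ill-formed.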
These byte sequences are not equivalent, so as String.prototype becomes
UTF-16 aware, this encoding will break abstractions such as strict equality.
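In JavaScript, `"\ud83d\ude00" === "😀"` evaluates to `true`, because both literals denote the same two UTF-16 code units. The minimal sketch below shows why transcoding to UTF-16 fixes the comparison: the two UTF-8 storage forms differ byte for byte, but they decode to identical code units. It is a hypothetical standalone helper, not LibJS's transcoder. `std::vector<uint16_t>` stands in for AK's `Vector<u16>`, the names `utf8_to_utf16` and `equal_as_utf16` are illustrative, and the input is assumed to be well-formed, apart from tolerating 3-byte-encoded surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Transcodes UTF-8 bytes to UTF-16 code units. A 3-byte-encoded surrogate
// decodes to its own code unit, so both storage forms can be compared.
// Assumes well-formed input; there is no error recovery.
std::vector<uint16_t> utf8_to_utf16(const std::vector<uint8_t>& bytes)
{
    std::vector<uint16_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        uint8_t lead = bytes[i];
        uint32_t code_point;
        std::size_t length;
        if (lead < 0x80) {
            code_point = lead;
            length = 1;
        } else if ((lead & 0xE0) == 0xC0) {
            code_point = lead & 0x1F;
            length = 2;
        } else if ((lead & 0xF0) == 0xE0) {
            code_point = lead & 0x0F;
            length = 3;
        } else {
            code_point = lead & 0x07;
            length = 4;
        }
        assert(i + length <= bytes.size());
        // Append the 6 payload bits of each continuation byte.
        for (std::size_t k = 1; k < length; ++k)
            code_point = (code_point << 6) | (bytes[i + k] & 0x3F);
        i += length;

        if (code_point >= 0x10000) {
            // Supplementary code point: split into a high/low surrogate pair.
            code_point -= 0x10000;
            out.push_back(static_cast<uint16_t>(0xD800 + (code_point >> 10)));
            out.push_back(static_cast<uint16_t>(0xDC00 + (code_point & 0x3FF)));
        } else {
            out.push_back(static_cast<uint16_t>(code_point));
        }
    }
    return out;
}

// Strict equality on strings as sequences of UTF-16 code units.
bool equal_as_utf16(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b)
{
    return utf8_to_utf16(a) == utf8_to_utf16(b);
}
```

Comparing code units rather than bytes is what makes `F0 9F 98 80` and `ED A0 BD ED B8 80` compare equal, both becoming `D83D DE00`.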
Diffstat (limited to 'Userland/Libraries/LibJS/Runtime/Value.h')
-rw-r--r-- | Userland/Libraries/LibJS/Runtime/Value.h | 1 |
1 file changed, 1 insertion, 0 deletions
```diff
diff --git a/Userland/Libraries/LibJS/Runtime/Value.h b/Userland/Libraries/LibJS/Runtime/Value.h
index 8126660982..ee3a6fd74d 100644
--- a/Userland/Libraries/LibJS/Runtime/Value.h
+++ b/Userland/Libraries/LibJS/Runtime/Value.h
@@ -246,6 +246,7 @@ public:
     u64 encoded() const { return m_value.encoded; }
 
     String to_string(GlobalObject&, bool legacy_null_to_empty_string = false) const;
+    Vector<u16> to_utf16_string(GlobalObject&) const;
     PrimitiveString* to_primitive_string(GlobalObject&);
     Value to_primitive(GlobalObject&, PreferredType preferred_type = PreferredType::Default) const;
     Object* to_object(GlobalObject&) const;
```