Age | Commit message (Collapse) | Author |
|
|
|
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.
One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
|
|
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
|
|
https://github.com/SerenityOS/serenity/pull/15654#issuecomment-1322554496
|
|
|
|
This path is hit a lot, and alloc/free of this vector was showing up on
profiles, so get rid of it.
|
|
|
|
|
|
|
|
|
|
C++20 can automatically synthesize `operator!=` from `operator==`, so
there is no point in writing such functions by hand if all they do is
call through to `operator==`.
This fixes a compile error with compilers that implement P2468 (Clang
16 currently). This paper restores the C++17 behavior that if both
`T::operator==(U)` and `T::operator!=(U)` exist, `U == T` won't be
rewritten in reverse to call `T::operator==(U)`. Removing `!=` operators
makes the rewriting possible again.
See https://reviews.llvm.org/D134529#3853062
|
|
Otherwise, we end up propagating those dependencies into targets that
link against that library, which creates unnecessary link-time
dependencies.
Also included are changes to readd now missing dependencies to tools
that actually need them.
|
|
Even though the toolchain implicitly links against -lc, it does not know
where it should get LibC from except for the sysroot. In the case of
Clang this causes it to pick up the LibC stub instead, which might be
slightly outdated and feature missing symbols.
This is currently not an issue that manifests because we pass through
the dependency on LibC and other libraries by accident, which causes
CMake to link against the LibC target (instead of just the library),
and thus points the linker at the build output directory.
Since we are looking to fix that in the upcoming commits, let's make
sure that everything will still be able to find the proper LibC first.
|
|
Also do this for Shell.
This greatly simplifies the CMakeLists in Lagom, replacing many glob
patterns with a big list of libraries. There are still a few special
libraries that need some help to conform to the pattern, like LibELF and
LibWebView.
It also lets us remove essentially all of the Serenity or Lagom binary
directory detection logic from code generators, as now both projects
directories enter the generator logic from the same place.
|
|
By default char and wchar_t are unsigned on AARCH64. This fixes a
bunch of related compiler errors.
|
|
Now that we have OS macros for essentially every supported OS, let's try
to use them everywhere.
|
|
This file implements the POSIX APIs from <regex.h>, and is not suitable
for inclusion in a Lagom build. If we do include it, it will override
the host's regex functions and wreak havoc if it's resolved before the
host's implementation.
|
|
This decouples LibRegex from the serenity LibC.
Fixes #15251.
|
|
|
|
LLVM 15 now warns (and thus errors) about this, and there is really no
point in keeping them.
|
|
|
|
We were previously consuming an extra char afterwards, which could be
the charclass terminator, leading to possible OOB accesses.
|
|
Previously, for a regex such as /[a-sy-z]/i, we would incorrectly think
the character "u" fell into the range "a-s" because neither of the
conditions "u > s && U > s" or "u < a && U < a" would be true, resulting
in the lookup falling back to assuming the character is in the range.
Instead, first explicitly check if the character falls into the range,
rather than checking if it falls outside the range. If the explicit
checks fail, then we know the character is outside the range.
|
|
|
|
This skips the new string unicode properties additions, along with \q{}.
|
|
The ECMA262 spec has this as a separate production, and we need it to be
split up for a future commit.
|
|
|
|
This allowed passing in a nullptr for the StringView which will not be
possible once StringView(char const*) is removed.
|
|
While null StringViews are just as bad, these prevent the removal of
StringView(char const*) as that constructor accepts a nullptr.
No functional changes.
|
|
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
|
|
This commit moves the length calculations out to be directly on the
StringView users. This is an important step towards the goal of removing
StringView(char const*), as it moves the responsibility of calculating
the size of the string to the user of the StringView (which will prevent
naive uses causing OOB access).
|
|
[^XYZ] is not(X | Y | Z), we used to translate this to
not(X) | not(Y) | not(Z), this commit makes LibRegex interpret this
pattern as not(X) & not(Y) & not(Z).
|
|
This is currently not important as we do not nest TemporaryInverse.
|
|
|
|
The lowercase version of a range is not required to be a valid range,
instead of casefolding the range and making it invalid, check twice with
both cases of the input character (which are the same as the input if
not insensitive).
This time includes an actual test :^)
|
|
Previously we were ignoring the insensitive flag for LUT lookups.
|
|
Otherwise the range order would be inverted.
|
|
We had a really naive and simplistic implementation, which lead to
various issues where the optimiser incorrectly rewrote the regex to use
atomic groups; this commit fixes that.
|
|
Fixes #13755.
Co-Authored-By: Damien Firmenich <fir.damien@gmail.com>
|
|
|
|
Just a little thinking outside the box, and we can now parse and
optimise a million copies of "a|" chained together in just a second :^)
|
|
This helps us not blow up when too many disjunctions are chained togther
in the regex we're parsing.
Fixes #12615.
|
|
While quantifying assertions is very much meaningless, the specification
allows them with annex B's extended grammar for browsers, so read and
apply the quantifiers.
Fixes #12373.
|
|
Previously we were compiling `/a|/` into what effectively would be
`/|a`, which is clearly incorrect.
|
|
It makes no sense to skip half of an instruction, so make sure to skip
only full instructions!
|
|
ECMA-262 defines \s as:
Return the CharSet containing all characters corresponding to a code
point on the right-hand side of the WhiteSpace or LineTerminator
productions.
The LineTerminator production is simply: U+000A, U+000D, U+2028, or
U+2029. Unfortunately there isn't a Unicode property that covers just
those code points.
The WhiteSpace production is: U+0009, U+000B, U+000C, U+FEFF, or any
code point with the Space_Separator general category.
If the Unicode generators are disabled, this will fall back to ASCII
space code points.
|
|
The code path that could return an optional no longer exists as of
commit: a962ee020a6310b2d7c7479aa058c15484127418
|
|
This partially reverts commit a962ee020a6310b2d7c7479aa058c15484127418.
When the sticky bit is set, the global bit should basically be ignored
except by external callers who want their own special behavior. For
example, RegExp.prototype [ @@match ] will use the global flag to
accumulate consecutive matches. But on the first failure, the regex
loop should break.
|
|
LibRegex already implements this loop in a more performant way, so all
LibJS has to do here is to return things in the right shape, and not
loop over the input string.
Previously this was a quadratic operation on string length, which lead
to crazy execution times on failing regexps - now it's nice and fast :^)
Note that a Regex test has to be updated to remove the stateful flag as
it repeats matching on multiple strings.
|
|
All of JS's regular expression APIs only want a single match, so avoid
trying to produce more (which will be discarded anyway).
|