Age | Commit message (Collapse) | Author |
|
Fixes #13755.
Co-Authored-By: Damien Firmenich <fir.damien@gmail.com>
|
|
|
|
Just a little thinking outside the box, and we can now parse and
optimise a million copies of "a|" chained together in just a second :^)
|
|
This helps us not blow up when too many disjunctions are chained togther
in the regex we're parsing.
Fixes #12615.
|
|
While quantifying assertions is very much meaningless, the specification
allows them with annex B's extended grammar for browsers, so read and
apply the quantifiers.
Fixes #12373.
|
|
Previously we were compiling `/a|/` into what effectively would be
`/|a`, which is clearly incorrect.
|
|
It makes no sense to skip half of an instruction, so make sure to skip
only full instructions!
|
|
ECMA-262 defines \s as:
Return the CharSet containing all characters corresponding to a code
point on the right-hand side of the WhiteSpace or LineTerminator
productions.
The LineTerminator production is simply: U+000A, U+000D, U+2028, or
U+2029. Unfortunately there isn't a Unicode property that covers just
those code points.
The WhiteSpace production is: U+0009, U+000B, U+000C, U+FEFF, or any
code point with the Space_Separator general category.
If the Unicode generators are disabled, this will fall back to ASCII
space code points.
|
|
The code path that could return an optional no longer exists as of
commit: a962ee020a6310b2d7c7479aa058c15484127418
|
|
This partially reverts commit a962ee020a6310b2d7c7479aa058c15484127418.
When the sticky bit is set, the global bit should basically be ignored
except by external callers who want their own special behavior. For
example, RegExp.prototype [ @@match ] will use the global flag to
accumulate consecutive matches. But on the first failure, the regex
loop should break.
|
|
LibRegex already implements this loop in a more performant way, so all
LibJS has to do here is to return things in the right shape, and not
loop over the input string.
Previously this was a quadratic operation on string length, which lead
to crazy execution times on failing regexps - now it's nice and fast :^)
Note that a Regex test has to be updated to remove the stateful flag as
it repeats matching on multiple strings.
|
|
All of JS's regular expression APIs only want a single match, so avoid
trying to produce more (which will be discarded anyway).
|
|
As ECMA262 regex allows `[^]` and literal newlines to match newlines in
the input string, we shouldn't split the input string into lines, rather
simply make boundaries and catchall patterns capable of checking for
these conditions specifically.
|
|
Instead, return a vector of one empty string.
|
|
This makes the (flawed) ForkStay inserted as a loop header unnecessary,
and finally fixes LibRegex rewriting weird loops in weird ways.
|
|
|
|
Instead of leaking all capture groups and selectively clearing some,
simply avoid leaking things and only "define" the ones that need to
exist.
This *actually* implements the capture groups ECMA262 quirk.
Also adds the test removed in the previous commit (to avoid messing up
test runs across bisects).
|
|
This partially reverts commit c11be92e23d899e28d45f67be24e47b2e5114d3a.
That commit fixes one thing and breaks many more, a next commit will
implement this quirk in a more sane way.
|
|
...only if Multiline is not enabled.
Fixes #11940.
|
|
This implements the quirk defined by "Note 3" in section "Canonicalize"
(https://tc39.es/ecma262/#sec-runtime-semantics-canonicalize-ch).
Crosses off another quirk from #6042.
|
|
Previously we were jumping to the new end of the previous block (created
by the newly inserted ForkStay), correct the offset to jump to the
correct block as shown in the comments.
Fixes #12033.
|
|
...and fix all the instances of visit() taking non-const arguments.
|
|
This makes negative lookarounds with more than one fork behave
correctly.
Fixes #11350.
|
|
|
|
Equivalent to std::ranges::any_of, which clang-tidy suggests.
|
|
This stops clang-tidy from suggesting that this function can be made
static, although accessing `this->operator==` in the lambda function.
|
|
Also replace String creation from `""` with `String::empty()`
|
|
|
|
|
|
|
|
...by flattening the underlying bytecode chunks first.
Also avoid calling DisjointChunks::size() inside a loop.
This is a very significant improvement in performance, making the
compilation of a large regex with lots of alternatives take only ~100ms
instead of many minutes (I ran out of patience waiting for it) :^)
|
|
That method is O(n), and should generally be avoided.
|
|
|
|
|
|
The instructions can have dependencies (e.g. Repeat), so only unify
equal blocks instead of consecutive instructions.
Fixes #11247.
Also adds the minimal test case(s) from that issue.
|
|
The initial `ForkStay` is only needed if the looping block has a
following block, if there's no following block or the following block
does not attempt to match anything, we should not insert the ForkStay,
otherwise we would be rewriting `a+` as `a*` by allowing the 'end' to be
executed.
Fixes #10952.
|
|
This isn't a complete conversion to ErrorOr<void>, but a good chunk.
The end goal here is to propagate buffer allocation failures to the
caller, and allow the use of TRY() with formatting functions.
|
|
Previously we were always choosing the "nothing special" code path, even
if the dollar symbol was at the end of the pattern (and therefore should
have been considered special).
Fix that by actually checking if the pattern end follows, and emitting
the correct instruction if necessary.
|
|
|
|
|
|
Doing so would cause patterns like `(a|)` to not match the empty string.
|
|
In particular, we implicitly required that the caller initializes the
returned instances themselves (solved by making
UniformBumpAllocator::allocate call the constructor), and BumpAllocator
itself cannot handle classes that are not trivially deconstructible
(solved by deleting the method).
Co-authored-by: Ali Mohammad Pur <ali.mpfard@gmail.com>
|
|
The implementation of transform_bytecode_repetition_min_max expects at
least min=1, so let's also optimise on our way out of a bug where
'x{0,}' would cause a crash.
|
|
Generate a sorted, compressed series of ranges in a match table for
character classes, and use a binary search to find the matches.
This is about a 3-4x speedup for character class match performance. :^)
|
|
Instead of making a new string to compare against, simply grab the first
code unit of the input and compare against that.
|
|
It's very common to encounter single-character strings in JavaScript on
the web. We can make such strings significantly lighter by having a
1-character inline capacity on the Vectors.
|
|
The time we were spending on these signposts was adding up to way too
much, so let's not do it automatically.
|
|
|
|
This avoids doing DisjointChunks traversal for every bytecode access,
significantly reducing startup time for large regular expressions.
|
|
|