This skips the new Unicode string property additions, along with `\q{}`.
|
This is currently not important as we do not nest TemporaryInverse.
|
Otherwise the range order would be inverted.
|
We had a really naive and simplistic implementation, which led to
various issues where the optimiser incorrectly rewrote the regex to use
atomic groups; this commit fixes that.
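The condition such a rewrite has to respect can be sketched as below. This is a minimal illustration, not LibRegex's code, and it rests on several assumptions:

- The loop body and the start of the following block have already been summarised as character sets.
- `None` stands for a continuation that cannot be analysed statically, such as a backreference.
- The function name `can_make_loop_atomic` is hypothetical.
- Following blocks that can match the empty string are ignored.

A greedy loop may only become atomic when giving characters back could never help the rest of the pattern match.

```python
from typing import Optional, Set


def can_make_loop_atomic(loop_chars: Set[str], following_first_chars: Optional[Set[str]]) -> bool:
    """Hypothetical check: may a greedy loop be turned into an atomic group?

    loop_chars: characters the loop body can consume.
    following_first_chars: characters the following block can start with,
    or None if that cannot be determined statically (e.g. a backreference).
    """
    if following_first_chars is None:
        # Unknown continuation: backtracking into the loop might be required.
        return False
    # Safe only if the loop can never swallow a character the continuation needs.
    return loop_chars.isdisjoint(following_first_chars)


# `a+b`: giving back an 'a' can never help match 'b', so the loop may be atomic.
print(can_make_loop_atomic({"a"}, {"b"}))  # True
# `a+a`: the continuation needs an 'a' that an atomic loop would keep for itself.
print(can_make_loop_atomic({"a"}, {"a"}))  # False
```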
|
Just a little thinking outside the box, and we can now parse and
optimise a million copies of "a|" chained together in just a second :^)
|
Previously we were compiling `/a|/` into what effectively would be
`/|a`, which is clearly incorrect.
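Order matters because backtracking engines in the Perl/ECMAScript tradition try alternatives from left to right and accept the first one that succeeds. Python's `re` module follows the same rule. It is used here purely as a reference engine, not LibRegex, to show the difference:

```python
import re

# The left alternative is tried first, so `a|` consumes the 'a'...
print(repr(re.match(r"a|", "a").group()))  # 'a'
# ...while `|a` succeeds immediately with the empty alternative.
print(repr(re.match(r"|a", "a").group()))  # ''
```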
|
It makes no sense to skip half of an instruction, so make sure to skip
only full instructions!
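Why a skip has to move by whole instructions can be shown with a hypothetical flat bytecode. In it, every opcode is followed by a fixed number of operand slots. The opcode table and helpers are illustrative assumptions, not LibRegex's actual encoding.

```python
# Hypothetical opcode table: opcode -> number of operand slots that follow it.
OPERAND_COUNT = {"Compare": 2, "Jump": 1, "ForkJump": 1, "Exit": 0}


def instruction_size(bytecode, offset):
    """Size in slots of the instruction starting at `offset` (opcode + operands)."""
    return 1 + OPERAND_COUNT[bytecode[offset]]


def skip_instructions(bytecode, offset, count):
    """Advance past `count` whole instructions, never stopping mid-instruction."""
    for _ in range(count):
        offset += instruction_size(bytecode, offset)
    return offset


def instruction_starts(bytecode):
    """Offsets of every instruction boundary in `bytecode`."""
    starts, offset = [], 0
    while offset < len(bytecode):
        starts.append(offset)
        offset += instruction_size(bytecode, offset)
    return starts


program = ["Compare", 1, "a", "Jump", 0, "Exit"]
print(instruction_starts(program))       # [0, 3, 5]
print(skip_instructions(program, 0, 2))  # 5
```

Skipping a fixed two slots from offset 0 would instead land on the operand `"a"` at offset 2 and misread it as an opcode.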
|
This makes the (flawed) ForkStay inserted as a loop header unnecessary,
and finally fixes LibRegex rewriting weird loops in weird ways.
|
Previously we were jumping to the new end of the previous block (created
by the newly inserted ForkStay); correct the offset so that we jump to the
correct block, as shown in the comments.
Fixes #12033.
|
...by flattening the underlying bytecode chunks first.
Also avoid calling DisjointChunks::size() inside a loop.
This is a very significant improvement in performance, making the
compilation of a large regex with lots of alternatives take only ~100ms
instead of many minutes (I ran out of patience waiting for it) :^)
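The cost being removed can be sketched with a plain list of lists standing in for DisjointChunks. This is an assumption: the helper names are hypothetical and the real container is C++. With k chunks and n elements, recomputing the size and walking the chunks on every access costs roughly O(n * k). Flattening once and hoisting the size out of the loop makes the scan O(n).

```python
def chunked_size(chunks):
    # Like a size() that must walk every chunk: O(number of chunks) per call.
    return sum(len(chunk) for chunk in chunks)


def chunked_at(chunks, index):
    # Random access has to walk the chunks until `index` falls inside one.
    for chunk in chunks:
        if index < len(chunk):
            return chunk[index]
        index -= len(chunk)
    raise IndexError(index)


def scan_unflattened(chunks):
    out, i = [], 0
    while i < chunked_size(chunks):         # size recomputed on every iteration
        out.append(chunked_at(chunks, i))   # plus a chunk walk per access
        i += 1
    return out


def scan_flattened(chunks):
    flat = [item for chunk in chunks for item in chunk]  # flatten once up front
    size = len(flat)                                     # hoisted out of the loop
    return [flat[i] for i in range(size)]


chunks = [[1, 2], [3], [4, 5, 6]]
print(scan_unflattened(chunks) == scan_flattened(chunks))  # True
```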
|
The instructions can have dependencies (e.g. Repeat), so only unify
equal blocks instead of consecutive instructions.
Fixes #11247.
Also adds the minimal test case(s) from that issue.
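Unifying the left side of an alternation means hoisting a shared prefix out of every alternative, e.g. turning `abc|abd` into `ab(?:c|d)`. Below is a minimal sketch that compares whole blocks rather than individual instructions. It assumes each alternative has already been split into blocks, represented as tuples of instruction strings, and the function name `unify_left_side` is hypothetical. Because only identical blocks are shared, an instruction with dependencies (such as Repeat) is never separated from the rest of its block.

```python
from typing import List, Tuple

# A block: a sequence of instructions (hypothetical string encoding).
Block = Tuple[str, ...]


def unify_left_side(alternatives: List[List[Block]]) -> Tuple[List[Block], List[List[Block]]]:
    """Split off the longest run of leading blocks shared by every alternative.

    Assumes at least one alternative. Returns (common_prefix, remaining_blocks_per_alternative).
    """
    prefix: List[Block] = []
    for position in range(min(len(alt) for alt in alternatives)):
        candidate = alternatives[0][position]
        # Only whole, identical blocks are shared; a partially equal block stops the scan.
        if all(alt[position] == candidate for alt in alternatives):
            prefix.append(candidate)
        else:
            break
    rest = [alt[len(prefix):] for alt in alternatives]
    return prefix, rest


shared = ("Compare a", "Compare b")
alts = [[shared, ("Compare c",)], [shared, ("Compare d",)]]
print(unify_left_side(alts))
```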
|
The initial `ForkStay` is only needed if the looping block has a
following block. If there is no following block, or the following block
does not attempt to match anything, we should not insert the ForkStay;
otherwise we would be rewriting `a+` as `a*` by allowing the 'end' to be
executed.
Fixes #10952.
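The effect can be reproduced with a toy backtracking VM. Everything here is a hypothetical illustration. The instruction names mirror ForkJump/ForkStay, but their semantics are assumed: ForkJump tries its target first, and ForkStay tries the next instruction first. The encoding is invented. `PLUS` runs the loop body at least once. `PLUS_WITH_HEADER` adds a ForkStay in front that can jump straight to the end, which is what lets the empty string through.

```python
def run(program, text, pc=0, pos=0):
    """Return True if `program` matches all of `text` starting at (pc, pos)."""
    while True:
        op = program[pc]
        if op[0] == "Char":
            if pos < len(text) and text[pos] == op[1]:
                pc, pos = pc + 1, pos + 1
            else:
                return False
        elif op[0] == "ForkJump":  # try the jump target first, then fall through
            if run(program, text, op[1], pos):
                return True
            pc += 1
        elif op[0] == "ForkStay":  # try the next instruction first, then the jump
            if run(program, text, pc + 1, pos):
                return True
            pc = op[1]
        elif op[0] == "Match":
            return pos == len(text)
        else:
            raise ValueError(f"unknown instruction {op!r}")


# `a+`: match one 'a', then prefer looping back for more.
PLUS = [("Char", "a"), ("ForkJump", 0), ("Match",)]
# The same loop with a ForkStay header that may skip straight to the end (index 3).
PLUS_WITH_HEADER = [("ForkStay", 3), ("Char", "a"), ("ForkJump", 1), ("Match",)]

print(run(PLUS, ""), run(PLUS, "aa"))                          # False True
print(run(PLUS_WITH_HEADER, ""), run(PLUS_WITH_HEADER, "aa"))  # True True
```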
|
Doing so would cause patterns like `(a|)` to not match the empty string.
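For reference, the expected behaviour can be checked against Python's `re` module. This is Python's engine, not LibRegex, but it is another backtracking engine with the same leftmost-first alternation. When the `a` alternative cannot match, the empty alternative must still be tried.

```python
import re

# On "a" the first alternative wins; on "" it fails and the empty one must be tried.
print(repr(re.fullmatch(r"(a|)", "a").group(1)))  # 'a'
print(repr(re.fullmatch(r"(a|)", "").group(1)))   # ''
```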
|
Generate a sorted, compressed series of ranges in a match table for
character classes, and use a binary search to find the matches.
This is about a 3-4x speedup for character class match performance. :^)
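A minimal sketch of the idea follows. It assumes a character class is given as inclusive code-point ranges `(lo, hi)`. The helper names are hypothetical, and Python's `bisect` module stands in for whatever binary search LibRegex actually uses. Sorting and merging overlapping or adjacent ranges keeps the table small, and each lookup then costs O(log n) instead of a linear scan over the class.

```python
from bisect import bisect_right


def compress_ranges(ranges):
    """Sort inclusive (lo, hi) ranges and merge overlapping or adjacent ones."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def build_table(ranges):
    table = compress_ranges(ranges)
    starts = [lo for lo, _ in table]  # sorted range starts for binary search
    return table, starts


def matches(table, starts, code_point):
    # Find the last range starting at or before code_point, then check its end.
    i = bisect_right(starts, code_point) - 1
    return i >= 0 and code_point <= table[i][1]


# [a-z0-9_] plus a redundant [m-p] that gets merged away.
WORD_CLASS = [(ord("a"), ord("z")), (ord("0"), ord("9")), (ord("_"), ord("_")), (ord("m"), ord("p"))]
table, starts = build_table(WORD_CLASS)
print(table)  # [(48, 57), (95, 95), (97, 122)]
```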
|
This avoids doing DisjointChunks traversal for every bytecode access,
significantly reducing startup time for large regular expressions.
|
Otherwise the fork in patterns like `(1+)\1` would be (incorrectly)
optimized away.
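The fork must stay because, for `(1+)\1`, the loop has to give characters back so that the backreference can match. Python's `re`, used here only as a reference backtracking engine, shows the expected result. On `"1111"` the group ends up as `"11"`, which an atomic `1+` that refused to backtrack could never produce.

```python
import re

m = re.fullmatch(r"(1+)\1", "1111")
# Greedy `1+` first takes all four characters, then backtracks to "11" so `\1` can match.
print(m.group(1))  # 11
```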
|
Previously we would've copied the bytecode instead of moving the chunks
around; use the fancy new DisjointChunks<T> abstraction to make that
happen automagically.
This decreases vector copies and uses of memmove() by nearly 10x :^)
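Below is a sketch-only Python analogue of the idea. The class is a hypothetical stand-in for DisjointChunks<T>, and handing over list references plays the role that moving a Vector plays in C++. Concatenating two sequences then transfers whole chunks instead of copying every element. A contiguous copy is only made when a flat view is actually needed.

```python
class DisjointChunks:
    """Hypothetical Python analogue: one logical sequence stored as separate chunks."""

    def __init__(self, *chunks):
        self._chunks = [chunk for chunk in chunks if chunk]

    def append_chunk(self, chunk):
        # Keep a reference to the chunk itself; no elements are copied.
        if chunk:
            self._chunks.append(chunk)

    def extend(self, other):
        # Take over the other sequence's chunks wholesale and leave it empty
        # (the analogue of moving rather than copying).
        self._chunks.extend(other._chunks)
        other._chunks = []

    def size(self):
        return sum(len(chunk) for chunk in self._chunks)

    def flatten(self):
        # Copy into one contiguous list only when a flat view is needed.
        return [item for chunk in self._chunks for item in chunk]


left = DisjointChunks([1, 2])
right_chunk = [3, 4, 5]
right = DisjointChunks(right_chunk)
left.extend(right)
print(left.size(), left.flatten())     # 5 [1, 2, 3, 4, 5]
print(left._chunks[1] is right_chunk)  # True: the chunk was moved, not copied
```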
|
This currently tries to convert forking loops to atomic groups, and
unify the left side of alternations.