path: root/Userland/Libraries/LibRegex
Age | Commit message | Author
2022-12-09 | Everywhere: Use C++ concepts instead of requires clauses | Moustafa Raafat
2022-12-06 | Everywhere: Rename to_{string => deprecated_string}() where applicable | Linus Groh
This will make it easier to support both string types at the same time while we convert code, and tracking down remaining uses. One big exception is Value::to_string() in LibJS, where the name is dictated by the ToString AO.
2022-12-06 | AK+Everywhere: Rename String to DeprecatedString | Linus Groh
We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)
2022-12-03 | Everywhere: Remove 'clang-format off' comments that are no longer needed | Linus Groh
https://github.com/SerenityOS/serenity/pull/15654#issuecomment-1322554496
2022-12-03 | Everywhere: Run clang-format | Linus Groh
2022-11-17 | LibRegex: Use spans<4> to avoid allocating small vectors | Ali Mohammad Pur
This path is hit a lot, and alloc/free of this vector was showing up on profiles, so get rid of it.
2022-11-17 | LibRegex: Use a copy-on-write vector for fork state | Ali Mohammad Pur
2022-11-17 | LibRegex: Don't copy forked results twice | Ali Mohammad Pur
2022-11-17 | LibRegex: Avoid copying MatchInput when getting argument descriptions | Ali Mohammad Pur
2022-11-09 | LibRegex: Don't treat ForkReplace* as new forks | Ali Mohammad Pur
2022-11-06 | Everywhere: Remove redundant inequality comparison operators | Daniel Bertalan
C++20 can automatically synthesize `operator!=` from `operator==`, so there is no point in writing such functions by hand if all they do is call through to `operator==`. This fixes a compile error with compilers that implement P2468 (Clang 16 currently). This paper restores the C++17 behavior that if both `T::operator==(U)` and `T::operator!=(U)` exist, `U == T` won't be rewritten in reverse to call `T::operator==(U)`. Removing `!=` operators makes the rewriting possible again. See https://reviews.llvm.org/D134529#3853062
2022-11-01 | Everywhere: Mark dependencies of most targets as PRIVATE | Tim Schumacher
Otherwise, we end up propagating those dependencies into targets that link against that library, which creates unnecessary link-time dependencies. Also included are changes to re-add now-missing dependencies to tools that actually need them.
2022-11-01 | Everywhere: Explicitly link all binaries against the LibC target | Tim Schumacher
Even though the toolchain implicitly links against -lc, it does not know where it should get LibC from except for the sysroot. In the case of Clang this causes it to pick up the LibC stub instead, which might be slightly outdated and be missing symbols. This is currently not an issue that manifests because we pass through the dependency on LibC and other libraries by accident, which causes CMake to link against the LibC target (instead of just the library), and thus points the linker at the build output directory. Since we are looking to fix that in the upcoming commits, let's make sure that everything will still be able to find the proper LibC first.
2022-10-16 | CMake+Userland: Use CMakeLists from Userland to build Lagom Libraries | Andrew Kaster
Also do this for Shell. This greatly simplifies the CMakeLists in Lagom, replacing many glob patterns with a big list of libraries. There are still a few special libraries that need some help to conform to the pattern, like LibELF and LibWebView. It also lets us remove essentially all of the Serenity or Lagom binary directory detection logic from code generators, as now both projects directories enter the generator logic from the same place.
2022-10-14 | AK+Toolchain: Make char and wchar_t behave on AARCH64 | Gunnar Beutner
By default char and wchar_t are unsigned on AARCH64. This fixes a bunch of related compiler errors.
2022-10-10 | Everywhere: Replace uses of __serenity__ with AK_OS_SERENITY | Andrew Kaster
Now that we have OS macros for essentially every supported OS, let's try to use them everywhere.
2022-10-10 | LibRegex: Don't build LibRegex/C/Regex.cpp on Lagom | Andrew Kaster
This file implements the POSIX APIs from <regex.h>, and is not suitable for inclusion in a Lagom build. If we do include it, it will override the host's regex functions and wreak havoc if it's resolved before the host's implementation.
2022-09-20 | LibC+LibRegex: Move central regex definitions into LibC/bits | Ali Mohammad Pur
This decouples LibRegex from the serenity LibC. Fixes #15251.
2022-09-17 | Everywhere: Fix badly-formatted includes | Ben Wiederhake
2022-09-16 | Everywhere: Remove a bunch of dead write-only variables | Tim Schumacher
LLVM 15 now warns (and thus errors) about this, and there is really no point in keeping them.
2022-09-12 | LibRegex: Account for eof after \<x> when 'x' leads to legacy behaviour | Ali Mohammad Pur
2022-09-12 | LibRegex: Consume exactly two chars for escaped characters | Ali Mohammad Pur
We were previously consuming an extra char afterwards, which could be the charclass terminator, leading to possible OOB accesses.
2022-08-29 | LibRegex: Explicitly check if a character falls into a table-based range | Timothy Flynn
Previously, for a regex such as /[a-sy-z]/i, we would incorrectly think the character "u" fell into the range "a-s" because neither of the conditions "u > s && U > s" or "u < a && U < a" would be true, resulting in the lookup falling back to assuming the character is in the range. Instead, first explicitly check if the character falls into the range, rather than checking if it falls outside the range. If the explicit checks fail, then we know the character is outside the range.
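The difference between the two strategies can be sketched as follows. This is a simplified, ASCII-only model with hypothetical helper names (`in_range`, `matches_range_old`, `matches_range_fixed`); it is not LibRegex's actual lookup-table code.

```cpp
#include <cctype>

// Hypothetical helpers for illustration; not LibRegex's actual implementation.

// Explicit containment check: true only if c lies within [from, to].
static bool in_range(unsigned char c, unsigned char from, unsigned char to)
{
    return c >= from && c <= to;
}

// Old strategy: rule the character out only if both of its cases lie on the
// same side of the range, and otherwise assume that it is inside the range.
bool matches_range_old(unsigned char ch, unsigned char from, unsigned char to)
{
    unsigned char lower = static_cast<unsigned char>(std::tolower(ch));
    unsigned char upper = static_cast<unsigned char>(std::toupper(ch));
    if (lower > to && upper > to)
        return false;
    if (lower < from && upper < from)
        return false;
    return true;
}

// New strategy: the character matches only if one of its cases is
// explicitly inside the range.
bool matches_range_fixed(unsigned char ch, unsigned char from, unsigned char to)
{
    unsigned char lower = static_cast<unsigned char>(std::tolower(ch));
    unsigned char upper = static_cast<unsigned char>(std::toupper(ch));
    return in_range(lower, from, to) || in_range(upper, from, to);
}
```

For the range "a-s" and the input "u", the lowercase form lies above the range and the uppercase form "U" lies below it, so neither exclusion test in the old strategy fires and the character is wrongly accepted.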
2022-07-20 | LibRegex: Check code unit count range when accessing by code unit count | Ali Mohammad Pur
2022-07-20 | LibRegex: Partially implement the ECMAScript unicodeSets proposal | Ali Mohammad Pur
This skips the new string unicode properties additions, along with \q{}.
2022-07-20 | LibRegex: Refactor parsing 'CharacterEscape' out of 'AtomEscape' | Ali Mohammad Pur
The ECMA262 spec has this as a separate production, and we need it to be split up for a future commit.
2022-07-20 | LibRegex: Pass parse flags as a struct instead of multiple arguments | Ali Mohammad Pur
2022-07-12 | LibRegex: Remove RegexStringView(char const*) constructor | sin-ack
This allowed passing in a nullptr for the StringView which will not be possible once StringView(char const*) is removed.
2022-07-12 | Everywhere: Use default StringView constructor over nullptr | sin-ack
While null StringViews are just as bad, these prevent the removal of StringView(char const*) as that constructor accepts a nullptr. No functional changes.
2022-07-12 | Everywhere: Add sv suffix to strings relying on StringView(char const*) | sin-ack
Each of these strings would previously rely on StringView's char const* constructor overload, which would call __builtin_strlen on the string. Since we now have operator ""sv, we can replace these with much simpler versions. This opens the door to being able to remove StringView(char const*). No functional changes.
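The same idea can be shown with the standard library, used here only as a stand-in for AK's StringView (an assumption made so the example is self-contained): `std::string_view` also offers a `const char*` constructor that scans for the terminating NUL and a `""sv` literal whose length is fixed by the literal itself. The two function names are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

using namespace std::literals::string_view_literals;

// The length is taken from the literal itself, so embedded NUL bytes are
// kept and no strlen-style scan is needed.
constexpr std::size_t literal_length()
{
    return "a\0b"sv.size();
}

// The length is found by scanning for the first NUL byte when the view is
// constructed from a plain char const*.
constexpr std::size_t c_string_length()
{
    return std::string_view("a\0b").size();
}
```

The literal form reports all three bytes, while the pointer form stops at the embedded NUL, which is one reason a pointer-only constructor is easy to misuse.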
2022-07-12 | Everywhere: Explicitly specify the size in StringView constructors | sin-ack
This commit moves the length calculations out to be directly on the StringView users. This is an important step towards the goal of removing StringView(char const*), as it moves the responsibility of calculating the size of the string to the user of the StringView (which will prevent naive uses causing OOB access).
2022-07-10 | LibRegex: Treat inverted Compare entries as disjunctions | Ali Mohammad Pur
[^XYZ] means not(X | Y | Z). We used to translate this to not(X) | not(Y) | not(Z); this commit makes LibRegex interpret the pattern as not(X) & not(Y) & not(Z).
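A minimal sketch of the two interpretations, using hypothetical function names and plain characters instead of LibRegex's Compare entries:

```cpp
#include <initializer_list>

// Correct reading of [^XYZ]: not(X) & not(Y) & not(Z).
// The character matches only if it differs from every listed entry.
bool matches_inverted_class(char ch, std::initializer_list<char> excluded)
{
    for (char entry : excluded) {
        if (ch == entry)
            return false;
    }
    return true;
}

// Incorrect reading: not(X) | not(Y) | not(Z).
// With two or more distinct entries this accepts every character,
// because any character differs from at least one of them.
bool matches_inverted_class_wrong(char ch, std::initializer_list<char> excluded)
{
    for (char entry : excluded) {
        if (ch != entry)
            return true;
    }
    return false;
}
```

For the input "X" and the class [^XYZ], the first function rejects the character while the second accepts it, since "X" is not "Y".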
2022-07-10 | LibRegex: Correctly track current inversion state in the optimizer | Ali Mohammad Pur
This is currently not important as we do not nest TemporaryInverse.
2022-07-10 | LibRegex: Flush compare tables before entering a permanent inverse state | Ali Mohammad Pur
2022-07-09 | LibRegex: Fix lookup table-based range checks in Compare | Ali Mohammad Pur
The lowercase version of a range is not required to be a valid range. Instead of casefolding the range and making it invalid, check twice with both cases of the input character (which are the same as the input if the match is not case-insensitive). This time includes an actual test :^)
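To see why casefolding the range endpoints can produce an invalid range, consider the ASCII range "Z-a". The helpers below are hypothetical and only model the ordering of the endpoints:

```cpp
#include <cctype>

// A range is usable only if its endpoints are in ascending order.
bool is_valid_range(unsigned char from, unsigned char to)
{
    return from <= to;
}

// Lowercasing both endpoints may reorder them: 'Z'..'a' becomes 'z'..'a'.
bool casefolded_range_is_valid(unsigned char from, unsigned char to)
{
    auto folded_from = static_cast<unsigned char>(std::tolower(from));
    auto folded_to = static_cast<unsigned char>(std::tolower(to));
    return is_valid_range(folded_from, folded_to);
}
```

Keeping the range as written and testing both cases of the input character avoids this problem.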
2022-07-05 | LibRegex: Use the correct values for comparing LUT entries | Ali Mohammad Pur
Previously we were ignoring the insensitive flag for LUT lookups.
2022-07-05 | LibRegex: Use proper CharRange constructor instead of bit_casting | Ali Mohammad Pur
Otherwise the range order would be inverted.
2022-07-04 | LibRegex: Fully interpret the Compare Op when looking for overlaps | Ali Mohammad Pur
We had a really naive and simplistic implementation, which led to various issues where the optimiser incorrectly rewrote the regex to use atomic groups; this commit fixes that.
2022-04-22 | LibRegex: Check inverse_matched after every op, not just at the end | Ali Mohammad Pur
Fixes #13755.
Co-Authored-By: Damien Firmenich <fir.damien@gmail.com>
2022-04-01 | Everywhere: Run clang-format | Idan Horowitz
2022-02-20 | LibRegex: Make codegen+optimisation for alternatives much faster | Ali Mohammad Pur
Just a little thinking outside the box, and we can now parse and optimise a million copies of "a|" chained together in just a second :^)
2022-02-20 | LibRegex: Make parse_disjunction() consume all disjunctions in one frame | Ali Mohammad Pur
This helps us not blow up when too many disjunctions are chained together in the regex we're parsing. Fixes #12615.
2022-02-20 | LibRegex: Allow quantifiers after quantifiable assertions | Ali Mohammad Pur
While quantifying assertions is very much meaningless, the specification allows them with annex B's extended grammar for browsers, so read and apply the quantifiers. Fixes #12373.
2022-02-14 | LibRegex: Correct the alternative matching order when one is empty | Ali Mohammad Pur
Previously we were compiling `/a|/` into what effectively would be `/|a/`, which is clearly incorrect.
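Why the order matters can be sketched with a toy model of ordered alternation, in which each alternative is a literal and the first one that matches at the start of the input wins. The `match_alternatives` helper is hypothetical and is not part of LibRegex.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical model of ordered alternation over literal alternatives.
// Tries the alternatives left to right and returns the length of the first
// one that matches at the start of the input, or nullopt if none matches.
std::optional<std::size_t> match_alternatives(
    std::string_view input, std::vector<std::string_view> const& alternatives)
{
    for (auto alternative : alternatives) {
        if (input.substr(0, alternative.size()) == alternative)
            return alternative.size();
    }
    return std::nullopt;
}
```

With the alternatives in source order ("a", then empty), the input "a" yields a one-character match; with the order swapped, the empty alternative wins first and the match is empty.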
2022-02-09 | LibRegex: Only skip full instructions when optimizing alternations | Ali Mohammad Pur
It makes no sense to skip half of an instruction, so make sure to skip only full instructions!
2022-02-05 | LibRegex: Support non-ASCII whitespace characters when matching \s or \S | Timothy Flynn
ECMA-262 defines \s as: Return the CharSet containing all characters corresponding to a code point on the right-hand side of the WhiteSpace or LineTerminator productions. The LineTerminator production is simply: U+000A, U+000D, U+2028, or U+2029. Unfortunately there isn't a Unicode property that covers just those code points. The WhiteSpace production is: U+0009, U+000B, U+000C, U+FEFF, or any code point with the Space_Separator general category. If the Unicode generators are disabled, this will fall back to ASCII space code points.
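The code point sets listed above can be written out as a small predicate. This is a hard-coded sketch with hypothetical function names; a real implementation would consult generated Unicode data, and the Space_Separator list below is an assumption based on the code points assigned to that category in recent Unicode versions.

```cpp
// LineTerminator production: U+000A, U+000D, U+2028, U+2029.
constexpr bool is_line_terminator(char32_t cp)
{
    return cp == 0x000A || cp == 0x000D || cp == 0x2028 || cp == 0x2029;
}

// Space_Separator (Zs) general category, hard-coded for illustration.
constexpr bool is_space_separator(char32_t cp)
{
    return cp == 0x0020 || cp == 0x00A0 || cp == 0x1680
        || (cp >= 0x2000 && cp <= 0x200A)
        || cp == 0x202F || cp == 0x205F || cp == 0x3000;
}

// WhiteSpace production: U+0009, U+000B, U+000C, U+FEFF, or any Zs code point.
constexpr bool is_white_space(char32_t cp)
{
    return cp == 0x0009 || cp == 0x000B || cp == 0x000C || cp == 0xFEFF
        || is_space_separator(cp);
}

// \s matches the union of WhiteSpace and LineTerminator; \S is its negation.
constexpr bool matches_backslash_s(char32_t cp)
{
    return is_white_space(cp) || is_line_terminator(cp);
}
```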
2022-02-05 | LibRegex: Do not return an Optional from Regex::Matcher::execute | Timothy Flynn
The code path that could return an optional no longer exists as of commit: a962ee020a6310b2d7c7479aa058c15484127418
2022-02-05 | LibRegex: Do not continue searching input when the sticky bit is set | Timothy Flynn
This partially reverts commit a962ee020a6310b2d7c7479aa058c15484127418. When the sticky bit is set, the global bit should basically be ignored except by external callers who want their own special behavior. For example, RegExp.prototype [ @@match ] will use the global flag to accumulate consecutive matches. But on the first failure, the regex loop should break.
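The behaviour described above can be modelled with a toy search loop. Here the "regex" is just a literal needle and `find_match` is a hypothetical helper, not the LibRegex matcher: without the sticky flag the loop advances through the input, and with it the loop stops after the first failed attempt.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical model: returns the index at which `needle` matches, starting
// the search at `start_index`, or nullopt if no match is found.
std::optional<std::size_t> find_match(
    std::string_view input, std::string_view needle,
    std::size_t start_index, bool sticky)
{
    for (std::size_t i = start_index; i <= input.size(); ++i) {
        if (input.substr(i, needle.size()) == needle)
            return i;
        if (sticky)
            break; // A sticky match may only succeed exactly at start_index.
    }
    return std::nullopt;
}
```

A caller that wants to accumulate consecutive matches would call this again with `start_index` set to the end of the previous match, and stop on the first failure.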
2022-02-05 | LibJS+LibRegex: Don't repeat regex match in regexp_exec() | Ali Mohammad Pur
LibRegex already implements this loop in a more performant way, so all LibJS has to do here is to return things in the right shape, and not loop over the input string. Previously this was a quadratic operation on string length, which led to crazy execution times on failing regexps - now it's nice and fast :^) Note that a Regex test has to be updated to remove the stateful flag as it repeats matching on multiple strings.
2022-02-05 | LibRegex+LibJS: Avoid searching for more than one match in JS RegExps | Ali Mohammad Pur
All of JS's regular expression APIs only want a single match, so avoid trying to produce more (which will be discarded anyway).