path: root/Userland/Libraries/LibRegex
Age | Commit message | Author
2023-03-25 | LibRegex: Make ^ and $ accept all `LineTerminator`s instead of just '\n' | Ali Mohammad Pur
Also adds a couple tests.
2023-03-10 | Everywhere: Rename equals_ignoring_case => equals_ignoring_ascii_case | Andreas Kling
Let's make it clear that these functions deal with ASCII case only.
2023-03-06 | Everywhere: Remove NonnullOwnPtr.h includes | Andreas Kling
2023-02-17 | LibRegex: Add to_string method for RegexStringView | Fausto Tommasi
2023-02-15 | LibRegex: Bail out of atomic rewrite if a block doesn't contain compares | Ali Mohammad Pur
If a block jumps before performing a compare, we'd need to recursively find the first of the jumped-to block. While this is doable, it's not really worth spending the time as most such cases won't actually qualify for atomic loop rewrite anyway. Fixes an invalid rewrite when `.+` is followed by an alternation, e.g. /.+(a|b|c)/.
2023-02-15 | LibRegex: Consider the inverse=true case when finding pattern overlap | Ali Mohammad Pur
Previously we were only checking for overlap when the range wasn't in inverse mode, which made us miss things like /[^x]x/; this patch makes it so we don't miss that.
2023-02-15 | LibRegex: Make '.' reject matching LF / LS / PS as per the ECMA262 spec | Ali Mohammad Pur
Previously we allowed it to match those, but the ECMA262 spec disallows these (except in DotAll).
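As a rough illustration of the rule this commit describes, the sketch below uses hypothetical helper names (`is_line_terminator`, `dot_matches`); it is not LibRegex's actual code. It assumes the ECMA-262 `LineTerminator` set of LF, CR, LS and PS, whereas the commit title itself lists only LF, LS and PS.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only; not the LibRegex implementation.
// Assumption: ECMA-262 LineTerminator is LF, CR, LS (U+2028) and PS (U+2029).
constexpr bool is_line_terminator(char32_t code_point)
{
    return code_point == U'\n' || code_point == U'\r'
        || code_point == 0x2028 || code_point == 0x2029;
}

// '.' matches any code point in DotAll mode, and any code point that is
// not a line terminator otherwise.
constexpr bool dot_matches(char32_t code_point, bool dot_all)
{
    return dot_all || !is_line_terminator(code_point);
}

// Small usage example.
inline void dot_examples()
{
    assert(dot_matches(U'a', false));   // ordinary character
    assert(!dot_matches(U'\n', false)); // LF is rejected without DotAll
    assert(dot_matches(U'\n', true));   // DotAll accepts it
}
```

Keeping the line-terminator test in one predicate means the same set could be shared by `.` and by the `^` / `$` anchors, though whether LibRegex is structured that way is not stated here.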
2023-01-27 | AK: Remove StringBuilder::build() in favor of to_deprecated_string() | Linus Groh
Having an alias function that only wraps another one is silly, and keeping the more obvious name should flush out more uses of deprecated strings. No behavior change.
2023-01-27 | LibRegex: Remove declarations for non-existent methods | Sam Atkins
2023-01-09 | AK+Everywhere: Rename FlyString to DeprecatedFlyString | Timothy Flynn
DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so let's rename it to A) match the name of DeprecatedString, and B) free up the name so we can write a new FlyString class that is tied to String.
2023-01-09 | AK+Everywhere: Rename Utf16View::to_utf8 to to_deprecated_string | Timothy Flynn
A subsequent commit will add to_utf8 back to create an AK::String.
2023-01-08 | AK+Everywhere: Make UTF-16 to UTF-8 converter fallible | Timothy Flynn
This could fail to allocate the underlying storage needed to store the UTF-8 data. Propagate this error.
2023-01-08 | AK+Everywhere: Make UTF-8 and UTF-32 to UTF-16 converters fallible | Timothy Flynn
These could fail to allocate the underlying storage needed to store the UTF-16 data. Propagate these errors.
2023-01-08 | AK+LibJS+LibRegex: Define an alias for UTF-16 string data storage | Timothy Flynn
Instead of writing out "Vector<u16, 1>" everywhere, let's have a name for it.
2023-01-06 | LibRegex: Prevent patterns from matching the empty string twice | Eli Youngs
Previously, if a pattern matched the empty string (e.g. ".*"), it would match the string twice instead of once. Among other issues, this caused a Regex replacement to duplicate its expected output, since it would replace "both" empty matches.
2023-01-06 | LibRegex: Allow the SingleMatch flag to be used as a PosixFlag | Eli Youngs
2023-01-04 | LibRegex: Return StringView from get_error_string() | Nico Weber
It just returns literals after all. Removes one use of DeprecatedString.
2023-01-04 | LibRegex: Tweak get_error() function | Nico Weber
- Return StringView instead of DeprecatedString from function returning only literals
- Remove redundant cast
- Remove "inline" -- the function is defined in a cpp file, so there's no need for the linkage implications of `inline`. And compilers know to inline static functions with a single use without it. (Normally I'd remove the `static` instead, but this is in an `extern "C"` block, and it doesn't matter enough to end that block before the helper function and reopen it after.)
2023-01-02 | Everywhere: Remove unused includes of AK/Format.h | Ben Wiederhake
These instances were detected by searching for files that include AK/Format.h, but don't match the regex:
\\b(CheckedFormatString|critical_dmesgln|dbgln|dbgln_if|dmesgln|FormatBuilder|__FormatIfSupported|FormatIfSupported|FormatParser|FormatString|Formattable|Formatter|__format_value|HasFormatter|max_format_arguments|out|outln|set_debug_enabled|StandardFormatter|TypeErasedFormatParams|TypeErasedParameter|VariadicFormatParams|v_critical_dmesgln|vdbgln|vdmesgln|vformat|vout|warn|warnln|warnln_if)\\b
This regex is pessimistic, so there might be more files that don't actually use any formatting functions. Observe that this revealed that Userland/Libraries/LibC/signal.cpp is missing an include. In theory, one might use LibCPP to detect things like this automatically, but let's do this one step after another.
2023-01-02 | Everywhere: Move AK/Debug.h include to using files or remove | Ben Wiederhake
2023-01-02 | Everywhere: Fix badly-formatted includes | Ben Wiederhake
In 7c5e30daaa615ad3a2ef55222423a747ac0a1227, the focus was "only" on Userland/Libraries/, whereas this commit cleans up the remaining headers in the repo, and any new badly-formatted include.
2022-12-09 | Everywhere: Use C++ concepts instead of requires clauses | Moustafa Raafat
2022-12-06 | Everywhere: Rename to_{string => deprecated_string}() where applicable | Linus Groh
This will make it easier to support both string types at the same time while we convert code, and tracking down remaining uses. One big exception is Value::to_string() in LibJS, where the name is dictated by the ToString AO.
2022-12-06 | AK+Everywhere: Rename String to DeprecatedString | Linus Groh
We have a new, improved string type coming up in AK (OOM aware, no null state), and while it's going to use UTF-8, the name UTF8String is a mouthful - so let's free up the String name by renaming the existing class. Making the old one have an annoying name will hopefully also help with quick adoption :^)
2022-12-03 | Everywhere: Remove 'clang-format off' comments that are no longer needed | Linus Groh
https://github.com/SerenityOS/serenity/pull/15654#issuecomment-1322554496
2022-12-03 | Everywhere: Run clang-format | Linus Groh
2022-11-17 | LibRegex: Use spans<4> to avoid allocating small vectors | Ali Mohammad Pur
This path is hit a lot, and alloc/free of this vector was showing up on profiles, so get rid of it.
2022-11-17 | LibRegex: Use a copy-on-write vector for fork state | Ali Mohammad Pur
2022-11-17 | LibRegex: Don't copy forked results twice | Ali Mohammad Pur
2022-11-17 | LibRegex: Avoid copying MatchInput when getting argument descriptions | Ali Mohammad Pur
2022-11-09 | LibRegex: Don't treat ForkReplace* as new forks | Ali Mohammad Pur
2022-11-06 | Everywhere: Remove redundant inequality comparison operators | Daniel Bertalan
C++20 can automatically synthesize `operator!=` from `operator==`, so there is no point in writing such functions by hand if all they do is call through to `operator==`. This fixes a compile error with compilers that implement P2468 (Clang 16 currently). This paper restores the C++17 behavior that if both `T::operator==(U)` and `T::operator!=(U)` exist, `U == T` won't be rewritten in reverse to call `T::operator==(U)`. Removing `!=` operators makes the rewriting possible again. See https://reviews.llvm.org/D134529#3853062
2022-11-01 | Everywhere: Mark dependencies of most targets as PRIVATE | Tim Schumacher
Otherwise, we end up propagating those dependencies into targets that link against that library, which creates unnecessary link-time dependencies. Also included are changes to re-add now-missing dependencies to tools that actually need them.
2022-11-01 | Everywhere: Explicitly link all binaries against the LibC target | Tim Schumacher
Even though the toolchain implicitly links against -lc, it does not know where it should get LibC from except for the sysroot. In the case of Clang this causes it to pick up the LibC stub instead, which might be slightly outdated and feature missing symbols. This is currently not an issue that manifests because we pass through the dependency on LibC and other libraries by accident, which causes CMake to link against the LibC target (instead of just the library), and thus points the linker at the build output directory. Since we are looking to fix that in the upcoming commits, let's make sure that everything will still be able to find the proper LibC first.
2022-10-16 | CMake+Userland: Use CMakeLists from Userland to build Lagom Libraries | Andrew Kaster
Also do this for Shell. This greatly simplifies the CMakeLists in Lagom, replacing many glob patterns with a big list of libraries. There are still a few special libraries that need some help to conform to the pattern, like LibELF and LibWebView. It also lets us remove essentially all of the Serenity or Lagom binary directory detection logic from code generators, as now both projects' directories enter the generator logic from the same place.
2022-10-14 | AK+Toolchain: Make char and wchar_t behave on AARCH64 | Gunnar Beutner
By default char and wchar_t are unsigned on AARCH64. This fixes a bunch of related compiler errors.
2022-10-10 | Everywhere: Replace uses of __serenity__ with AK_OS_SERENITY | Andrew Kaster
Now that we have OS macros for essentially every supported OS, let's try to use them everywhere.
2022-10-10 | LibRegex: Don't build LibRegex/C/Regex.cpp on Lagom | Andrew Kaster
This file implements the POSIX APIs from <regex.h>, and is not suitable for inclusion in a Lagom build. If we do include it, it will override the host's regex functions and wreak havoc if it's resolved before the host's implementation.
2022-09-20 | LibC+LibRegex: Move central regex definitions into LibC/bits | Ali Mohammad Pur
This decouples LibRegex from the serenity LibC. Fixes #15251.
2022-09-17 | Everywhere: Fix badly-formatted includes | Ben Wiederhake
2022-09-16 | Everywhere: Remove a bunch of dead write-only variables | Tim Schumacher
LLVM 15 now warns (and thus errors) about this, and there is really no point in keeping them.
2022-09-12 | LibRegex: Account for eof after \<x> when 'x' leads to legacy behaviour | Ali Mohammad Pur
2022-09-12 | LibRegex: Consume exactly two chars for escaped characters | Ali Mohammad Pur
We were previously consuming an extra char afterwards, which could be the charclass terminator, leading to possible OOB accesses.
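To make the failure mode concrete, here is a minimal sketch of a character-class scanner that treats a backslash plus the following character as exactly two characters. The function name `find_class_end` and its interface are hypothetical and chosen only for illustration; LibRegex's parser is structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper for illustration only; not the LibRegex parser.
// Returns the index of the ']' that terminates a character class whose body
// starts at `start`, or npos if the class is unterminated.
inline std::size_t find_class_end(std::string_view pattern, std::size_t start)
{
    std::size_t i = start;
    while (i < pattern.size()) {
        if (pattern[i] == '\\') {
            // An escape needs one following character; bail out at end of input.
            if (i + 1 >= pattern.size())
                return std::string_view::npos;
            // Consume exactly two characters: the backslash and the escaped one.
            // Consuming a third here could swallow the terminating ']'.
            i += 2;
            continue;
        }
        if (pattern[i] == ']')
            return i;
        ++i;
    }
    return std::string_view::npos;
}

// Small usage example: in "[\]]" the first ']' is escaped, the second closes.
inline void class_end_example()
{
    assert(find_class_end(R"([\]])", 1) == 3);
}
```

In this sketch, advancing by three after the backslash would step over the closing `]` at index 3 and run past the end of the pattern, which is the kind of out-of-bounds access the commit message warns about.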
2022-08-29 | LibRegex: Explicitly check if a character falls into a table-based range | Timothy Flynn
Previously, for a regex such as /[a-sy-z]/i, we would incorrectly think the character "u" fell into the range "a-s" because neither of the conditions "u > s && U > s" or "u < a && U < a" would be true, resulting in the lookup falling back to assuming the character is in the range. Instead, first explicitly check if the character falls into the range, rather than checking if it falls outside the range. If the explicit checks fail, then we know the character is outside the range.
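The two strategies described above can be contrasted in a small sketch. The function names are hypothetical and the code only models the logic in the commit message (a character tested in both lower and upper case against one range); it is not the LibRegex lookup itself.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only; not the LibRegex implementation.

// Old approach: assume "in range" unless both case variants lie above the
// range or both lie below it. For 'u' / 'U' against a-s neither test fires,
// so the character is wrongly reported as inside the range.
inline bool in_range_by_exclusion(char lower, char upper, char from, char to)
{
    if (lower > to && upper > to)
        return false;
    if (lower < from && upper < from)
        return false;
    return true;
}

// New approach: explicitly check whether either case variant falls inside
// the range; if neither does, the character is outside it.
inline bool in_range_explicit(char lower, char upper, char from, char to)
{
    auto inside = [&](char c) { return c >= from && c <= to; };
    return inside(lower) || inside(upper);
}

// Small usage example for /[a-sy-z]/i and the input "u".
inline void range_example()
{
    assert(in_range_by_exclusion('u', 'U', 'a', 's'));  // the bug
    assert(!in_range_explicit('u', 'U', 'a', 's'));     // the fix
}
```

The exclusion test fails because `'u'` lies above the range while `'U'` lies below it, so neither "both above" nor "both below" holds.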
2022-07-20 | LibRegex: Check code unit count range when accessing by code unit count | Ali Mohammad Pur
2022-07-20 | LibRegex: Partially implement the ECMAScript unicodeSets proposal | Ali Mohammad Pur
This skips the new string unicode properties additions, along with \q{}.
2022-07-20 | LibRegex: Refactor parsing 'CharacterEscape' out of 'AtomEscape' | Ali Mohammad Pur
The ECMA262 spec has this as a separate production, and we need it to be split up for a future commit.
2022-07-20 | LibRegex: Pass parse flags as a struct instead of multiple arguments | Ali Mohammad Pur
2022-07-12 | LibRegex: Remove RegexStringView(char const*) constructor | sin-ack
This allowed passing in a nullptr for the StringView which will not be possible once StringView(char const*) is removed.
2022-07-12 | Everywhere: Use default StringView constructor over nullptr | sin-ack
While null StringViews are just as bad, these prevent the removal of StringView(char const*) as that constructor accepts a nullptr. No functional changes.