Age | Commit message (Collapse) | Author |
|
This makes RegExpObject compile and store a Regex<ECMA262>, adds
all flag-related properties, and implements `RegExpPrototype.test()`
(complete with 'lastIndex' support) :^)
It should be noted that this only implements `test()' using the builtin
`exec()'.
|
|
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
|
|
The implementation uses atomics and futexes (yay!) and is heavily based on the
implementation I did for my learning project named "Let's write synchronization
primitives" [0].
That project, in fact, started when I tried to implement pthread_once() for
Serenity (because it was needed for another project of mine, stay tuned ;) ) and
was not very sure I got every case right. So now, after learning some more about
code patterns around atomics and futexes, I am reasonably sure, and it's time to
contribute the implementation of pthread_once() to Serenity :^)
[0] To be published at https://github.com/bugaevc/lets-write-sync-primitives
|
|
This way, if we end up deallocating an entire ChunkedBlock, UE doesn't
get confused thinking the freed pointer has never been allocated.
|
|
|
|
This allows UE to see what the heap metadata looks like.
|
|
When we hit the last token, make the saved pointer point to the null
terminator instead of to the next token. This ensures that the next
call to strtok_r() returns null as expected.
Found by running GCC in UE. :^)
|
|
Calling strerror() with a negative number should not access below the
error string array.
Found by running GCC in UE. :^)
|
|
The pointers returned by malloc should always be 8-byte aligned on x86.
We were not consistent about this, as some ChunkedBlock size classes
were not divisible by 8.
This fixes some OOB reads found by running GCC in UE.
|
|
Most systems (Linux, OpenBSD) adjust 0.5 ms per second, or 0.5 us per
1 ms tick. That is, the clock is sped up or slowed down by at most
0.05%. This means adjusting the clock by 1 s takes 2000 s, and the
clock an be adjusted by at most 1.8 s per hour.
FreeBSD adjusts 5 ms per second if the remaining time adjustment is
>= 1 s (0.5%) , else it adjusts by 0.5 ms as well. This allows adjusting
by (almost) 18 s per hour.
Since Serenity OS can lose more than 22 s per hour (#3429), this
picks an adjustment rate up to 1% for now. This allows us to
adjust up to 36s per hour, which should be sufficient to adjust
the clock fast enough to keep up with how much time the clock
currently loses. Once we have a fancier NTP implementation that can
adjust tick rate in addition to offset, we can think about reducing
this.
adjtime is a bit old-school and most current POSIX-y OSs instead
implement adjtimex/ntp_adjtime, but a) we have to start somewhere
b) ntp_adjtime() is a fairly gnarly API. OpenBSD's adjfreq looks
like it might provide similar functionality with a nicer API. But
before worrying about all this, it's probably a good idea to get
to a place where the kernel APIs are (barely) good enough so that
we can write an ntp service, and once we have that we should write
a way to automatically evaluate how well it keeps the time adjusted,
and only then should we add improvements ot the adjustment mechanism.
|
|
|
|
When a mallocation is shrunk/grown without moving, UE needs to update
its precise metadata about the mallocation, since it tracks *exactly*
how many bytes were allocated, not just the malloc chunk size.
|
|
|
|
If you try to do this (e.g "mv directory directory"), sys$rename() will
now fail with EDIRINTOSELF.
Dr. POSIX says we should return EINVAL for this, but a custom error
code allows us to print a much more helpful error message when this
problem occurs. :^)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The previous define led to issues when compiling some ports, namely zsh
5.8.
|
|
Thanks to @alimpfard for noticing this.
|
|
|
|
|
|
|
|
A "fallthrough comment" or `[[fallthrough]]` is only required after
non-empty `case`s.
|
|
Mostly in comments, but sprintf() now prints "August" instead of
"Auguest" so that's something.
|
|
And include string.h in the files that actually needed it
|
|
|
|
These are supposed to be both functions and macros. I'm not sure
how to provide the functions at the same time as the macros,
as they collide with each other and cause compile errors.
However, some ports fully expect them to be functions.
For example, OpenSSH stores them into structures as function
pointers.
|
|
I'm not exactly sure if this is in the right spot in the structure,
especially since I read that this is supposed to be cast into other
structures.
|
|
macros
The user/dead process macros are not used anywhere in Serenity right now,
but are required for OpenSSH.
|
|
|
|
|
|
|
|
Not used anywhere in Serenity right now, but required for OpenSSH.
|
|
|
|
Required for __BEGIN_DECLS
|
|
Not used anywhere in Serenity currently, but required for OpenSSH.
|
|
|
|
Why break at LibHTTP? Because "Meta+Libraries" would be insanely large,
and breaking between LibHTTP and LibJS makes the commits roughly evenly large.
|
|
When SO_TIMESTAMP is set as an option on a SOCK_DGRAM socket, then
recvmsg() will return a SCM_TIMESTAMP control message that
contains a struct timeval with the system time that was current
when the socket was received.
|
|
I want to add another entry to this list and don't want to
have to think of a number for it.
|
|
The implementation only supports a single iovec for now.
Some might say having more than one iovec is the main point of
recvmsg() and sendmsg(), but I'm interested in the control message
bits.
|
|
Since the CPU already does almost all necessary validation steps
for us, we don't really need to attempt to do this. Doing it
ourselves doesn't really work very reliably, because we'd have to
account for other processors modifying virtual memory, and we'd
have to account for e.g. pages not being able to be allocated
due to insufficient resources.
So change the copy_to/from_user (and associated helper functions)
to use the new safe_memcpy, which will return whether it succeeded
or not. The only manual validation step needed (which the CPU
can't perform for us) is making sure the pointers provided by user
mode aren't pointing to kernel mappings.
To make it easier to read/write from/to either kernel or user mode
data add the UserOrKernelBuffer helper class, which will internally
either use copy_from/to_user or directly memcpy, or pass the data
through directly using a temporary buffer on the stack.
Last but not least we need to keep syscall params trivial as we
need to copy them from/to user mode using copy_from/to_user.
|
|
This might make sleep() faster by up to pi nanoseconds, or less.
It's more about avoiding a senseless write than about optimization.
|
|
str{,n}casecmp is supposed to be *only* declared by strings.h, note the
trailing 's' in the filename.
We don't have any implementation for strlcat; using any strcat variants is
a bad idea anyway, given our implementation of AK::String.
TODO: Find a way to lint for declared-but-nowhere-defined functions.
|
|
This signal is ignored by default, but can be caught to implement state
reporting a la BSD. :^)
|
|
|