Age | Commit message (Collapse) | Author |
|
Previously, the heap expansion logic could end up calling kmalloc
recursively, which was quite messy and hard to reason about.
This patch redesigns heap expansion so that it's kmalloc-free:
- We make a single large virtual range allocation at startup
- When expanding, we bump allocate VM from that region
- When expanding, we populate page tables directly ourselves,
instead of going via MemoryManager.
This makes heap expansion a great deal simpler. However, do note that it
introduces two new flaws that we'll need to deal with eventually:
- The single virtual range allocation is limited to 64 MiB and once
exhausted, kmalloc() will fail. (Actually, it will PANIC for now..)
- The kmalloc heap can no longer shrink once expanded. Subheaps stay
in place once constructed.
|
|
The function to protect ksyms after initialization, is only used during
boot of the system, so it can be UNMAP_AFTER_INIT as well.
This requires we switch the order of the init sequence, so we now call
`MM.protect_ksyms_after_init()` before `MM.unmap_text_after_init()`.
|
|
As a small cleanup, this also makes `page_round_up` verify its
precondition with `page_round_up_would_wrap` (which callers are expected
to call), rather than having its own logic.
Fixes #11297.
|
|
This error only ever gets propagated to the userspace if
MAP_FIXED_NOREPLACE is requested, as MAP_FIXED unmaps intersecting
ranges beforehand, and non-fixed mmap() calls will just fall back to
allocating anywhere.
Linux specifies MAP_FIXED_NOREPLACE to return EEXIST when it can't
allocate, we now match that behavior.
|
|
Found by PVS Studio Static Analysis.
|
|
Now that the shared bottom 2 MiB virtual address mappings are gone
userspace can use lower virtual addresses.
|
|
Now that the last 2 users of these mappings (the Prekernel and the APIC
ap boot environment) were removed, these are no longer used.
|
|
The Prekernel's memory is only accessed until MemoryManager has been
initialized. Keeping them around afterwards is both unnecessary and bad,
as it prevents the userland from using the 0x100000-0x155000 virtual
address range.
Co-authored-by: Idan Horowitz <idan.horowitz@gmail.com>
|
|
In order to reduce our reliance on __builtin_{ffs, clz, ctz, popcount},
this commit removes all calls to these functions and replaces them with
the equivalent functions in AK/BuiltinWrappers.h.
|
|
We can leave the .ksyms section mapped-but-read-only and then have the
symbols index simply point into it.
Note that we manually insert null-terminators into the symbols section
while parsing it.
This gets rid of ~950 KiB of kmalloc_eternal() at startup. :^)
|
|
Since it's possible to determine where the small zones will start to
occur for each PhysicalRegion, we can use arithmetic so that the call
time for both large and small zones is identical.
|
|
This makes searching for not yet OOM safe interfaces a bit easier.
|
|
It's not enough to just find the largest-address-not-above the argument,
we must also check that the found region actually contains the argument.
Regressed in a23edd42b869a16e11f4d6ca9071d6b570dc219c, thanks to Idan
for pointing this out.
|
|
Most of the time, we will be freeing physical pages within the
full-sized zones. We can do some simple math to find the right zone
immediately instead of looping through the zones, checking each one.
We still do loop through the slack/remainder zones at the end.
There's probably an even nicer way to solve this, but this is already a
nice improvement. :^)
|
|
We were already doing this for userspace memory regions (in the
Memory::AddressSpace class), so let's do it for kernel regions as well.
This gives a nice speed-up on test-js and probably basically everything
else as well. :^)
|
|
|
|
This matches the behaviour of the other *nixs and allows processes to
try and recover from such signals in userland.
|
|
This is required for compiling wine for serenity
|
|
SIGSTKFLT is a signal that signifies a stack fault in a x87 coprocessor,
this signal is not POSIX and also unused by Linux and the BSDs, so let's
use SIGSEGV so programs that setup signal handlers for the common
signals could still handle them in serenity.
|
|
This helper can (and will) be used in more parts of the kernel besides
the mmap-family of syscalls.
|
|
This helper method can be used to quickly and efficiently zero out a
region.
|
|
|
|
|
|
|
|
|
|
If an internal allocation failure occurs while setting up a new VRA,
we'll now propagate the error to our caller instead of panicking.
|
|
|
|
To make sure we don't lose changes, shared file mappings will now be
fully synced when they are unmapped, whether explicitly or implicitly
(by the program exiting/crashing/etc.)
This can incur a lot of work, since we don't keep track of dirty pages,
but that's something we can optimize down the road. :^)
|
|
This allows userspace to trigger a full (FIXME) flush of a shared file
mapping to disk. We iterate over all the mapped pages in the VMObject
and write them out to the underlying inode, one by one. This is rather
naive, and there's lots of room for improvement.
Note that shared file mappings are currently not possible since mmap()
returns ENOTSUP for PROT_WRITE+MAP_SHARED. That restriction will be
removed in a subsequent commit. :^)
|
|
This is a handy helper that copies out the full contents of a physical
page into a caller-provided buffer. It uses quickmapping internally
(and takes the MM lock for the duration.)
|
|
This isn't a complete conversion to ErrorOr<void>, but a good chunk.
The end goal here is to propagate buffer allocation failures to the
caller, and allow the use of TRY() with formatting functions.
|
|
Seems we are declaring this guy as extern RecursiveSpinLock s_mm_lock;
in both Thread.h and MemoryManager.h. Smells funny for sure.
|
|
They were marked public, which seems like an obvious typo.
|
|
... In files included from Kernel/Process.cpp and Kernel/Thread.cpp
|
|
Instead of signalling allocation failure with a bool return value
(false), we now use ErrorOr<void> and return ENOMEM as appropriate.
This allows us to use TRY() and MUST() with Vector. :^)
|
|
|
|
We now use AK::Error and AK::ErrorOr<T> in both kernel and userspace!
This was a slightly tedious refactoring that took a long time, so it's
not unlikely that some bugs crept in.
Nevertheless, it does pass basic functionality testing, and it's just
real nice to finally see the same pattern in all contexts. :^)
|
|
We were taking and releasing the lock repeatedly instead of holding it
across the entire remap operation.
|
|
This small change simplifies the function a bit but also fixes a problem
with it.
Let's take an example to see this:
Let's say we have a reserved range between 0xe0000 to 0xfffff (EBDA),
then we want to map from the memory device (/dev/mem) the entire
EBDA to a program. If a program tries to map more than 131072 bytes,
the current logic will work - the start address is 0xe0000, and ofcourse
it's below the limit, hence it passes the first two restrictions.
Then, the third if statement will fail if we try to mmap more than
the said allowed bytes.
However, let's take another scenario, where we try to mmap from
0xf0000 - but we try to mmap less than 131072 - but more than 65536.
In such case, we again pass the first two if statements, but the third
one is passed two, because it doesn't take into account the offseted
address from the start of the reserved range (0xe0000). In such case,
a user can easily mmap 65535 bytes above 0x100000. This might
seem negligible. However, it's still a severe bug that can theoretically
be exploited into a info leak or tampering with important kernel
structures.
|
|
A new header file has been created in the Arch/ folder while the
implementation has been moved into a CPP living in the X86 folder.
|
|
Instead of iterating over the regions in the tree which is O(n), we can
just use RedBlackTree's find_largest_not_above method, which is O(logn)
|
|
SonarCloud flagged this "Code Smell", where we are accessing these
static methods as if they are instance methods. While it is technically
possible, it is very confusing to read when you realize they are static
functions.
|
|
|
|
When testing the RTL8168 driver, it seems we can't allocate super pages
anymore. Either we expand the super pages range, or find a solution to
dynamically expand the range (or let drivers utilize other ranges).
|
|
pvs-studio flagged this as a potential optimization.
|
|
This function was checking 1 byte after the provided range, which caused
it to reject valid userspace ranges that happened to end exactly at the
top of the user address space.
This fixes a long-standing issue with mysterious Optional errors in
Coredump::write_regions(). (It happened when trying to add a memory
region at the very top of the address space to a coredump.)
|
|
This makes the user-facing type only take the node member pointer, and
lets the compiler figure out the other needed types from that.
|
|
This makes the user-facing type only take the node member pointer, and
lets the compiler figure out the other needed types from that.
|
|
This ensures we don't allocate when intializing the PageDirectory.
|
|
This allows us to simplify a whole bunch of call sites with TRY(). :^)
|