Age | Commit message (Collapse) | Author |
|
This directory isn't just about virtual memory, it's about all kinds
of memory management.
|
|
VMObject already has an IntrusiveList of all the Regions that map it.
We were keeping a counter in addition to this, and only using it in
a single place to avoid iterating over the list in case it only had
1 entry.
Simplify VMObject by removing this counter and always iterating the
list even if there's only 1 entry. :^)
|
|
This was previously used for a single debug logging statement during
memory purging. There are no remaining users of this weak pointer,
so let's get rid of it.
|
|
This patch changes the semantics of purgeable memory.
- AnonymousVMObject now has a "purgeable" flag. It can only be set when
constructing the object. (Previously, all anonymous memory was
effectively purgeable.)
- AnonymousVMObject now has a "volatile" flag. It covers the entire
range of physical pages. (Previously, we tracked ranges of volatile
pages, effectively making it a page-level concept.)
- Non-volatile objects maintain a physical page reservation via the
committed pages mechanism, to ensure full coverage for page faults.
- When an object is made volatile, it relinquishes any unused committed
pages immediately. If later made non-volatile again, we then attempt
to make a new committed pages reservation. If this fails, we return
ENOMEM to userspace.
mmap() now creates purgeable objects if passed the MAP_PURGEABLE option
together with MAP_ANONYMOUS. anon_create() memory is always purgeable.
|
|
We're copying the inode contents from a stack buffer into a page that
we just quick-mapped, so there's no reason for this memcpy() to fail.
|
|
This patch greatly simplifies VMObject locking by doing two things:
1. Giving VMObject an IntrusiveList of all its mapping Region objects.
2. Removing VMObject::m_paging_lock in favor of VMObject::m_lock
Before (1), VMObject::for_each_region() was forced to acquire the
global MM lock (since it worked by walking MemoryManager's list of
all regions and checking for regions that pointed to itself.)
With each VMObject having its own list of Regions, VMObject's own
m_lock is all we need.
Before (2), page fault handlers used a separate mutex for preventing
overlapping work. This design required multiple temporary unlocks
and was generally extremely hard to reason about.
Instead, page fault handlers now use VMObject's own m_lock as well.
|
|
|
|
|
|
Since we're taking from the committed set of pages, there should never
be a reason for this call to fail.
Also add a Badge to disallow taking committed pages from anywhere but
the Region class.
|
|
|
|
This enables the Lock class to block a thread even while the thread is
working on a BlockCondition. A thread can still only be either blocked
by a Lock or a BlockCondition.
This also establishes a linked list of threads that are blocked by a
Lock and unblocking directly unlocks threads and wakes them directly.
|
|
And fix an unsafe dereference in SharedInodeVMObject::try_clone()
to make it OOM-safe.
|
|
|
|
Previously we would simply assume that Region allocation always
succeeded. There is still one such assumption when splitting user
regions inside a Space. That will be dealt with in a separate commit.
|
|
This commit converts naked `new`s to `AK::try_make` and `AK::try_create`
wherever possible. If the called constructor is private, this can not be
done, so we instead now use the standard-defined and compiler-agnostic
`new (nothrow)`.
|
|
|
|
- Make Region::create_kernel_only OOM safe.
- Make Region::create_user_accessible mostly OOM safe, there are still
some tendrils to untangle before it and be completely fixed.
|
|
Replace the AK::String used for Region::m_name with a KString.
This seems beneficial across the board, but as a specific data point,
it reduces time spent in sys$set_mmap_name() by ~50% on test-js. :^)
|
|
Because reading from the disk may preempt, we need to release the
paging lock.
|
|
The error handling in all these cases was still using the old style
negative values to indicate errors. We have a nicer solution for this
now with KResultOr<T>. This change switches the interface and then all
implementers to use the new style.
|
|
|
|
|
|
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
|
|
|
|
Increase type-safety moving the MemoryManager APIs which take a
Region::Access to actually use that type instead of a `u8`.
Eventually the actually m_access can be moved there as well, but
I hit some weird bug where it wasn't using the correct operators
in `set_access_bit(..)` even though it's declared (and tested).
Something to fix-up later.
|
|
Since we know for sure that the virtual memory regions in the new
process being created are not being used on any CPU, there's no need
to do TLB flushes for every mapped page.
|
|
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)
Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.
We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
|
|
If we try to align a number above 0xfffff000 to the next multiple of
the page size (4 KiB), it would wrap around to 0. This is most likely
never what we want, so let's assert if that happens.
|
|
If we somehow get tricked into mapping user-controlled mmap memory
at a kernel address, let's just panic the kernel.
|
|
We can use adopt_own(*new T) instead of make<T>().
|
|
Now that we no longer need to support the signal trampolines being
user-accessible inside the kernel memory range, we can get rid of the
"kernel" and "user-accessible" flags on Region and simply use the
address of the region to determine whether it's kernel or user.
This also tightens the page table mapping code, since it can now set
user-accessibility based solely on the virtual address of a page.
|
|
This should help us catch bogus VM ranges ending up in a process's
address space sooner.
|
|
Replacement made by `find Kernel Userland -name '*.h' -o -name '*.cpp' | sed -i -Ee 's/dbgln\b<(\w+)>\(/dbgln_if(\1, /g'`
|
|
This patch adds sys$msyscall() which is loosely based on an OpenBSD
mechanism for preventing syscalls from non-blessed memory regions.
It works similarly to pledge and unveil, you can call it as many
times as you like, and when you're finished, you call it with a null
pointer and it will stop accepting new regions from then on.
If a syscall later happens and doesn't originate from one of the
previously blessed regions, the kernel will simply crash the process.
|
|
This patch adds enforcement of two new rules:
- Memory that was previously writable cannot become executable
- Memory that was previously executable cannot become writable
Unfortunately we have to make an exception for text relocations in the
dynamic loader. Since those necessitate writing into a private copy
of library code, we allow programs to transition from RW to RX under
very specific conditions. See the implementation of sys$mprotect()'s
should_make_executable_exception_for_dynamic_loader() for details.
|
|
We need to make sure other processors can grab the MM lock while we
wait, so release it when we might block. Reading the page from
disk may also block, so release it during that time as well.
|
|
The following script was used to make these changes:
#!/bin/bash
set -e
tmp=$(mktemp -d)
echo "tmp=$tmp"
find Kernel \( -name '*.cpp' -o -name '*.h' \) | sort > $tmp/Kernel.files
find . \( -path ./Toolchain -prune -o -path ./Build -prune -o -path ./Kernel -prune \) -o \( -name '*.cpp' -o -name '*.h' \) -print | sort > $tmp/EverythingExceptKernel.files
cat $tmp/Kernel.files | xargs grep -Eho '[A-Z0-9_]+_DEBUG' | sort | uniq > $tmp/Kernel.macros
cat $tmp/EverythingExceptKernel.files | xargs grep -Eho '[A-Z0-9_]+_DEBUG' | sort | uniq > $tmp/EverythingExceptKernel.macros
comm -23 $tmp/Kernel.macros $tmp/EverythingExceptKernel.macros > $tmp/Kernel.unique
comm -1 $tmp/Kernel.macros $tmp/EverythingExceptKernel.macros > $tmp/EverythingExceptKernel.unique
cat $tmp/Kernel.unique | awk '{ print "#cmakedefine01 "$1 }' > $tmp/Kernel.header
cat $tmp/EverythingExceptKernel.unique | awk '{ print "#cmakedefine01 "$1 }' > $tmp/EverythingExceptKernel.header
for macro in $(cat $tmp/Kernel.unique)
do
cat $tmp/Kernel.files | xargs grep -l $macro >> $tmp/Kernel.new-includes ||:
done
cat $tmp/Kernel.new-includes | sort > $tmp/Kernel.new-includes.sorted
for macro in $(cat $tmp/EverythingExceptKernel.unique)
do
cat $tmp/Kernel.files | xargs grep -l $macro >> $tmp/Kernel.old-includes ||:
done
cat $tmp/Kernel.old-includes | sort > $tmp/Kernel.old-includes.sorted
comm -23 $tmp/Kernel.new-includes.sorted $tmp/Kernel.old-includes.sorted > $tmp/Kernel.includes.new
comm -13 $tmp/Kernel.new-includes.sorted $tmp/Kernel.old-includes.sorted > $tmp/Kernel.includes.old
comm -12 $tmp/Kernel.new-includes.sorted $tmp/Kernel.old-includes.sorted > $tmp/Kernel.includes.mixed
for file in $(cat $tmp/Kernel.includes.new)
do
sed -i -E 's/#include <AK\/Debug\.h>/#include <Kernel\/Debug\.h>/' $file
done
for file in $(cat $tmp/Kernel.includes.mixed)
do
echo "mixed include in $file, requires manual editing."
done
|
|
If we find ourselves with a user-accessible, non-shared Region backed by
a SharedInodeVMObject, that's pretty bad news, so let's just panic the
kernel instead of getting abused.
There might be a better place for this kind of check, so I've added a
FIXME about putting more thought into that.
|
|
This was exploitable since the shared flag determines whether inode
permission checks are applied in sys$mprotect().
The bug was pretty hard to spot due to default arguments being used
instead. This patch removes the default arguments to make explicit
at each call site what's being done.
|
|
This was done with the following script:
find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec sed -i -E 's/dbgln<debug_([a-z_]+)>/dbgln<\U\1_DEBUG>/' {} \;
find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec sed -i -E 's/if constexpr \(debug_([a-z0-9_]+)/if constexpr \(\U\1_DEBUG/' {} \;
|
|
This was done with the help of several scripts, I dump them here to
easily find them later:
awk '/#ifdef/ { print "#cmakedefine01 "$2 }' AK/Debug.h.in
for debug_macro in $(awk '/#ifdef/ { print $2 }' AK/Debug.h.in)
do
find . \( -name '*.cpp' -o -name '*.h' -o -name '*.in' \) -not -path './Toolchain/*' -not -path './Build/*' -exec sed -i -E 's/#ifdef '$debug_macro'/#if '$debug_macro'/' {} \;
done
# Remember to remove WRAPPER_GERNERATOR_DEBUG from the list.
awk '/#cmake/ { print "set("$2" ON)" }' AK/Debug.h.in
|
|
These changes are arbitrarily divided into multiple commits to make it
easier to find potentially introduced bugs with git bisect.
|
|
This is no longer used. We can bring it back the day we need it.
|
|
This was too spammy to ever actually be used anyway.
|
|
|
|
These changes are arbitrarily divided into multiple commits to make it
easier to find potentially introduced bugs with git bisect.
|
|
These changes are arbitrarily divided into multiple commits to make it
easier to find potentially introduced bugs with git bisect.Everything:
|
|
These changes are arbitrarily divided into multiple commits to make it
easier to find potentially introduced bugs with git bisect.Everything:
The modifications in this commit were automatically made using the
following command:
find . -name '*.cpp' -exec sed -i -E 's/dbg\(\) << ("[^"{]*");/dbgln\(\1\);/' {} \;
|
|
If a TLB flush request is broadcast to other processors and the addresses
to flush are user mode addresses, we can ignore such a request on the
target processor if the page directory currently in use doesn't match
the addresses to be flushed. We still need to broadcast to all processors
in that case because the other processors may switch to that same page
directory at any time.
|
|
If we remap pages (e.g. lazy allocation) inside a VMObject that is
shared among more than one region, broadcast it to any other region
that may be mapping the same page.
|