Age | Commit message (Collapse) | Author |
|
|
|
This is step 1 to removing `must_create()`.
|
|
We add this basic functionality to the Kernel so Userspace can request a
particular virtual memory mapping to be immutable. This will be useful
later on in the DynamicLoader code.
The annotation of a particular Kernel Region as immutable implies that
the following restrictions apply, so these features are prohibited:
- Changing the region's protection bits
- Unmapping the region
- Annotating the region with other virtual memory flags
- Applying further memory advises on the region
- Changing the region name
- Re-mapping the region
|
|
This was introduced in the QEMU commit 8504f12 and was causing the
kernel to fail to boot on the q35 machine.
Fixes #14952.
|
|
We were previously using a 32-bit unsigned integer for this, which
caused us to start truncating region sizes when multiplied with
`PAGE_SIZE` on hardware with a lot of memory.
|
|
|
|
|
|
|
|
Our implementation for Jails resembles much of how FreeBSD jails are
working - it's essentially only a matter of using a RefPtr in the
Process class to a Jail object. Then, when we iterate over all processes
in various cases, we could ensure if either the current process is in
jail and therefore should be restricted what is visible in terms of
PID isolation, and also to be able to expose metadata about Jails in
/sys/kernel/jails node (which does not reveal anything to a process
which is in jail).
A lifetime model for the Jail object is currently plain simple - there's
simpy no way to manually delete a Jail object once it was created. Such
feature should be carefully designed to allow safe destruction of a Jail
without the possibility of releasing a process which is in Jail from the
actual jail. Each process which is attached into a Jail cannot leave it
until the end of a Process (i.e. when finalizing a Process). All jails
are kept being referenced in the JailManagement. When a last attached
process is finalized, the Jail is automatically destroyed.
|
|
|
|
The code in this file is not architecture specific, so it can be moved
to the base Kernel directory.
|
|
The MemoryManager uses this pointer to adds its newly created page
tables to the kernel page directory.
|
|
For the initial page tables we only need to identity map the kernel
image, the rest of the memory will be managed by the MemoryManager. The
linker script is updated to get the kernel image start and end
addresses.
|
|
This memory is only reserved on x86(-64) and is usable on other
architectures.
|
|
We no longer require to lock the m_inode_lock in the SharedInodeVMObject
code as the methods write_bytes and read_bytes of the Inode class do
this for us now.
|
|
According to Dr. POSIX, we should allow to call mmap on inodes even on
ranges that currently don't map to any actual data. Trying to read or
write to those ranges should result in SIGBUS being sent to the thread
that did violating memory access.
To implement this restriction, we simply check if the result of
read_bytes on an Inode returns 0, which means we have nothing valid to
map to the program, hence it should receive a SIGBUS in that case.
|
|
This reverts commit 0c675192c91999cc2b60cbbea708dbf7b5637897.
|
|
This class is intended to replace all IOAddress usages in the Kernel
codebase altogether. The idea is to ensure IO can be done in
arch-specific manner that is determined mostly in compile-time, but to
still be able to use most of the Kernel code in non-x86 builds. Specific
devices that rely on x86-specific IO instructions are already placed in
the Arch/x86 directory and are omitted for non-x86 builds.
The reason this works so well is the fact that x86 IO space acts in a
similar fashion to the traditional memory space being available in most
CPU architectures - the x86 IO space is essentially just an array of
bytes like the physical memory address space, but requires x86 IO
instructions to load and store data. Therefore, many devices allow host
software to interact with the hardware registers in both ways, with a
noticeable trend even in the modern x86 hardware to move away from the
old x86 IO space to exclusively using memory-mapped IO.
Therefore, the IOWindow class encapsulates both methods for x86 builds.
The idea is to allow PCI devices to be used in either way in x86 builds,
so when trying to map an IOWindow on a PCI BAR, the Kernel will try to
find the proper method being declared with the PCI BAR flags.
For old PCI hardware on non-x86 builds this might turn into a problem as
we can't use port mapped IO, so the Kernel will gracefully fail with
ENOTSUP error code if that's the case, as there's really nothing we can
do within such case.
For general IO, the read{8,16,32} and write{8,16,32} methods are
available as a convenient API for other places in the Kernel. There are
simply no direct 64-bit IO API methods yet, as it's not needed right now
and is not considered to be Arch-agnostic too - the x86 IO space doesn't
support generating 64 bit cycle on IO bus and instead requires two 2
32-bit accesses. If for whatever reason it appears to be necessary to do
IO in such manner, it could probably be added with some neat tricks to
do so. It is recommended to use Memory::TypedMapping struct if direct 64
bit IO is actually needed.
|
|
This will be used later on to allocate such structure on the heap when
it is necessary to do so.
|
|
The RTC and CMOS are currently only supported for x86 platforms and use
specific x86 instructions to produce only certain x86 plaform operations
and results, therefore, we move them to the Arch/x86 specific directory.
|
|
According to Dr. POSIX, we should allow to call mmap on inodes even on
ranges that currently don't map to any actual data. Trying to read or
write to those ranges should result in SIGBUS being sent to the thread
that did violating memory access.
|
|
|
|
FIXME: There's still a lot to do like for example, port `quickmap_page`.
This does however get us further into the boot process than before.
|
|
Otherwise we would be holding the MM global data lock and the Process
address space locks in reversed order to the rest of the system, which
can lead to deadlocks.
|
|
This commit updates the lock function from Spinlock and
RecursiveSpinlock to return the InterruptsState of the processor,
instead of the processor flags. The unlock functions would only look at
the interrupt flag of the processor flags, so we now use the
InterruptsState enum to clarify the intent, and such that we can use the
same Spinlock code for the aarch64 build.
To not break the build, all the call sites are updated aswell.
|
|
Globally shared MemoryManager state is now kept in a GlobalData struct
and wrapped in SpinlockProtected.
A small set of members are left outside the GlobalData struct as they
are only set during boot initialization, and then remain constant.
This allows us to access those members without taking any locks.
|
|
I believe this to be safe, as the main thing that LockRefPtr provides
over RefPtr is safe copying from a shared LockRefPtr instance. I've
inspected the uses of RefPtr<PhysicalPage> and it seems they're all
guarded by external locking. Some of it is less obvious, but this is
an area where we're making continuous headway.
|
|
When incrementing a reference count, it should be sufficient to use
relaxed ordering. Note that unref() still uses acquire-release.
|
|
We don't need the MM lock to unregister a PageDirectory from the CR3
map. This is already protected by the CR3 map's own lock.
|
|
We have to hold the region tree lock while dumping its regions anyway,
and taking the MM lock here was unnecessary.
|
|
We're not accessing any of the MM members here. Also remove some
redundant code to update CR3, since it calls activate_page_directory()
which does exactly the same thing.
|
|
|
|
Now that AddressSpace itself is always SpinlockProtected, we don't
need to also wrap the RegionTree. Whoever has the AddressSpace locked
is free to poke around its tree.
|
|
This allows sys$mprotect() to honor the original readable & writable
flags of the open file description as they were at the point we did the
original sys$mmap().
IIUC, this is what Dr. POSIX wants us to do:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/mprotect.html
Also, remove the bogus and racy "W^X" checking we did against mappings
based on their current inode metadata. If we want to do this, we can do
it properly. For now, it was not only racy, but also did blocking I/O
while holding a spinlock.
|
|
This forces anyone who wants to look into and/or manipulate an address
space to lock it. And this replaces the previous, more flimsy, manual
spinlock use.
Note that pointers *into* the address space are not safe to use after
you unlock the space. We've got many issues like this, and we'll have
to track those down as wlel.
|
|
We were holding the MM lock across all of the region unmapping code.
This was previously necessary since the quickmaps used during unmapping
required holding the MM lock.
Now that it's no longer necessary, we can leave the MM lock alone here.
|
|
This makes locking them much more straightforward, and we can remove
a bunch of confusing use of AddressSpace::m_lock. That lock will also
be converted to use of SpinlockProtected in a subsequent patch.
|
|
You're still required to disable interrupts though, as the mappings are
per-CPU. This exposed the fact that our CR3 lookup map is insufficiently
protected (but we'll address that in a separate commit.)
|
|
This is no longer required as these quickmaps are now per-CPU. :^)
|
|
While the "regular" quickmap (used to temporarily map a physical page
at a known address for quick access) has been per-CPU for a while,
we also have the PD (page directory) and PT (page table) quickmaps
used by the memory management code to edit page tables. These have been
global, which meant that SMP systems had to keep fighting over them.
This patch makes *all* quickmaps per-CPU. We reserve virtual addresses
for up to 64 CPUs worth of quickmaps for now.
Note that all quickmaps are still protected by the MM lock, and we'll
have to fix that too, before seeing any real throughput improvements.
|
|
Until now, our kernel has reimplemented a number of AK classes to
provide automatic internal locking:
- RefPtr
- NonnullRefPtr
- WeakPtr
- Weakable
This patch renames the Kernel classes so that they can coexist with
the original AK classes:
- RefPtr => LockRefPtr
- NonnullRefPtr => NonnullLockRefPtr
- WeakPtr => LockWeakPtr
- Weakable => LockWeakable
The goal here is to eventually get rid of the Lock* classes in favor of
using external locking.
|
|
Instead of having two separate implementations of AK::RefCounted, one
for userspace and one for kernelspace, there is now RefCounted and
AtomicRefCounted.
|
|
All users which relied on the default constructor use a None lock rank
for now. This will make it easier to in the future remove LockRank and
actually annotate the ranks by searching for None.
|
|
|
|
We only need to hold the VMObject lock while inspecting and/or updating
the physical page array in the VMObject.
|
|
As soon as we've saved CR2 (the faulting address), we can re-enable
interrupt processing. This should make the kernel more responsive under
heavy fault loads.
|
|
Region::physical_page() now takes the VMObject lock while accessing the
physical pages array, and returns a RefPtr<PhysicalPage>. This ensures
that the array access is safe.
Region::physical_page_slot() now VERIFY()'s that the VMObject lock is
held by the caller. Since we're returning a reference to the physical
page slot in the VMObject's physical page array, this is the best we
can do here.
|
|
We really only need the VMObject lock when accessing the physical pages
array, so once we have a strong pointer to the physical page we want to
remap, we can give up the VMObject lock.
This fixes a deadlock I encountered while building DOOM on SMP.
|
|
|
|
|