path: root/Kernel/VM/Region.h
Age | Commit message | Author
2021-01-01 | Kernel: Implement lazy committed page allocation | Tom
By designating a committed page pool, we can guarantee that physical pages will be available for lazy allocation in mappings. However, when forking we will overcommit. The assumption is that, in the worst case, it's better for the forked child to die from insufficient physical memory on a COW access than for the parent that created the region to die. If a forked child wants to ensure that all of its memory is available (i.e. trigger a commit), it can use madvise. This also means that fork can now fail gracefully if we don't have enough physical pages available.
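The committed-pool idea can be modelled in a few lines. This is a minimal sketch under stated assumptions. `PagePool`, `try_commit`, `allocate_committed` and `try_allocate_uncommitted` are hypothetical names, not the kernel's API. Pages reserved by a commit are subtracted from what overcommitted allocations, such as a forked child's COW fault, are allowed to take. Because of that, a committed allocation can never be starved.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a physical page allocator with a committed pool.
class PagePool {
public:
    explicit PagePool(std::size_t total) : m_free(total) {}

    // Reserve `count` pages up front so later lazy allocations cannot fail.
    bool try_commit(std::size_t count)
    {
        if (uncommitted_free() < count)
            return false;
        m_committed += count;
        return true;
    }

    // Allocation backed by an earlier commit: guaranteed to succeed.
    void allocate_committed()
    {
        assert(m_committed > 0 && m_free > 0);
        --m_committed;
        --m_free;
    }

    // Overcommitted allocation (e.g. a forked child's COW fault). It may
    // fail, in which case only the faulting process would be killed.
    bool try_allocate_uncommitted()
    {
        if (uncommitted_free() == 0)
            return false;
        --m_free;
        return true;
    }

    std::size_t uncommitted_free() const { return m_free - m_committed; }

private:
    std::size_t m_free { 0 };
    std::size_t m_committed { 0 };
};
```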
2021-01-01 | Kernel: Add MAP_NORESERVE support to mmap | Tom
Rather than lazily committing regions by default, we now commit the entire region unless MAP_NORESERVE is specified. This fixes random crashes in low-memory situations where, for example, the malloc heap had successfully allocated memory, but touching pages that had never been used before caused a crash once no more physical memory was available. Use this flag to create large regions without actually committing the backing memory. madvise() can be used to commit arbitrary areas of such regions after creating them.
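The policy change moves the failure point from first touch to mmap() time. The sketch below is illustrative only, and `Pool`, `RegionInfo` and `create_region` are hypothetical. Unless the caller opts out with a MAP_NORESERVE-style flag, the whole region is committed when it is created. If that fails, the caller gets ENOMEM immediately.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>

// Hypothetical pool of physical pages that have not yet been committed.
struct Pool {
    std::size_t uncommitted;
    bool try_commit(std::size_t pages)
    {
        if (pages > uncommitted)
            return false;
        uncommitted -= pages;
        return true;
    }
};

struct RegionInfo {
    std::size_t page_count;
    bool committed; // true if every page is guaranteed a physical page
};

// Returns 0 on success or ENOMEM. The error surfaces here, at mmap() time,
// instead of as a crash when an untouched page is first accessed.
int create_region(Pool& pool, std::size_t page_count, bool noreserve, RegionInfo& out)
{
    if (!noreserve && !pool.try_commit(page_count))
        return ENOMEM;
    out = { page_count, !noreserve };
    return 0;
}
```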
2021-01-01 | Kernel: Memory purging improvements | Tom
This adds the ability for a Region to define volatile/nonvolatile areas within mapped memory using madvise(). Memory purging now also takes into account all views of the PurgeableVMObject, and it only purges memory that none of them needs. When madvise() is called to change an area to nonvolatile, it returns whether memory from that area was purged. At that point we also try to remap all memory that is requested to be nonvolatile, and if insufficient pages are available, the caller is notified.
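The "only purge what no view needs" rule can be sketched as follows. Several views (Regions) of one purgeable object each keep their own per-page volatile flags. A page is purged only when every view has marked it volatile. This is a simplified model with hypothetical names, and page indices are relative to the VMObject. Where this sketch clears the `purged` flag, a real kernel would have to map a fresh physical page, and that mapping can fail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: one purgeable VMObject shared by several views.
struct PurgeableObject {
    std::vector<bool> purged;
    std::vector<std::vector<bool>> view_volatile; // [view][page]

    PurgeableObject(std::size_t pages, std::size_t views)
        : purged(pages, false)
        , view_volatile(views, std::vector<bool>(pages, false))
    {
    }

    // Purge only pages that *every* view has marked volatile.
    std::size_t purge()
    {
        std::size_t count = 0;
        for (std::size_t page = 0; page < purged.size(); ++page) {
            bool all_volatile = true;
            for (auto const& view : view_volatile)
                all_volatile = all_volatile && view[page];
            if (all_volatile && !purged[page]) {
                purged[page] = true;
                ++count;
            }
        }
        return count;
    }

    void set_volatile(std::size_t view, std::size_t begin, std::size_t end)
    {
        for (std::size_t page = begin; page < end; ++page)
            view_volatile[view][page] = true;
    }

    // Mark [begin, end) nonvolatile for one view and report whether any of
    // those pages had been purged (i.e. their old contents are gone).
    bool set_nonvolatile(std::size_t view, std::size_t begin, std::size_t end)
    {
        bool was_purged = false;
        for (std::size_t page = begin; page < end; ++page) {
            view_volatile[view][page] = false;
            if (purged[page]) {
                was_purged = true;
                purged[page] = false; // a real kernel would remap a new page here
            }
        }
        return was_purged;
    }
};
```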
2020-12-29 | Kernel+LibC: Add a very limited sys$mremap() implementation | Andreas Kling
This syscall can currently only remap a shared file-backed mapping into a private file-backed mapping.
2020-09-25 | Meta+Kernel: Make clang-format-10 clean | Ben Wiederhake
2020-09-13 | Kernel: Make copy_to/from_user safe and remove unnecessary checks | Tom
Since the CPU already performs almost all of the necessary validation for us, we don't really need to attempt it ourselves. Doing it ourselves doesn't work very reliably anyway. We would have to account for other processors modifying virtual memory, and for pages that cannot be allocated due to insufficient resources. So change copy_to/from_user (and the associated helper functions) to use the new safe_memcpy, which returns whether it succeeded. The only manual validation step needed (which the CPU can't perform for us) is making sure that pointers provided by user mode don't point into kernel mappings. To make it easier to read/write data from/to either kernel or user mode, add the UserOrKernelBuffer helper class. Internally, it uses either copy_from/to_user or a direct memcpy, or passes the data through using a temporary buffer on the stack. Last but not least, syscall params need to stay trivial, since we copy them from/to user mode using copy_from/to_user.
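The one check left to software can be sketched as a range test. It covers only the "is this range below the kernel?" step. Catching the faults themselves is safe_memcpy's job and is not modelled here. The kernel base of 0xc0000000 is an assumption chosen for illustration, and `is_user_range` is a hypothetical name. The overflow check matters because a range that wraps past the top of the address space would otherwise slip through.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical split: user space below this address, kernel at/above it.
constexpr std::uint64_t kernel_base = 0xc0000000;

// Check that a user-supplied [address, address + size) range lies entirely
// below the kernel mappings.
bool is_user_range(std::uint64_t address, std::uint64_t size)
{
    if (size == 0)
        return address < kernel_base;
    std::uint64_t last = address + size - 1;
    if (last < address) // wrapped around the end of the address space
        return false;
    return last < kernel_base;
}
```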
2020-09-02 | Kernel: Handle committing pages in regions more gracefully | Tom
Sometimes a physical underlying page may be there, but we may be unable to allocate a page table that may be needed to map it. Bubble up such mapping errors so that they can be handled more appropriately.
2020-07-30 | Kernel: Move syscall implementations out of Process.cpp | Andreas Kling
This is something I've been meaning to do for a long time, and here we finally go. This patch moves all sys$foo functions out of Process.cpp and into files in Kernel/Syscalls/. It's not exactly one syscall per file (although it could be, but I got a bit tired of the repetitive work here..) This makes hacking on individual syscalls a lot less painful since you don't have to rebuild nearly as much code every time. I'm also hopeful that this makes it easier to understand individual syscalls. :^)
2020-07-06 | Kernel: Aggregate TLB flush requests for Regions for SMP | Tom
Rather than sending one TLB flush request for each page, aggregate them so that we're not spamming the other processors with FlushTLB IPIs.
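One way to aggregate flushes is to coalesce the affected pages into contiguous runs. Each run then costs a single cross-processor request instead of one request per page. This is a sketch of that batching idea, not the kernel's code. `FlushRequest` and `batch_flushes` are hypothetical, a 4 KiB page size is assumed, and the input is assumed to be sorted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uintptr_t page_size = 4096;

struct FlushRequest {
    std::uintptr_t base;
    std::size_t page_count;
};

// Turn a sorted list of page addresses into one request per contiguous run.
std::vector<FlushRequest> batch_flushes(std::vector<std::uintptr_t> const& pages)
{
    std::vector<FlushRequest> requests;
    for (auto page : pages) {
        if (!requests.empty()) {
            auto& last = requests.back();
            if (last.base + last.page_count * page_size == page) {
                ++last.page_count; // extends the current run
                continue;
            }
        }
        requests.push_back({ page, 1 }); // starts a new run
    }
    return requests;
}
```

For example, the pages 0x1000, 0x2000, 0x3000 and 0x8000 produce two requests instead of four.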
2020-07-01 | Kernel: List all CPUs in /proc/cpuinfo | Tom
2020-06-04 | Kernel: Add mechanism to identity map the lowest 2MB | Tom
2020-05-06 | Kernel: Crash the current process on OOM (instead of panicking kernel) | Andreas Kling
This patch adds PageFaultResponse::OutOfMemory, which informs the fault handler that we were unable to allocate a necessary physical page and cannot continue. In response, the kernel will crash the current process. Because we are OOM, we can't symbolicate the crash like we normally would (the ELF symbolication code needs to allocate), so we also tell Process::crash() that we're out of memory. Now we can survive "allocate 300 MB" (only the allocate process dies.) This is definitely not perfect and can easily end up killing some innocent process that happened to allocate one page at the wrong time, but it's a *lot* better than panicking on OOM. :^)
2020-04-28 | Kernel: Add Region helpers for accessing underlying physical pages | Andreas Kling
Since a Region is basically a view into a potentially larger VMObject, it was always necessary to include the Region starting offset when accessing its underlying physical pages. Until now, you had to do that manually, but this patch adds a simple Region::physical_page() for read-only access and a physical_page_slot() when you want a mutable reference to the RefPtr<PhysicalPage> itself. A lot of code is simplified by making use of this.
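The offset arithmetic these helpers hide is small. In the sketch below, a Region is a window into a VMObject's page array. Page `index` of the Region is therefore page `first_page_index() + index` of the VMObject. The types are simplified stand-ins, not the kernel's classes. In particular, raw pointers replace RefPtr<PhysicalPage>, and a 4 KiB page size is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t page_size = 4096;

struct PhysicalPage { int id; };

// Simplified stand-ins: a VMObject owns page slots, a Region views a window.
struct VMObject {
    std::vector<PhysicalPage*> physical_pages;
};

struct Region {
    VMObject& vmobject;
    std::size_t offset_in_vmobject; // bytes, page-aligned

    std::size_t first_page_index() const { return offset_in_vmobject / page_size; }

    // Read-only access to the page backing this Region's index-th page.
    PhysicalPage const* physical_page(std::size_t index) const
    {
        return vmobject.physical_pages[first_page_index() + index];
    }

    // Mutable access to the slot itself, e.g. to install a newly allocated page.
    PhysicalPage*& physical_page_slot(std::size_t index)
    {
        return vmobject.physical_pages[first_page_index() + index];
    }
};
```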
2020-04-13 | ptrace: Add PT_POKE | Itamar
PT_POKE writes a single word to the tracee's address space. Some caveats:
- If the user requests to write to an address in a read-only region, we temporarily change the page's protections to allow it.
- If the user requests to write to a region that's backed by a SharedInodeVMObject, we replace the vmobject with a PrivateInodeVMObject.
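The two caveats can be modelled together. This is a loose sketch, not the kernel's implementation. `VMObjectKind`, `VMObject`, `Region` and `poke` are hypothetical, and a plain copy stands in for creating a PrivateInodeVMObject. The point is the ordering. First detach from the shared file mapping so the write never reaches other mappers of the file. Then lift write protection only for the duration of the write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

enum class VMObjectKind { SharedInode, PrivateInode, Anonymous };

struct VMObject {
    VMObjectKind kind;
    std::vector<std::uint32_t> words;
};

struct Region {
    std::shared_ptr<VMObject> vmobject;
    bool writable;
};

void poke(Region& region, std::size_t word_index, std::uint32_t value)
{
    // Caveat 2: never write through to a shared file mapping; give the
    // tracee its own private copy first.
    if (region.vmobject->kind == VMObjectKind::SharedInode) {
        auto copy = std::make_shared<VMObject>(*region.vmobject);
        copy->kind = VMObjectKind::PrivateInode;
        region.vmobject = copy;
    }

    // Caveat 1: temporarily lift write protection, then restore it.
    bool was_writable = region.writable;
    region.writable = true;
    region.vmobject->words[word_index] = value;
    region.writable = was_writable;
}
```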
2020-04-12 | Kernel+LibC: Add minherit() and MAP_INHERIT_ZERO | Andreas Kling
This patch adds the minherit() syscall originally invented by OpenBSD. Only the MAP_INHERIT_ZERO mode is supported for now. If set on an mmap region, that region will be zeroed out on fork().
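At fork() time, MAP_INHERIT_ZERO changes what the child receives. It still gets a region of the same size, but its contents are zeroed instead of shared or copied from the parent. The sketch below is hypothetical and models memory as a byte vector. `InheritMode` and `clone_for_fork` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class InheritMode { Default, Zero };

struct Region {
    std::vector<std::uint8_t> data;
    InheritMode inherit_mode { InheritMode::Default };
};

// A Region marked via minherit(..., MAP_INHERIT_ZERO) shows up in the
// child with the same size, but zero-filled.
Region clone_for_fork(Region const& parent)
{
    Region child = parent;
    if (parent.inherit_mode == InheritMode::Zero)
        child.data.assign(parent.data.size(), 0);
    return child;
}
```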
2020-03-01 | Kernel: Remove some Region construction helpers | Andreas Kling
It's now up to the caller to provide a VMObject when constructing a new Region object. This will make it easier to handle things going wrong, like allocation failures, etc.
2020-02-28 | Kernel: Remove some unnecessary indirection in InodeFile::mmap() | Andreas Kling
InodeFile now directly calls Process::allocate_region_with_vmobject() instead of taking an awkward detour via a special Region constructor.
2020-02-24 | Kernel: Make Region weakable and use WeakPtr<Region> instead of Region* | Andreas Kling
This turns use-after-free bugs into null pointer dereferences instead.
2020-02-19 | Kernel: Use bitfields in Region | Andreas Kling
This makes Region 4 bytes smaller and we can use bitfield initializers since they are allowed in C++20. :^)
2020-02-16 | Kernel: Reduce header dependencies of MemoryManager and Region | Andreas Kling
2020-02-16 | Kernel: Move all code into the Kernel namespace | Andreas Kling
2020-01-18 | Meta: Add license header to source files | Andreas Kling
As suggested by Joshua, this commit adds the 2-clause BSD license as a comment block to the top of every source file. For the first pass, I've just added myself for simplicity. I encourage everyone to add themselves as copyright holders of any file they've added or modified in some significant way. If I've added myself in error somewhere, feel free to replace it with the appropriate copyright holder instead. Going forward, all new source files should include a license header.
2020-01-14 | Kernel: Change Region allocation helpers | Liav A
We can now create a cacheable Region. When map() is called on a cacheable Region, all the virtual memory space allocated to it is mapped with caching enabled (i.e. without the cache-disable bit set). In addition, OS components can create a Region that is mapped to a specific physical address by using the appropriate helper method.
2020-01-10 | Kernel+LibELF: Enable SMAP protection during non-syscall exec() | Andreas Kling
When loading a new executable, we now map the ELF image in kernel-only memory and parse it there. Then we use copy_to_user() when initializing writable regions with data from the executable. Note that the exec() syscall still disables SMAP protection and will require additional work. This patch only affects kernel-originated process spawns.
2020-01-01 | Kernel: Share code between Region::map() and Region::remap_page() | Andreas Kling
These were doing mostly the same things, so let's just share the code.
2019-12-29 | Kernel+SystemMonitor: Expose amount of per-process dirty private memory | Andreas Kling
Dirty private memory is all memory in non-inode-backed mappings that's process-private, meaning it's not shared with any other process. This patch exposes that number via SystemMonitor, giving us an idea of how much memory each process is responsible for all on its own.
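As described, the metric is a filtered sum: dirty pages in regions that are neither inode-backed nor shared with another process. A minimal sketch follows, assuming a hypothetical `RegionStats` record per region and a 4 KiB page size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t page_size = 4096;

struct RegionStats {
    bool inode_backed;
    bool shared; // mapped into more than one process
    std::size_t dirty_pages;
};

// Count only dirty pages in non-inode-backed mappings that no other process
// shares, i.e. memory this process alone is responsible for.
std::size_t dirty_private_bytes(std::vector<RegionStats> const& regions)
{
    std::size_t bytes = 0;
    for (auto const& region : regions) {
        if (region.inode_backed || region.shared)
            continue;
        bytes += region.dirty_pages * page_size;
    }
    return bytes;
}
```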
2019-12-25 | Kernel: Clean up Region access bit setters a little | Andreas Kling
2019-12-19 | Kernel: Rename vmo => vmobject everywhere | Andreas Kling
2019-12-18 | Kernel: Add a specific-page variant of Region::commit() | Andreas Kling
2019-12-15 | Kernel+SystemMonitor: Prevent userspace access to process ELF image | Andreas Kling
Every process keeps its own ELF executable mapped in memory in case we need to do symbol lookup (for backtraces, etc.) Until now, it was mapped in a way that made it accessible to the program, despite the program not having mapped it itself. I don't really see a need for userspace to have access to this right now, so let's lock things down a little bit. This patch makes it inaccessible to userspace and exposes that fact through /proc/PID/vm (per-region "user_accessible" flag.)
2019-12-15 | Kernel+SystemMonitor: Expose the number of set CoW bits in each Region | Andreas Kling
This number tells us how many more pages in a given region will trigger a CoW fault if written to.
2019-12-02 | Kernel: Crash on memory access in non-readable regions | Andreas Kling
This patch makes it possible to make memory regions non-readable. This is enforced using the "present" bit in the page tables. A process that hits a not-present page fault in a non-readable region will be crashed.
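The sketch below shows how region access bits might be translated into x86 page table flags so that non-readable regions are mapped not-present. The bit positions are architectural: bit 0 is Present, bit 1 is Read/Write and bit 2 is User/Supervisor. The `Access` struct and `pte_flags_for` function are hypothetical. Because a not-present entry faults on any access, the fault handler gets a chance to crash the process.

```cpp
#include <cassert>
#include <cstdint>

// x86 page table entry bits.
constexpr std::uint32_t pte_present = 1u << 0;
constexpr std::uint32_t pte_writable = 1u << 1;
constexpr std::uint32_t pte_user = 1u << 2;

struct Access {
    bool readable;
    bool writable;
    bool user_accessible;
};

// A non-readable region gets no Present bit, so every access faults.
std::uint32_t pte_flags_for(Access access)
{
    if (!access.readable)
        return 0;
    std::uint32_t flags = pte_present;
    if (access.writable)
        flags |= pte_writable;
    if (access.user_accessible)
        flags |= pte_user;
    return flags;
}
```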
2019-12-02 | Kernel: Fix bug where mprotect() would ignore setting PROT_WRITE | Andreas Kling
A typo in Region::set_writable() caused us to update the readable flag rather than the writable flag.
2019-11-24 | Kernel: Mark mmap()-created regions with a special bit | Andreas Kling
Then only allow regions with that bit to be manipulated via munmap() and mprotect(). This prevents messing with non-mmap()ed regions in a process's address space (stacks, shared buffers, ...)
2019-11-17 | Kernel: Implement some basic stack pointer validation | Andreas Kling
VM regions can now be marked as stack regions, which is then validated on syscall, and on page fault. If a thread is caught with its stack pointer pointing into anything that's *not* a Region with its stack bit set, we'll crash the whole process with SIGSTKFLT. Userspace must now allocate custom stacks by using mmap() with the new MAP_STACK flag. This mechanism was first introduced in OpenBSD, and now we have it too, yay! :^)
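The validation rule is a region lookup plus a flag test. The sketch below is hypothetical: `Region`, `contains` and `validate_stack_pointer` are illustrative names, and a linear scan stands in for whatever lookup the kernel uses. The stack pointer must fall inside a Region whose stack bit is set. A `false` result marks the point where the kernel would crash the process with SIGSTKFLT.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Region {
    std::uintptr_t base;
    std::uintptr_t size;
    bool stack; // set for regions created with MAP_STACK
    bool contains(std::uintptr_t address) const { return address >= base && address - base < size; }
};

// Run on syscall entry and on page faults.
bool validate_stack_pointer(std::vector<Region> const& regions, std::uintptr_t sp)
{
    for (auto const& region : regions) {
        if (region.contains(sp))
            return region.stack;
    }
    return false; // sp is not inside any region at all
}
```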
2019-11-04 | Kernel: Move page fault handling from MemoryManager to Region | Andreas Kling
After the page fault handler has found the region in which the fault occurred, do the rest of the work in the region itself. This patch also makes all fault types consistently crash the process if a new page is needed but we're all out of pages.
2019-11-04 | Kernel: Don't expose a region's page directory to the outside world | Andreas Kling
Now that Region manages its own mapping/unmapping, there's no need for the outside world to be able to grab at its page directory.
2019-11-04 | Kernel: Remove Region APIs for setting/unsetting the page directory | Andreas Kling
This is done implicitly by mapping or unmapping the region.
2019-11-04 | Kernel: Fix weird Region constructor that took nullable RefPtr<Inode> | Andreas Kling
It's never valid to construct a Region with a null Inode pointer using this constructor, so just take a NonnullRefPtr<Inode> instead.
2019-11-03 | Kernel: Teach Region how to remap itself | Andreas Kling
Now remapping (i.e. flushing kernel metadata to the CPU page tables) is done by simply calling Region::remap().
2019-11-03 | Kernel: Regions should be mapped into a PageDirectory, not a Process | Andreas Kling
This patch changes the parameter to Region::map() to be a PageDirectory since that matches how we think about the memory model: Regions are views onto VMObjects, and are mapped into PageDirectories. Each Process has a PageDirectory. The kernel also has a PageDirectory.
2019-11-03 | Kernel: Move region map/unmap operations into the Region class | Andreas Kling
The more Region can take care of itself, the better.
2019-11-03 | Kernel: Move page remapping into Region::remap_page(index) | Andreas Kling
Let Region deal with this, instead of everyone calling MemoryManager.
2019-10-01 | Kernel: Defer creation of Region CoW bitmaps until they're needed | Andreas Kling
Instead of allocating and populating a Copy-on-Write bitmap for each Region up front, wait until we actually clone the Region for sharing with another process. In most cases, we never need any CoW bits and we save ourselves a lot of kmalloc() memory and time.
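Deferring the bitmap amounts to making it optional. Until the Region is cloned, no storage exists, and every "should this page be CoW?" query simply answers no. The class below is a hypothetical sketch using std::optional (C++17). `should_cow` and `mark_all_cow` are illustrative names, not the kernel's API.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// A Region whose Copy-on-Write bitmap is only created when first needed.
class Region {
public:
    explicit Region(std::size_t page_count) : m_page_count(page_count) {}

    // With no bitmap, no page is CoW, so nothing was ever allocated for it.
    bool should_cow(std::size_t page) const
    {
        return m_cow_map.has_value() && (*m_cow_map)[page];
    }

    // Called when the Region is shared with another process (e.g. on fork).
    void mark_all_cow() { m_cow_map.emplace(m_page_count, true); }

    bool has_cow_map() const { return m_cow_map.has_value(); }

private:
    std::size_t m_page_count;
    std::optional<std::vector<bool>> m_cow_map;
};
```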
2019-10-01 | Kernel: Fix munmap() bad splitting of already-split Regions | Andreas Kling
When splitting a Region that's already the result of an earlier split, we have to take the Region's offset-in-VMObject into account, since it may be non-zero.
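The fix comes down to one addition. The offset of the piece after the hole must be computed from the original Region's own offset, not from zero. In this hypothetical sketch (`Region` and `split_around` are illustrative), a Region that starts 0x3000 bytes into its VMObject is split around a one-page hole. The surviving tail then begins at offset 0x5000, not 0x2000.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Region {
    std::size_t base; // virtual address
    std::size_t size;
    std::size_t offset_in_vmobject; // may be non-zero after an earlier split
};

// Carve [hole_base, hole_base + hole_size) out of `region` and return the
// surviving pieces, each with a correct offset into the shared VMObject.
std::vector<Region> split_around(Region const& region, std::size_t hole_base, std::size_t hole_size)
{
    std::size_t hole_end = hole_base + hole_size;
    std::size_t region_end = region.base + region.size;
    assert(hole_base >= region.base && hole_end <= region_end);

    std::vector<Region> pieces;
    if (hole_base > region.base)
        pieces.push_back({ region.base, hole_base - region.base, region.offset_in_vmobject });
    if (hole_end < region_end)
        pieces.push_back({ hole_end, region_end - hole_end,
            region.offset_in_vmobject + (hole_end - region.base) });
    return pieces;
}
```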
2019-09-27 | Kernel: Make Region single-owner instead of ref-counted | Andreas Kling
This simplifies the ownership model and makes Region easier to reason about. Userspace Regions are now primarily kept by Process::m_regions. Kernel Regions are kept in various OwnPtr<Region>s. Regions now only ever get unmapped when they are destroyed.
2019-09-16 | Kernel: Add a simple slab allocator for small allocations | Andreas Kling
This is a freelist allocator with static size classes that works as a complement to the generic kmalloc(). It's a lot faster than kmalloc() since allocation just means popping from the freelist. It's also significantly more compact when there are a lot of objects smaller than the minimum kmalloc chunk size (32 bytes.) This patch enables it for the Region and PhysicalPage classes. In the PhysicalPage (8 bytes) case, it's a huge improvement since we no longer waste 75% of the storage allocated. There are also a number of ways this can be improved, so let's keep working on it going forward.
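The core of a freelist allocator with a static size class fits in a short template. Each free slot stores a pointer to the next free slot, so allocation pops the head and deallocation pushes onto it. This is a minimal single-threaded sketch with hypothetical names and fixed capacity. It does not reproduce the kernel's allocator.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-size-class freelist allocator for objects of at most ObjectSize bytes.
template<std::size_t ObjectSize, std::size_t Capacity>
class SlabAllocator {
    static_assert(ObjectSize >= sizeof(void*), "each slot must hold a freelist link");
    static_assert(Capacity > 0, "need at least one slot");

    union Slot {
        Slot* next; // valid while the slot is free
        unsigned char storage[ObjectSize];
    };

public:
    SlabAllocator()
    {
        // Thread every slot onto the freelist once, up front.
        for (std::size_t i = 0; i < Capacity; ++i)
            m_slots[i].next = (i + 1 < Capacity) ? &m_slots[i + 1] : nullptr;
        m_freelist = &m_slots[0];
    }

    // Allocation is just popping the head of the freelist.
    void* allocate()
    {
        if (!m_freelist)
            return nullptr;
        Slot* slot = m_freelist;
        m_freelist = slot->next;
        return slot;
    }

    // Deallocation pushes the slot back onto the freelist.
    void deallocate(void* ptr)
    {
        Slot* slot = static_cast<Slot*>(ptr);
        slot->next = m_freelist;
        m_freelist = slot;
    }

private:
    Slot m_slots[Capacity];
    Slot* m_freelist { nullptr };
};
```

With ObjectSize set to 8, each slot is exactly one pointer wide on a 64-bit target. A 32-byte minimum chunk would waste 75% of the storage for such objects, and this layout avoids that.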
2019-09-06 | AK: Rename <AK/AKString.h> to <AK/String.h> | Andreas Kling
This was a workaround to be able to build on case-insensitive file systems where it might get confused about <string.h> vs <String.h>. Let's just not support building that way, so String.h can have an objectively nicer name. :^)
2019-09-04 | Kernel: Rename "vmo" to "vmobject" everywhere | Andreas Kling
2019-08-29 | Kernel: Add some convenient getters to Region | Andreas Kling
Add getters for the underlying Range, the access bits, and also add contains(Range) which just wraps m_range.contains().
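The contains(Range) getter is a thin wrapper over a half-open interval check. The sketch below is hypothetical and ignores address-space overflow for brevity. One Range contains another when the other starts at or after its base and ends at or before its end. Region just forwards the question to its `m_range`.

```cpp
#include <cassert>
#include <cstdint>

// Half-open [base, base + size) span of virtual addresses.
struct Range {
    std::uintptr_t base;
    std::uintptr_t size;

    std::uintptr_t end() const { return base + size; }

    bool contains(Range const& other) const
    {
        return other.base >= base && other.end() <= end();
    }
};

struct Region {
    Range m_range;
    Range const& range() const { return m_range; }
    bool contains(Range const& range) const { return m_range.contains(range); }
};
```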