Add a kernel data segment and make the user code segment come after
the data segment. The GDT must be in this specific order to support the
syscall and sysret instruction pair: both instructions derive CS and SS
from fixed offsets off the selector bases programmed into the STAR MSR,
so the kernel data selector must directly follow the kernel code
selector, and the user code selector must directly follow the user data
selector.
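
For illustration, a minimal sketch of a selector layout that satisfies
these constraints (the selector values are hypothetical, not necessarily
SerenityOS's actual layout):

    #include <cstdint>

    enum Selector : uint16_t {
        KernelCode = 0x08, // STAR[47:32] selects this for syscall
        KernelData = 0x10, // syscall loads SS from KernelCode + 8
        UserData   = 0x18, // sysret loads SS from base + 8
        UserCode   = 0x20, // 64-bit sysret loads CS from base + 16
    };

    // STAR[63:48] would then be programmed as UserData - 8 (0x10), so
    // that base + 8 == UserData and base + 16 == UserCode.
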
---
Fixes #11402.
---
We've finally gotten kmalloc to a point where it feels decent enough
to drop this comment.
There's still a lot of room for improvement, and we'll continue working
on it.
---
Let's do some simple pointer arithmetic to verify that the address being
freed is at least within one of the two valid kmalloc VM ranges.
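
A minimal sketch of such a check, with hypothetical range values and
names (the real ranges and the panic mechanism live in the kernel's
kmalloc implementation):

    #include <cassert>
    #include <cstddef>
    #include <cstdint>

    struct VirtualRange {
        uintptr_t base { 0 };
        size_t size { 0 };
        bool contains(uintptr_t address) const
        {
            return address >= base && address < base + size;
        }
    };

    // Hypothetical stand-ins for the two valid kmalloc VM ranges.
    static VirtualRange const s_initial_heap_range { 0xc2000000, 2 * 1024 * 1024 };
    static VirtualRange const s_expansion_range { 0xc4000000, 64 * 1024 * 1024 };

    static void verify_kfree_address(void const* ptr)
    {
        auto address = reinterpret_cast<uintptr_t>(ptr);
        // Cheap pointer-arithmetic sanity check before touching metadata.
        assert(s_initial_heap_range.contains(address)
            || s_expansion_range.contains(address));
    }
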
---
This was a premature optimization from the early days of SerenityOS.
The eternal heap was a simple bump pointer allocator over a static
byte array. My original idea was to avoid heap fragmentation and improve
data locality, but both ideas were rooted in cargo culting, not data.
We reserved 4 MiB at boot but only ever used ~256 KiB, wasting
the rest.
This patch replaces all kmalloc_eternal() usage by regular kmalloc().
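
For context, an allocator of the kind being removed is only a few lines;
a sketch (illustrative, not the actual kmalloc_eternal code):

    #include <cstddef>
    #include <cstdint>

    static uint8_t s_eternal_storage[4 * 1024 * 1024]; // reserved at boot
    static size_t s_eternal_cursor = 0;

    // Bump-pointer allocation: hand out the next chunk and advance the
    // cursor. Nothing is ever freed, hence "eternal".
    static void* kmalloc_eternal_sketch(size_t size)
    {
        if (s_eternal_cursor + size > sizeof(s_eternal_storage))
            return nullptr; // reserved area exhausted
        void* ptr = &s_eternal_storage[s_eternal_cursor];
        s_eternal_cursor += size;
        return ptr;
    }
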
---
Since a socket can be accessed by multiple threads concurrently, we need
to protect shared data behind the socket mutex.
There are very likely more places where we need to fix this; the
purpose of this patch is to fix a VERIFY() failure in getsockopt()
seen on CI.
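
The shape of the fix, sketched with a standard mutex (the kernel uses
its own locking primitives; names here are illustrative):

    #include <mutex>

    struct Socket {
        std::mutex mutex; // protects the shared fields below
        int so_error { 0 };

        // getsockopt()-style accessor: acquire the socket mutex before
        // reading state that other threads may be mutating concurrently.
        int error()
        {
            std::lock_guard<std::mutex> guard(mutex);
            return so_error;
        }
    };
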
---
We were doing this dance in notify_watchers():

    set_metadata_dirty(true);
    set_metadata_dirty(false);

This was done in order to force out inode watcher events immediately.
Unfortunately, this was racy: if SyncTask got scheduled at the wrong
moment, it would try to flush metadata for a clean inode. This then got
trapped by the VERIFY() statement in Inode::sync_all():

    VERIFY(inode.is_metadata_dirty());

This patch fixes the issue by replacing notify_watchers() with lazy
metadata notifications, like all other filesystems.
---
This is mandated by POSIX; it's fine that we don't actually implement
it, as long as it's present during compilation. :^)
---
Much like the existing in6addr_any global and the IN6ADDR_ANY_INIT
macro, our LibC is also expected to export the in6addr_loopback global
and the IN6ADDR_LOOPBACK_INIT macro.
These were found by the stress-ng port.
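
For reference, the usual shape of these two exports (a sketch; the IPv6
loopback address is ::1, i.e. fifteen zero bytes and a trailing 1, and
the exact brace nesting depends on how struct in6_addr is declared):

    /* In the header (netinet/in.h): */
    #define IN6ADDR_LOOPBACK_INIT { { 0, 0, 0, 0, 0, 0, 0, 0, \
                                      0, 0, 0, 0, 0, 0, 0, 1 } }
    extern const struct in6_addr in6addr_loopback;

    /* In the LibC source: */
    const struct in6_addr in6addr_loopback = IN6ADDR_LOOPBACK_INIT;
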
---
We can now directly create formatted KStrings with KString::formatted.
:^)
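
A call site would then look roughly like this (an illustrative fragment;
KString and TRY() are kernel facilities, and the exact header path and
return type are assumptions):

    #include <Kernel/KString.h>

    // Sketch: create a formatted KString, propagating allocation failure.
    ErrorOr<void> example(StringView name, int index)
    {
        auto description = TRY(KString::formatted("{} ({})", name, index));
        dbgln("created: {}", description->view());
        return {};
    }
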
---
We've moved to this pattern for the majority of usages of IntrusiveList
in the Kernel, so we might as well be consistent. :^)
---
This matches the behavior of the generic subheaps (and the old slab
allocator implementation).
---
This is no longer useful since kmalloc() does automatic slab allocation
without any of the limitations of the old SlabAllocator. :^)
---
Objects that were previously allocated via slab_alloc()/slab_dealloc()
now go through kmalloc()/kfree_sized() instead.
---
This patch adds generic slab allocators to kmalloc. In this initial
version, the slab sizes are 16, 32, 64, 128, 256 and 512 bytes.
Slabheaps are backed by 64 KiB block-aligned blocks with freelists,
similar to what we do in LibC malloc and LibJS Heap.
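
A condensed sketch of the freelist scheme described above (names and
layout are illustrative, not the real kmalloc slabheap code): each
64 KiB block is carved into fixed-size slots, and free slots are chained
through their own storage as a singly-linked freelist.

    #include <cstddef>
    #include <cstdint>

    struct FreelistEntry {
        FreelistEntry* next;
    };

    class SlabBlockSketch {
    public:
        SlabBlockSketch(uint8_t* storage, size_t block_size, size_t slab_size)
        {
            // Thread every slot in the block onto the freelist.
            for (size_t offset = 0; offset + slab_size <= block_size; offset += slab_size) {
                auto* entry = reinterpret_cast<FreelistEntry*>(storage + offset);
                entry->next = m_freelist;
                m_freelist = entry;
            }
        }

        void* allocate()
        {
            if (!m_freelist)
                return nullptr; // caller falls back to the generic subheaps
            auto* slot = m_freelist;
            m_freelist = m_freelist->next;
            return slot;
        }

        void deallocate(void* ptr)
        {
            auto* entry = static_cast<FreelistEntry*>(ptr);
            entry->next = m_freelist;
            m_freelist = entry;
        }

    private:
        FreelistEntry* m_freelist { nullptr };
    };
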
---
We were not allowing alignments greater than PAGE_SIZE for some reason.
---
There are no more users of the C-style kfree() API in the kernel,
so let's get rid of it and enjoy the new world where we always know
how much memory we are freeing. :^)
---
This patch does two things:
- Combines kmalloc_aligned() and kmalloc_aligned_cxx(). Templatizing
the alignment parameter doesn't seem like a valuable enough
optimization to justify having two almost-identical implementations.
- Stores the real allocation size of an aligned allocation along with
  the other alignment metadata, and uses it to call kfree_sized()
  instead of kfree() (see the sketch below).
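
A sketch of the second point, using malloc()/free() as stand-ins for
kmalloc()/kfree_sized() (the real metadata layout may differ; alignment
is assumed to be a power of two):

    #include <cstddef>
    #include <cstdint>
    #include <cstdlib>

    // Stash the original pointer and the real allocation size just below
    // the aligned pointer we hand out, so the free path can recover both.
    struct AlignmentMetadata {
        void* original_pointer;
        size_t allocation_size;
    };

    void* aligned_alloc_sketch(size_t size, size_t alignment)
    {
        size_t allocation_size = size + alignment + sizeof(AlignmentMetadata);
        void* original = malloc(allocation_size); // stand-in for kmalloc()
        if (!original)
            return nullptr;
        auto base = reinterpret_cast<uintptr_t>(original) + sizeof(AlignmentMetadata);
        auto aligned = (base + alignment - 1) & ~(alignment - 1);
        auto* metadata = reinterpret_cast<AlignmentMetadata*>(aligned) - 1;
        metadata->original_pointer = original;
        metadata->allocation_size = allocation_size;
        return reinterpret_cast<void*>(aligned);
    }

    void aligned_free_sketch(void* ptr)
    {
        if (!ptr)
            return;
        auto* metadata = reinterpret_cast<AlignmentMetadata*>(ptr) - 1;
        // Knowing allocation_size is what lets the real code call
        // kfree_sized() here instead of kfree().
        free(metadata->original_pointer);
    }
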
---
This class was misusing the outdated Lockable template and didn't fully
take advantage of the lock/resource separation mechanism anyway.
Since the underlying PRNG has its own SpinLock, and we already use that
for synchronization everywhere anyway, we can simply remove the Lockable
inheritance from this class.
---
Currently, the APIC class is constructed whether or not it is actually
used. So, move APIC initialization from init to the InterruptManagement
class and construct the APIC class only when it is needed.
---
Since we allocate the subheap in the first page of the given storage,
let's assert that the subheap can actually fit in a single page, to
prevent the possible future headache of trying to debug the cause of
random kernel memory corruption. :^)
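
In code, the cheapest form of such a guard is a compile-time assertion
(hypothetical type and page size; the actual check may well be a
runtime VERIFY instead):

    #include <cstddef>

    static constexpr size_t page_size = 4096;

    struct SubheapSketch {
        // ... allocator bookkeeping fields would live here ...
        unsigned char bookkeeping[128];
    };

    // Fails the build if the subheap header ever outgrows the page it
    // is placed in, instead of silently corrupting adjacent memory.
    static_assert(sizeof(SubheapSketch) <= page_size,
        "Subheap must fit in a single page");
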
---
This allows kmalloc() to satisfy arbitrary allocation requests instead
of being limited to a static subheap expansion size.
---
This avoids getting caught with our pants down when heap expansion fails
due to missing page tables. It also avoids a circular dependency on
kmalloc() by way of HashMap::set() in MemoryManager::ensure_pte().
---
If the data passed to sys$write() is backed by a not-yet-paged-in inode
mapping, we could end up in a situation where we get a page fault when
trying to copy data from userspace.
If that page fault handler tried reading from an inode that someone else
had locked while waiting for the disk cache lock, we'd deadlock.
This patch fixes the issue by copying the userspace data into a local
buffer before acquiring the disk cache lock. This is not ideal since it
incurs an extra copy, and I'm sure we can think of a better solution
eventually.
This was a frequent cause of startup deadlocks on x86_64 for me. :^)
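
Schematically, the fix reorders the copy and the lock acquisition
(illustrative names; std::mutex stands in for the kernel's locking):

    #include <cstddef>
    #include <cstring>
    #include <mutex>
    #include <vector>

    static std::mutex s_disk_cache_lock; // stand-in for the disk cache lock

    // Copy the (possibly not-yet-paged-in) userspace data into a local
    // buffer *before* taking the disk cache lock, so any page fault taken
    // during the copy happens while we hold no lock that the page fault
    // handler might also need.
    bool write_through_cache(void const* userspace_data, size_t size)
    {
        std::vector<unsigned char> local_buffer(size);
        std::memcpy(local_buffer.data(), userspace_data, size);

        std::lock_guard<std::mutex> guard(s_disk_cache_lock);
        // ... write local_buffer into the disk cache ...
        return true;
    }
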
---
We never want to execute kmalloc memory.
---
Previously, the heap expansion logic could end up calling kmalloc
recursively, which was quite messy and hard to reason about.
This patch redesigns heap expansion so that it's kmalloc-free:
- We make a single large virtual range allocation at startup
- When expanding, we bump allocate VM from that region
- When expanding, we populate page tables directly ourselves,
instead of going via MemoryManager.
This makes heap expansion a great deal simpler. However, do note that it
introduces two new flaws that we'll need to deal with eventually:
- The single virtual range allocation is limited to 64 MiB and once
exhausted, kmalloc() will fail. (Actually, it will PANIC for now.)
- The kmalloc heap can no longer shrink once expanded. Subheaps stay
in place once constructed.
---
This was used to return a pre-locked UDPSocket in one place, but there
was really no need for that mechanism in the first place since the
caller ends up locking the socket anyway.
---
The function to protect ksyms after initialization is only used during
boot, so it can be UNMAP_AFTER_INIT as well.
This requires switching the order of the init sequence: we now call
`MM.protect_ksyms_after_init()` before `MM.unmap_text_after_init()`.
---
Noticed these boot-only functions are not currently UNMAP_AFTER_INIT.
Let's fix that :^)
---
As a small cleanup, this also makes `page_round_up` verify its
precondition with `page_round_up_would_wrap` (which callers are expected
to call), rather than having its own logic.
Fixes #11297.
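
The division of labor described above looks roughly like this (a sketch
with a hard-coded 4 KiB page size):

    #include <cstddef>
    #include <cstdint>

    static constexpr size_t page_mask = 0xfff; // 4 KiB pages

    // Callers are expected to check this first; rounding x up would
    // overflow if x is within a page of the top of the address space.
    constexpr bool page_round_up_would_wrap(size_t x)
    {
        return x > SIZE_MAX - page_mask;
    }

    constexpr size_t page_round_up(size_t x)
    {
        // The real function VERIFYs the precondition instead of
        // carrying its own wrap-handling logic, roughly:
        // VERIFY(!page_round_up_would_wrap(x));
        return (x + page_mask) & ~page_mask;
    }
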
---
Most other syscalls pass address arguments as `void*` instead of
`uintptr_t`, so let's do that here too. Besides improving consistency,
this commit makes `strace` correctly pretty-print these arguments in
hex.
---
This feature was introduced in version 4.17 of the Linux kernel, and
while it's not specified by POSIX, I think it will be a nice addition to
our system.
MAP_FIXED_NOREPLACE provides a less error-prone alternative to
MAP_FIXED: while regular fixed mappings would cause any intersecting
ranges to be unmapped, MAP_FIXED_NOREPLACE returns EEXIST instead. This
ensures that we don't corrupt our process's address space if something
is already at the requested address.
Note that the more portable way to do this is to use regular
MAP_ANONYMOUS, and check afterwards whether the returned address matches
what we wanted. This, however, has a large performance impact on
programs like Wine which try to reserve large portions of the address
space at once, as the non-matching addresses have to be unmapped
separately.
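
Typical usage from userspace, matching the semantics described above
(on Linux, MAP_FIXED_NOREPLACE may require _GNU_SOURCE):

    #include <errno.h>
    #include <stdio.h>
    #include <sys/mman.h>

    int main()
    {
        void* desired = (void*)0x20000000; // hypothetical address
        void* result = mmap(desired, 0x1000, PROT_READ | PROT_WRITE,
            MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED_NOREPLACE, -1, 0);
        if (result == MAP_FAILED) {
            if (errno == EEXIST)
                printf("something is already mapped at %p\n", desired);
            return 1;
        }
        // On success the mapping is guaranteed to be at `desired`.
        return 0;
    }
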
---
This error only ever gets propagated to userspace if
MAP_FIXED_NOREPLACE is requested, as MAP_FIXED unmaps intersecting
ranges beforehand, and non-fixed mmap() calls will just fall back to
allocating anywhere.
Linux specifies MAP_FIXED_NOREPLACE to return EEXIST when it can't
allocate at the requested address; we now match that behavior.
---
This helps avoid confusion in general, and makes constructors, methods,
and code patterns much cleaner and more understandable.
---
Previously we were assigning to Process::m_space before actually
entering the new address space (assigning it to CR3).
If a thread was preempted by the scheduler while destroying the old
address space, we'd then attempt to resume the thread with CR3 pointing
at a partially destroyed address space.
We could then crash immediately in write_cr3(), right after assigning
the new value to CR3. I am hopeful that this may have been the bug
haunting our CI for months. :^)
---
Instead, wait until we transition back to userspace. This stops
userspace from being able to suspend a thread indefinitely while it's
running in kernelspace (potentially holding some blocking mutex).
---
Found by PVS-Studio static analysis.