path: root/Kernel/Syscalls
AgeCommit message (Collapse)Author
2023-02-04Kernel+SystemServer+Base: Introduce the RAMFS filesystemLiav A
This filesystem is based on the code of the long-lived TmpFS. It differs from that filesystem in one key point: its root inode doesn't have a sticky bit on it. Therefore, we mount it on /dev, to ensure only root can modify files in that directory. In addition to that, /tmp is mounted directly in the SystemServer main (start) code, so it's no longer specified in the fstab file. We ensure that /tmp has a sticky bit and has the value 0777 for root directory permissions, which is certainly a special case when using RAM-backed (and in general other) filesystems. Because of these two changes, there is no longer any need to maintain the TmpFS filesystem, hence it's removed (renamed to RAMFS). The name RAMFS represents the purpose of this filesystem in a much better way: it relies on being backed by RAM "storage", so it's easy to conclude that it's temporary and volatile, and that its content is gone on either system shutdown or unmounting of the filesystem.
2023-02-03Kernel: Fix usermode verification in ptrace with PT_SETREGSItamar
When doing PT_SETREGS, we want to verify that the debugged thread is executing in usermode. b2f7ccf refactored things and flipped the relevant check around, which broke things that use PT_SETREGS (for example, stepping over breakpoints with sdb).
2023-01-27Kernel: Add Syscalls/execve.cpp to aarch64 buildTimon Kruiper
2023-01-27Kernel: Add ThreadRegisters::set_exec_state and use it in execve.cppTimon Kruiper
Using this abstraction it is possible to compile this file for aarch64.
2023-01-27Kernel: Use InterruptsState abstraction in execve.cppTimon Kruiper
This was using the x86_64 specific cpu_flags abstraction, which is not compatible with aarch64.
2023-01-27Kernel: Add Syscalls/fork.cpp to aarch64 buildTimon Kruiper
2023-01-27Kernel: Add Syscalls/mmap.cpp to aarch64 buildTimon Kruiper
2023-01-27Kernel: Make Syscalls/ptrace.cpp buildable for aarch64Timon Kruiper
2023-01-27Kernel: Move Memory/PageDirectory.{cpp,h} to arch-specific directoryTimon Kruiper
The handling of page tables is very architecture specific, so belongs in the Arch directory. Some parts were already architecture-specific, however this commit moves the rest of the PageDirectory class into the Arch directory. While we're here the aarch64/PageDirectory.{h,cpp} files are updated to be aarch64 specific, by renaming some members and removing x86_64 specific code.
2023-01-27Kernel: Factor out PreviousMode into RegisterState::previous_modeTimon Kruiper
Various places in the kernel were manually checking the cs register for x86_64, however to share this with aarch64 a function in RegisterState is added, and the call-sites are updated. While we're here the PreviousMode enum is renamed to ExecutionMode.
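A minimal sketch of the idea, not the kernel's actual code: on x86_64 the low two bits of the saved `cs` selector hold the privilege level of the interrupted code, so a helper on the register state can hide that check from callers. The struct layout below is a hypothetical stand-in; only the names `RegisterState`, `previous_mode` and `ExecutionMode` come from the commit message.

```cpp
#include <cstdint>

// Hypothetical stand-in types; the real kernel structures carry far more state.
enum class ExecutionMode : uint8_t {
    Kernel = 0,
    User,
};

struct RegisterState {
    uint64_t cs { 0 };

    // The low two bits of CS are the privilege level of the interrupted code:
    // 0 means ring 0 (kernel), anything else is treated as user mode here.
    ExecutionMode previous_mode() const
    {
        return (cs & 3) != 0 ? ExecutionMode::User : ExecutionMode::Kernel;
    }
};
```

An aarch64 port would presumably provide its own `previous_mode()` that inspects a different saved register, while call sites stay unchanged.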
2023-01-21Kernel+Userland: Move LibC/sys/ioctl_numbers to Kernel/API/Ioctl.hAndrew Kaster
This header has always been fundamentally a Kernel API file. Move it where it belongs. Include it directly in Kernel files, and make Userland applications include it via sys/ioctl.h rather than directly.
2023-01-21Kernel+LibC: Move name length constants to Kernel/API from limits.hAndrew Kaster
Reduce inclusion of limits.h as much as possible at the same time. This does mean that kmalloc.h is now including Kernel/API/POSIX/limits.h instead of LibC/limits.h, but the scope could be limited a lot more. Basically every file in the kernel includes kmalloc.h, and needs the limits.h include for PAGE_SIZE.
2023-01-14Meta: Fix copyright header in Kernel/Syscalls/jail.cpp fileLiav A
I wrote that file in 2022, not Andreas in 2018.
2023-01-13Kernel: Require "stdio" pledge promise when calling get_root_session_idLiav A
2023-01-10Kernel+LibCore: Make %sid path parsing not take agesAndreas Kling
Before this patch, Core::SessionManagement::parse_path_with_sid() would figure out the root session ID by sifting through /sys/kernel/processes. That file can take quite a while to generate (sometimes up to 40ms on my machine, which is a problem on its own!) and with no caching, many of our programs were effectively doing this multiple times on startup when unveiling something in /tmp/session/%sid/. While we should find ways to make generating /sys/kernel/processes fast again, this patch addresses the specific problem by introducing a new syscall: sys$get_root_session_id(). This extracts the root session ID by looking directly at the process table and takes <1ms instead of 40ms. This cuts WebContent process startup time by ~100ms on my machine. :^)
2023-01-07Kernel: Mark Process::jail() method as constLiav A
We really don't want callers of this function to accidentally change the jail, or even worse - remove the Process from an attached jail. To ensure this never happens, we can just declare this method as const so nobody can mutate it this way.
2023-01-03Kernel: Allow sending `SIGCONT` to processes in the same groupyyny
Allow sending `SIGCONT` to processes that share the same `pgid`. This is allowed in Linux as well. Also fixes a FIXME :^)
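The rule can be sketched roughly as follows. This is an illustration under assumptions, not the kernel's permission check: the `Credentials` struct and `can_send_signal` function are hypothetical, and the uid comparisons are a simplified version of the usual POSIX rule. Only the `pgid` exception for `SIGCONT` comes from the commit message.

```cpp
#include <csignal>
#include <sys/types.h>

// Hypothetical, trimmed-down credentials; the real class has more fields.
struct Credentials {
    uid_t uid;
    uid_t euid;
    pid_t pgid;
};

// Returns true if `sender` may deliver `signal` to `target`.
bool can_send_signal(Credentials const& sender, Credentials const& target, int signal)
{
    // The superuser may signal anyone.
    if (sender.euid == 0)
        return true;
    // Simplified ownership rule: the sender's real or effective uid matches the target's uid.
    if (sender.uid == target.uid || sender.euid == target.uid)
        return true;
    // The exception described above: SIGCONT is allowed within the same process group.
    if (signal == SIGCONT && sender.pgid == target.pgid)
        return true;
    return false;
}
```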
2023-01-03Kernel: Add `sid` and `pgid` to `Credentials`yyny
There are places in the kernel that would like to have access to `pgid` credentials in certain circumstances. I haven't found any use cases for `sid` yet, but `sid` and `pgid` are both changed with `sys$setpgid`, so it seemed sensible to add it. In Linux, `man 7 credentials` also mentions both the session id and process group id, so this isn't unprecedented.
2023-01-02Kernel: Turn lock ranks into template parameterskleines Filmröllchen
This step would ideally not have been necessary (it increases the amount of refactoring and templates needed, which in turn increases build times), but it gives us a couple of nice properties:
- SpinlockProtected inside Singleton (a very common combination) can now obtain any lock rank just via the template parameter. It was not previously possible to do this with SingletonInstanceCreator magic.
- SpinlockProtected's lock rank is now mandatory; this is the majority of cases and allows us to see where we're still missing proper ranks.
- The type already informs us what lock rank a lock has, which aids code readability and (possibly, if gdb cooperates) lock mismatch debugging.
- The rank of a lock can no longer be dynamic, which is not something we wanted in the first place (or made use of). Locks randomly changing their rank sounds like a disaster waiting to happen.
- In some places, we might be able to statically check that locks are taken in the right order (with the right lock rank checking implementation) as rank information is fully statically known.
This refactoring further exposes the fact that Mutex has no lock rank capabilities, which is not fixed here.
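As an illustration of the properties listed in this commit message, here is a small sketch of a lock wrapper whose rank is part of its type. It is not the kernel's implementation: `std::mutex` stands in for a spinlock, the rank names and the `may_nest` helper are hypothetical, and the "lower rank is taken first" ordering is an assumption made only for the example.

```cpp
#include <mutex>
#include <utility>

// Hypothetical ranks, chosen only for this example.
enum class LockRank : int {
    None = 0,
    MemoryManager = 1,
    Process = 2,
};

template<typename T, LockRank Rank>
class SpinlockProtected {
public:
    // The rank is a compile-time constant of the type, so it can never change at runtime.
    static constexpr LockRank rank = Rank;

    template<typename... Args>
    explicit SpinlockProtected(Args&&... args)
        : m_value(std::forward<Args>(args)...)
    {
    }

    // Runs `callback` with exclusive access to the protected value.
    template<typename Callback>
    decltype(auto) with(Callback callback)
    {
        std::lock_guard<std::mutex> guard(m_lock);
        return callback(m_value);
    }

private:
    T m_value;
    std::mutex m_lock; // stand-in for a real spinlock
};

// Assumed ordering rule: a lock may be taken while holding another lock
// only if the held lock has a strictly lower rank.
template<typename Held, typename Wanted>
constexpr bool may_nest = static_cast<int>(Held::rank) < static_cast<int>(Wanted::rank);
```

Because the rank lives in the type, a misordered pair can in principle be rejected with a `static_assert` instead of a runtime check.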
2022-12-30Kernel: Disallow executing SUID binaries if process is jailedLiav A
Check if the process we are currently running is in a jail, and if that is the case, fail early with the EPERM error code. Also, as Brian noted, we should disallow attaching to a jail when already running within a setid executable. Allowing it gives the user a false sense of security (because you can't exec new setid binaries) while the current program is still marked setid, which means that at the very least we gained permissions when we didn't expect to, so let's block it.
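The early exec check might look roughly like this. It is a sketch under assumptions: `ProcessInfo` and `check_exec_allowed` are hypothetical names, and treating set-group-ID binaries the same way as set-user-ID ones is a guess based on the message's use of "setid".

```cpp
#include <cerrno>
#include <sys/stat.h>

// Hypothetical view of the calling process.
struct ProcessInfo {
    bool in_jail;
};

// Returns 0 if the exec may proceed, or a positive errno value otherwise.
int check_exec_allowed(ProcessInfo const& process, mode_t executable_mode)
{
    bool const is_setid = (executable_mode & (S_ISUID | S_ISGID)) != 0;
    // Fail early: a jailed process must not gain privileges through a setid binary.
    if (process.in_jail && is_setid)
        return EPERM;
    return 0;
}
```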
2022-12-29Kernel: Move ThreadRegisters into arch-specific directoryTimon Kruiper
These are architecture-specific anyway, so they belong in the Arch directory. This commit also adds ThreadRegisters::set_initial_state to factor out the logic in Thread.cpp.
2022-12-28Kernel: Reorganize Arch/x86 directory to Arch/x86_64 after i686 removalLiav A
No functional change.
2022-12-28Kernel: Remove i686 supportLiav A
2022-12-16Kernel/Memory: Add option to annotate region mapping as immutableLiav A
We add this basic functionality to the Kernel so Userspace can request a particular virtual memory mapping to be immutable. This will be useful later on in the DynamicLoader code. The annotation of a particular Kernel Region as immutable implies that the following restrictions apply, so these features are prohibited:
- Changing the region's protection bits
- Unmapping the region
- Annotating the region with other virtual memory flags
- Applying further memory advises on the region
- Changing the region name
- Re-mapping the region
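The restrictions above can be pictured as a single gate in front of every mutating operation. The sketch below is illustrative only: the `Region` struct, the `RegionOperation` enum, and the choice of `EPERM` as the error code are assumptions, not taken from the commit.

```cpp
#include <cerrno>

// Hypothetical list of operations a syscall might attempt on a region.
enum class RegionOperation {
    ChangeProtection,
    Unmap,
    Annotate,
    Advise,
    Rename,
    Remap,
    Read,
};

// Hypothetical, trimmed-down region.
struct Region {
    bool immutable { false };
};

// Returns 0 if the operation is allowed, or a positive errno value otherwise.
int check_region_operation(Region const& region, RegionOperation operation)
{
    if (!region.immutable)
        return 0;
    switch (operation) {
    case RegionOperation::ChangeProtection:
    case RegionOperation::Unmap:
    case RegionOperation::Annotate:
    case RegionOperation::Advise:
    case RegionOperation::Rename:
    case RegionOperation::Remap:
        return EPERM; // assumed error code for touching an immutable region
    case RegionOperation::Read:
        return 0; // non-mutating access stays allowed
    }
    return 0;
}
```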
2022-12-16Kernel: Reintroduce the msyscall syscall as the annotate_mapping syscallLiav A
This syscall will be used later on to ensure we can declare virtual memory mappings as immutable (which means that the underlying Region is basically immutable with respect to both future annotations and changes to its protection bits).
2022-12-14Kernel: Add the auxiliary vector to the stack size validationAgustin Gianni
This patch validates that the size of the auxiliary vector does not exceed `Process::max_auxiliary_size`. The auxiliary vector is a range of memory on the userspace stack where the kernel can pass information to the process that will be created via `Process::do_exec`. The kernel needs to validate its size because the about-to-be-created process needs to have space remaining on the stack. Previously only `argv` and `envp` were taken into account for the size validation; with this patch, the size of `auxv` is also checked. All three elements contain values that a user (or an attacker) can specify. This patch adds the constant `Process::max_auxiliary_size`, which is defined to be one eighth of the user-space stack size. This is the approach taken by `Process::max_arguments_size` and `Process::max_environment_size`, which are used to check the sizes of `argv` and `envp`.
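A rough sketch of the size validation follows. The one-eighth limits are from the commit message; the 4 MiB stack size, the free function `validate_exec_sizes`, and the `E2BIG` error code are assumptions made for the example.

```cpp
#include <cerrno>
#include <cstddef>

// Hypothetical stack size; the kernel's actual default may differ.
constexpr size_t user_stack_size = 4 * 1024 * 1024;

// Each of the three user-controlled blobs gets one eighth of the stack.
constexpr size_t max_arguments_size = user_stack_size / 8;
constexpr size_t max_environment_size = user_stack_size / 8;
constexpr size_t max_auxiliary_size = user_stack_size / 8;

// Returns 0 if everything fits, or a positive errno value otherwise.
int validate_exec_sizes(size_t argv_size, size_t envp_size, size_t auxv_size)
{
    if (argv_size > max_arguments_size)
        return E2BIG;
    if (envp_size > max_environment_size)
        return E2BIG;
    // The check this patch adds: auxv is bounded just like argv and envp.
    if (auxv_size > max_auxiliary_size)
        return E2BIG;
    return 0;
}
```

With these limits, the three blobs together can use at most three eighths of the stack, leaving the rest for the new process.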
2022-12-11Kernel+LibC+LibELF: Set stack size based on PT_GNU_STACK during execvesin-ack
Some programs explicitly ask for a different initial stack size than what the OS provides. This is implemented in ELF by having a PT_GNU_STACK header which has its p_memsz set to the amount that the program requires. This commit implements this policy by reading the p_memsz of the header and setting the main thread stack size to that. ELF::Image::validate_program_headers ensures that the size attribute is a reasonable value.
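The policy can be sketched as a scan over the program headers. The `ProgramHeader` struct and `main_thread_stack_size` function are hypothetical simplifications; the `PT_GNU_STACK` type value is the standard one, and falling back to the default when `p_memsz` is zero is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Standard ELF program header type value for PT_GNU_STACK.
constexpr uint32_t pt_gnu_stack = 0x6474e551;

// Hypothetical, trimmed-down program header with only the fields used here.
struct ProgramHeader {
    uint32_t type;
    uint64_t memsz;
};

// Picks the main thread stack size: the PT_GNU_STACK request if present
// and non-zero, otherwise the OS default (assumed fallback behaviour).
size_t main_thread_stack_size(std::vector<ProgramHeader> const& headers, size_t default_size)
{
    for (auto const& header : headers) {
        if (header.type == pt_gnu_stack && header.memsz != 0)
            return static_cast<size_t>(header.memsz);
    }
    return default_size;
}
```

This sketch does no sanity checking of the requested size; per the commit message, that is the job of `ELF::Image::validate_program_headers`.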
2022-12-11Kernel+LibC+Tests: Implement `pwritev(2)`sin-ack
While this isn't really POSIX, it's needed by the Zig port and was simple enough to implement.
2022-12-11Kernel+LibC: Implement `setregid(2)`sin-ack
This copies and adapts the setresgid syscall, following in the footsteps of setreuid and setresuid.
2022-12-11Kernel+LibC+LibCore+UserspaceEmulator: Implement `faccessat(2)`sin-ack
Co-Authored-By: Daniel Bertalan <dani@danielbertalan.dev>
2022-12-11Kernel+LibC+LibCore: Implement `renameat(2)`sin-ack
Now with the ability to specify different bases for the old and new paths.
2022-12-11Kernel+LibC+LibCore: Implement `mkdirat(2)`sin-ack
2022-12-11Kernel+LibC: Implement `readlinkat(2)`sin-ack
Co-Authored-By: Daniel Bertalan <dani@danielbertalan.dev>
2022-12-11Kernel+LibC+LibCore: Implement `symlinkat(2)`sin-ack
Co-Authored-By: Daniel Bertalan <dani@danielbertalan.dev>
2022-11-29Kernel: Add some spec links and comments to sys$posix_fallocate()Andreas Kling
2022-11-29Kernel: Make sys$posix_fallocate() fail with ENODEV on non-regular filesAndreas Kling
Previously we tried to determine if `fd` refers to a non-regular file by doing a stat() operation on the file. This didn't work out very well since many File subclasses don't actually implement stat() but instead fall back to failing with EBADF. This patch fixes the issue by checking for regular files with File::is_regular_file() instead.
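The shape of the fix can be sketched like this. The class hierarchy below is a hypothetical simplification (in particular, the real `InodeFile` would consult inode metadata rather than return a constant); only `File::is_regular_file()` and the `ENODEV` result come from the commit message.

```cpp
#include <cerrno>

// Hypothetical, minimal File hierarchy.
struct File {
    virtual ~File() = default;
    // Default: most File subclasses are not regular files.
    virtual bool is_regular_file() const { return false; }
};

struct InodeFile : File {
    bool is_regular_file() const override { return true; }
};

struct Socket : File {
};

// Returns 0 if preallocation may proceed, or a positive errno value otherwise.
int check_posix_fallocate_target(File const& file)
{
    // Ask the file directly instead of relying on stat(), which many
    // subclasses do not implement.
    if (!file.is_regular_file())
        return ENODEV;
    return 0;
}
```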
2022-11-29Kernel: Remove unnecessary FIXME in sys$posix_fallocate()Andreas Kling
This syscall doesn't need to do anything for ENOSPC, as that is already handled by its callees.
2022-11-26Kernel+LibCore+LibC: Implement support for forcing unveil on execLiav A
To accomplish this, we add another VeilState which is called LockedInherited. The idea is to apply exec unveil data, similar to execpromises of the pledge syscall, on the current exec'ed program during the execve sequence. When applying the forced unveil data, the veil state is set to be locked, but the special state of LockedInherited ensures that if the new program tries to unveil paths, the request will silently be ignored. The program will continue running without receiving an error, but it can still only use the paths that were unveiled before the exec syscall. This in turn allows us to use the unveil syscall with a special utility to sandbox other userland programs in terms of what is visible to them on the filesystem, and it is usable on programs whether or not they use the unveil syscall in their code.
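The difference between the two locked states can be sketched as follows. This is an illustration, not the kernel's code: `UnveilData` and the free function `unveil` are hypothetical, and returning `EPERM` for the plain `Locked` state is an assumption. The silent success for `LockedInherited` is what the commit message describes.

```cpp
#include <cerrno>
#include <string>
#include <vector>

enum class VeilState {
    None,
    Dropped,
    Locked,
    LockedInherited,
};

// Hypothetical per-process unveil bookkeeping.
struct UnveilData {
    VeilState state { VeilState::None };
    std::vector<std::string> paths;
};

// Returns 0 on success, or a positive errno value otherwise.
int unveil(UnveilData& data, std::string const& path)
{
    switch (data.state) {
    case VeilState::LockedInherited:
        // Forced on exec: pretend the request succeeded, but change nothing.
        return 0;
    case VeilState::Locked:
        // Locked by the program itself: further requests are an error (assumed EPERM).
        return EPERM;
    case VeilState::None:
    case VeilState::Dropped:
        data.state = VeilState::Dropped;
        data.paths.push_back(path);
        return 0;
    }
    return 0;
}
```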
2022-11-24Kernel: Update tv_nsec field when using utimensat() with UTIME_NOWAndreas Kling
We were only updating the tv_sec field and leaving UTIME_NOW in tv_nsec.
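In other words, when the caller passes `UTIME_NOW`, both fields of the timestamp have to be replaced with the current time. A minimal sketch, with a hypothetical helper name and with `UTIME_OMIT` handling left out:

```cpp
#include <sys/stat.h>
#include <time.h>

// Resolves a utimensat()-style timestamp against the current time.
// `resolve_timestamp` is a hypothetical helper name for this example.
timespec resolve_timestamp(timespec requested, timespec now)
{
    if (requested.tv_nsec == UTIME_NOW) {
        // Replace the whole value. Copying only tv_sec would leave the
        // UTIME_NOW marker behind in tv_nsec, which is the bug described above.
        return now;
    }
    return requested;
}
```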
2022-11-13Kernel: Disallow jail creation from a process within a jailLiav A
We now disallow jail creation from a process within a jail because there is simply no valid use case to allow it, and we will probably not enable this behavior (which is considered a bug) again. Although there was no "real" security issue with this bug, as a process would still be denied from joining that jail, it leaked information about the number of jails that are or were present in the system.
2022-11-08Kernel: Split the Ext2FileSystem.{cpp,h} files into smaller componentsLiav A
2022-11-08Kernel: Split the ISO9660FileSystem.{cpp,h} files to smaller componentsLiav A
2022-11-08Kernel: Split the DevPtsFS files into smaller componentsLiav A
2022-11-08Kernel: Split the Plan9FileSystem.{cpp,h} file into smaller componentsLiav A
2022-11-08Kernel: Split the ProcFS core file into smaller componentsLiav A
2022-11-08Kernel: Split the FATFileSystem.{cpp,h} files into smaller componentsLiav A
2022-11-08Kernel: Split the TmpFS core files into smaller componentsLiav A
2022-11-08Kernel: Split the SysFS core files into smaller componentsLiav A
2022-11-05Kernel: Add support for jailsLiav A
Our implementation of jails closely resembles how FreeBSD jails work: it's essentially only a matter of using a RefPtr in the Process class to a Jail object. Then, when we iterate over all processes in various cases, we can check whether the current process is in a jail and should therefore be restricted in what is visible to it in terms of PID isolation. We also expose metadata about jails in the /sys/kernel/jails node (which does not reveal anything to a process which is in a jail). The lifetime model for the Jail object is currently plain and simple: there's simply no way to manually delete a Jail object once it has been created. Such a feature should be carefully designed to allow safe destruction of a Jail without the possibility of releasing a jailed process from the actual jail. A process which is attached to a Jail cannot leave it until the end of the Process (i.e. when finalizing a Process). All jails are kept referenced in the JailManagement. When the last attached process is finalized, the Jail is automatically destroyed.
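The PID isolation idea can be sketched with a reference-counted pointer from each process to its jail. This is a simplified illustration: `std::shared_ptr` stands in for the kernel's RefPtr, the struct fields and function names are hypothetical, and the visibility rule (a jailed process sees only processes in the same jail) is an assumption about what "PID isolation" means here.

```cpp
#include <memory>
#include <vector>

// Hypothetical, trimmed-down types.
struct Jail {
    int index;
};

struct Process {
    int pid;
    std::shared_ptr<Jail> jail; // null when the process is not jailed
};

// Assumed rule: unjailed processes see everything, jailed ones only their own jail.
bool is_visible_to(Process const& observer, Process const& target)
{
    if (!observer.jail)
        return true;
    return observer.jail == target.jail;
}

// Filters the process list the way a jailed observer would see it.
std::vector<int> visible_pids(Process const& observer, std::vector<Process> const& all)
{
    std::vector<int> pids;
    for (auto const& process : all) {
        if (is_visible_to(observer, process))
            pids.push_back(process.pid);
    }
    return pids;
}
```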
2022-11-05Kernel: Make sys$msyscall() not take the big lockAndreas Kling
This function is already serialized by the address space lock.