Age | Commit message (Collapse) | Author |
|
GCC with -flto is more aggressive when it comes to inlining and
discarding functions which is why we must mark some of the functions
as NEVER_INLINE (because they contain asm labels which would be
duplicated in the object files if the compiler decides to inline
the function elsewhere) and __attribute__((used)) for others so
that GCC doesn't discard them.
|
|
SPDX License Identifiers are a more compact / standardized
way of representing file license information.
See: https://spdx.dev/resources/use/#identifiers
This was done with the `ambr` search and replace tool.
ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
|
|
We should never enter the syscall handler from a kernel address.
|
|
This also brings LibC's abort() function closer to the spec.
|
|
|
|
|
|
Alot of code is shared between i386/i686/x86 and x86_64
and a lot probably will be used for compatability modes.
So we start by moving the headers into one Directory.
We will probalby be able to move some cpp files aswell.
|
|
|
|
|
|
It's now possible to build the whole kernel with an x86_64 toolchain.
There's no bootstrap code so it doesn't work yet (obviously.)
|
|
This was the original approach before we switched to get_fast_random()
which wasn't fast enough, so we added a buffer.
Unfortunately that buffer is racy and we can actually skid past the end
of it and continue fetching "random" offsets from the adjacent memory
for a while, until we run out of kernel data segment and trip a fault.
Instead of making this even more convoluted, let's just go back to the
pleasantly simple (RDTSC & 0xff) approach. :^)
Fixes #4912.
|
|
|
|
|
|
This makes it a lot easier to return errors since we no longer have to
worry about negating EFOO errors and can just return them flat.
|
|
This was another vestige from a long time ago, when exiting a thread
would mutate global data structures that were only protected by the
interrupt flag.
|
|
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)
Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.
We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
|
|
|
|
We were doing stack and syscall-origin region validations before
taking the big process lock. There was a window of time where those
regions could then be unmapped/remapped by another thread before we
proceed with our syscall.
This patch closes that window, and makes sys$get_stack_bounds() rely
on the fact that we now know the userspace stack pointer to be valid.
Thanks to @BenWiederhake for spotting this! :^)
|
|
|
|
If this happens then the kernel is in an undefined state, so we should
rather panic than attempt to limp along.
|
|
This patch adds Space, a class representing a process's address space.
- Each Process has a Space.
- The Space owns the PageDirectory and all Regions in the Process.
This allows us to reorganize sys$execve() so that it constructs and
populates a new Space fully before committing to it.
Previously, we would construct the new address space while still
running in the old one, and encountering an error meant we had to do
tedious and error-prone rollback.
Those problems are now gone, replaced by what's hopefully a set of much
smaller problems and missing cleanups. :^)
|
|
This patch adds sys$msyscall() which is loosely based on an OpenBSD
mechanism for preventing syscalls from non-blessed memory regions.
It works similarly to pledge and unveil, you can call it as many
times as you like, and when you're finished, you call it with a null
pointer and it will stop accepting new regions from then on.
If a syscall later happens and doesn't originate from one of the
previously blessed regions, the kernel will simply crash the process.
|
|
This allows us to determine what the previous mode (user or kernel)
was, e.g. in the timer interrupt. This is used e.g. to determine
whether a signal handler should be set up.
Fixes #5096
|
|
It was possible to overwrite the entire EFLAGS register since we didn't
do any masking in the ptrace and sigreturn syscalls.
This made it trivial to gain IO privileges by raising IOPL to 3 and
then you could talk to hardware to do all kinds of nasty things.
Thanks to @allesctf for finding these issues! :^)
Their exploit/write-up: https://github.com/allesctf/writeups/blob/master/2020/hxpctf/wisdom2/writeup.md
|
|
This prevents zombies created by multi-threaded applications and brings
our model back to closer to what other OSs do.
This also means that SIGSTOP needs to halt all threads, and SIGCONT needs
to resume those threads.
|
|
Fix some problems with join blocks where the joining thread block
condition was added twice, which lead to a crash when trying to
unblock that condition a second time.
Deferred block condition evaluation by File objects were also not
properly keeping the File object alive, which lead to some random
crashes and corruption problems.
Other problems were caused by the fact that the Queued state didn't
handle signals/interruptions consistently. To solve these issues we
remove this state entirely, along with Thread::wait_on and change
the WaitQueue into a BlockCondition instead.
Also, deliver signals even if there isn't going to be a context switch
to another thread.
Fixes #4336 and #4330
|
|
This makes the Scheduler a lot leaner by not having to evaluate
block conditions every time it is invoked. Instead evaluate them as
the states change, and unblock threads at that point.
This also implements some more waitid/waitpid/wait features and
behavior. For example, WUNTRACED and WNOWAIT are now supported. And
wait will now not return EINTR when SIGCHLD is delivered at the
same time.
|
|
|
|
|
|
Cuts time needed for `disasm /bin/id` from 2.5s to 1s -- identical
to the time it needs when not doing the random adjustment at all.
The downside is that it's now very easy to get the random offsets
with out-of-bounds reads, so it does make this mitigation less
effective.
|
|
Userspace<void*> is a bit strange here, as it would appear to the
user that we intend to de-refrence the pointer in kernel mode.
However I think it does a good join of illustrating that we are
treating the void* as a value type, instead of a pointer type.
|
|
|
|
Allow passing in an optional timeout to Thread::block and move
the timeout check out of Thread::Blocker. This way all Blockers
implicitly support timeouts and don't need to implement it
themselves. Do however allow them to override timeouts (e.g.
for sockets).
|
|
Let's emphasize that these functions actually go out and find regions.
|
|
We had a fast-path for the gettid syscall that was useful before
we started caching the thread ID in LibC. Just get rid of it. :^)
|
|
|
|
This allows us to query the current thread and process on a
per processor basis
|
|
Moving certain globals into a new Processor structure for
each CPU allows us to eventually run an instance of the
scheduler on each CPU.
|
|
This commit adds a basic implementation of
the ptrace syscall, which allows one process
(the tracer) to control another process (the tracee).
While a process is being traced, it is stopped whenever a signal is
received (other than SIGCONT).
The tracer can start tracing another thread with PT_ATTACH,
which causes the tracee to stop.
From there, the tracer can use PT_CONTINUE
to continue the execution of the tracee,
or use other request codes (which haven't been implemented yet)
to modify the state of the tracee.
Additional request codes are PT_SYSCALL, which causes the tracee to
continue exection but stop at the next entry or exit from a syscall,
and PT_GETREGS which fethces the last saved register set of the tracee
(can be used to inspect syscall arguments and return value).
A special request code is PT_TRACE_ME, which is issued by the tracee
and causes it to stop when it calls execve and wait for the
tracer to attach.
|
|
Installing an interrupt handler on the syscall IDT vector can lead to
fatal results, so we must assert if that happens.
|
|
Let's rip off the band-aid
|
|
|
|
Also, duplicate data in dbg() and klog() calls were removed.
In addition, leakage of virtual address to kernel log is prevented.
This is done by replacing kprintf() calls to dbg() calls with the
leaked data instead.
Also, other kprintf() calls were replaced with klog().
|
|
|
|
syscall_handler was not actually updating the value in regs->eax, so the
gettid() was always returning 85: the value of regs->eax was not
actually updated, and it remained the one from Userland (the value of
SC_gettid).
The syscall_handler was modified to actually get a pointer to
RegisterState, so any changes to it will actually be saved.
NOTE: This was actually more of a compiler optimization:
On the SC_gettid flow, we saved in regs.eax the return value of
sys$gettid(), but the compiler discarded it, since it followed a return.
On a normal flow, the value of regs.eax was reused in
tracer->did_syscall, so the compiler actually updated the value.
|
|
Suggested by Sergey. The currently running Thread and Process are now
Thread::current and Process::current respectively. :^)
|
|
|
|
|
|
Since these are not part of the system call convention, we don't care
what userspace had in there. Might as well scrub it before entering
the kernel.
I would scrub EBP too, but that breaks the comfy kernel-thru-userspace
stack traces we currently get. It can be done with some effort.
|
|
The userspace locks are very aggressively calling sys$gettid() to find
out which thread ID they have.
Since syscalls are quite heavy, this can get very expensive for some
programs. This patch adds a fast-path for sys$gettid(), which makes it
skip all of the usual syscall validation and just return the thread ID
right away.
This cuts Kernel/Process.cpp compile time by ~18%, from ~29 to ~24 sec.
|