Age | Commit message (Collapse) | Author |
|
This is a complete fix of clock_nanosleep, because the thread holds the
process lock again when returning from sleep()/sleep_until().
Therefore, no further concurrent invalidation can occur.
|
|
Also, duplicate data in dbg() and klog() calls were removed.
In addition, leakage of virtual address to kernel log is prevented.
This is done by replacing kprintf() calls to dbg() calls with the
leaked data instead.
Also, other kprintf() calls were replaced with klog().
|
|
This was only used by the mechanism for mapping executables into each
process's own address space. Now that we remap executables on demand
when needed for symbolication, this can go away.
|
|
This is both simpler and more robust than mapping them in the process
address space.
|
|
Previously we would map the entire executable of a program in its own
address space (but make it unavailable to userspace code.)
This patch removes that and changes the symbolication code to remap
the executable on demand (and into the kernel's own address space
instead of the process address space.)
This opens up a couple of further simplifications that will follow.
|
|
This makes Region::clone() do the right thing with it on fork().
|
|
This makes Region::clone() do the right thing for these now that we
differentiate based on Region::is_shared().
|
|
I had the wrong idea about this. Thanks to Sergey for pointing it out!
Here's what he says (reproduced for posterity):
> Private mappings protect the underlying file from the changes made by
> you, not the other way around. To quote POSIX, "If MAP_PRIVATE is
> specified, modifications to the mapped data by the calling process
> shall be visible only to the calling process and shall not change the
> underlying object. It is unspecified whether modifications to the
> underlying object done after the MAP_PRIVATE mapping is established
> are visible through the MAP_PRIVATE mapping." In practice that means
> that the pages that were already paged in don't get updated when the
> underlying file changes, and the pages that weren't paged in yet will
> load the latest data at that moment.
> The only thing MAP_FILE | MAP_PRIVATE is really useful for is mapping
> a library and performing relocations; it's definitely useless (and
> actively harmful for the system memory usage) if you only read from
> the file.
This effectively reverts e2697c2dddd531c0ac7cad3fd6ca78e81d0d86da.
|
|
|
|
This way we can trace many things and we get one perfcore file per
process instead of everyone trying to write to "perfcore"
|
|
|
|
|
|
This will be a memory usage pessimization until we actually implement
CoW sharing of the memory pages with SharedInodeVMObject.
However, it's a huge architectural improvement, so let's take it and
improve on this incrementally.
fork() should still be neutral, since all private mappings are CoW'ed.
|
|
It's now up to the caller to provide a VMObject when constructing a new
Region object. This will make it easier to handle things going wrong,
like allocation failures, etc.
|
|
If we wrote anything we should just inform userspace that we did,
and not worry about the error code. Userspace can call us again if
it wants, and we'll give them the error then.
|
|
We don't have to log the process name/PID/TID, dbg() automatically adds
that as a prefix to every line.
Also we don't have to do .characters() on Strings passed to dbg() :^)
|
|
You can now mmap a file as private and writable, and the changes you
make will only be visible to you.
This works because internally a MAP_PRIVATE region is backed by a
unique PrivateInodeVMObject instead of using the globally shared
SharedInodeVMObject like we always did before. :^)
Fixes #1045.
|
|
InodeFile now directly calls Process::allocate_region_with_vmobject()
instead of taking an awkward detour via a special Region constructor.
|
|
We now have PrivateInodeVMObject and SharedInodeVMObject, corresponding
to MAP_PRIVATE and MAP_SHARED respectively.
Note that PrivateInodeVMObject is not used yet.
|
|
|
|
Let's not keep raw Region* variables around like that when it's so easy
to avoid it.
|
|
|
|
Add an extra out-parameter to shbuf_get() that receives the size of the
shared buffer. That way we don't need to make a separate syscall to
get the size, which we always did immediately after.
|
|
This feels a lot more consistent and Unixy:
create_shared_buffer() => shbuf_create()
share_buffer_with() => shbuf_allow_pid()
share_buffer_globally() => shbuf_allow_all()
get_shared_buffer() => shbuf_get()
release_shared_buffer() => shbuf_release()
seal_shared_buffer() => shbuf_seal()
get_shared_buffer_size() => shbuf_get_size()
Also, "shared_buffer_id" is shortened to "shbuf_id" all around.
|
|
Also, fix a bad derefernce in sys$create_shared_buffer() method.
|
|
Will caught an assertion when running "kill 9999999999999" :^)
|
|
If a process doesn't have any threads left, it's in a zombie state and
we can't meaningfully send signals to it. So just ignore them.
Fixes #1313.
|
|
Work towards #1313.
|
|
set_interrupted_by_death was never called whenever a thread that had
a joiner died, so the joiner remained with the joinee pointer there,
resulting in an assertion fail in JoinBlocker: m_joinee pointed to
a freed task, filled with garbage.
Thread::current->m_joinee may not be valid after the unblock
Properly return the joinee exit value to the joiner thread.
|
|
On 32-bit platforms, INT32_MIN == -INT32_MIN, so we can't expect this
to always work:
if (pid < 0)
positive_pid = -pid; // may still be negative!
This happens because the -INT32_MIN expression becomes a long and is
then truncated back to an int.
Fixes #1312.
|
|
This allows a process wich has more than 1 thread to call exec, even
from a thread. This kills all the other threads, but it won't wait for
them to finish, just makes sure that they are not in a running/runable
state.
In the case where a thread does exec, the new program PID will be the
thread TID, to keep the PID == TID in the new process.
This introduces a new function inside the Process class,
kill_threads_except_self which is called on exit() too (exit with
multiple threads wasn't properly working either).
Inside the Lock class, there is the need for a new function,
clear_waiters, which removes all the waiters from the
Process::big_lock. This is needed since after a exit/exec, there should
be no other threads waiting for this lock, the threads should be simply
killed. Only queued threads should wait for this lock at this point,
since blocked threads are handled in set_should_die.
|
|
|
|
Process::m_regions is not sorted, so we can use unstable_remove()
to avoid shifting the vector contents. :^)
|
|
This turns use-after-free bugs into null pointer dereferences instead.
|
|
Each process has a 1-level lookup cache for fast repeated lookups of
the same VM region (which tends to be the majority of lookups.)
The cache is used by the following syscalls: munmap, madvise, mprotect
and set_mmap_name.
After a succesful exec(), there could be a stale Region* in the lookup
cache, and the new executable was able to manipulate it using a number
of use-after-free code paths.
|
|
|
|
When committing to a new executable, disown any shared buffers that the
process was previously co-owning.
Otherwise accessing the same shared buffer ID from the new program
would cause the kernel to find a cached (and stale!) reference to the
previous program's VM region corresponding to that shared buffer,
leading to a Region* use-after-free.
Fixes #1270.
|
|
Since we're gonna throw away these stacks at the end of exec anyway,
we might as well disable profiling before starting to mess with the
process page tables. One less weird situation to worry about in the
sampling code.
|
|
We now log the new executable on exec() and throw away all the samples
we've accumulated so far. But profiling keeps going.
|
|
|
|
|
|
|
|
There's not really enough of these to justify using a HashTable.
|
|
Suggested by Sergey. The currently running Thread and Process are now
Thread::current and Process::current respectively. :^)
|
|
Process teardown is divided into two main stages: finalize and reap.
Finalization happens in the "Finalizer" kernel and runs with interrupts
enabled, allowing destructors to take locks, etc.
Reaping happens either in sys$waitid() or in the scheduler for orphans.
The more work we can do in finalization, the better, since it's fully
pre-emptible and reduces the amount of time the system runs without
interrupts enabled.
|
|
This is exposed via the non-standard serenity_mmap() call in userspace.
|
|
|
|
|
|
|
|
|