Age | Commit message | Author |
|
|
|
We're now able to detect all the AMD-defined CPUID feature flags from
ECX/EDX for EAX=80000001h :^)
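For illustration, a minimal userspace sketch (not the kernel's detection
code) of querying this leaf; the bit positions are taken from the CPUID
documentation (e.g. NX = EDX[20], LM = EDX[29], LZCNT/ABM = ECX[5]):

```cpp
#include <cpuid.h>
#include <cstdio>

int main()
{
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid() returns 0 if the requested leaf is not supported.
    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
        return 1;

    printf("nx:    %s\n", (edx & (1u << 20)) ? "yes" : "no");
    printf("lm:    %s\n", (edx & (1u << 29)) ? "yes" : "no");
    printf("lzcnt: %s\n", (ecx & (1u << 5)) ? "yes" : "no");
}
```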
|
|
We're now able to detect all the extended CPUID feature flags from
EBX/ECX/EDX for EAX=7 :^)
|
|
We're now able to detect all the regular CPUID feature flags from
ECX/EDX for EAX=1 :^)
None of the new ones are being used for anything yet, but they will show
up in /proc/cpuinfo and subsequently lscpu and SystemMonitor.
Note that I replaced the periods in the SSE 4.1 and 4.2 feature names
with underscores, which matches the internal enum names, Linux's
/proc/cpuinfo and the general pattern of replacing special characters
with underscores to limit feature names to [a-z0-9_].
The enum member stringification has been moved to a new function for
better re-usability and to avoid cluttering up Processor.cpp.
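As a hypothetical sketch of that naming convention (the real enum
members and stringification function live in the kernel and may differ
in detail):

```cpp
#include <cstdio>

enum class CPUFeature {
    SSE4_1, // reported as "sse4_1" (period replaced with an underscore)
    SSE4_2, // reported as "sse4_2"
};

static char const* cpu_feature_to_name(CPUFeature feature)
{
    switch (feature) {
    case CPUFeature::SSE4_1:
        return "sse4_1";
    case CPUFeature::SSE4_2:
        return "sse4_2";
    }
    return "unknown";
}

int main()
{
    printf("%s\n", cpu_feature_to_name(CPUFeature::SSE4_1)); // prints "sse4_1"
}
```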
|
|
This will make it possible to add many, many more CPU features - more
than the current limit of 32 (and the later limit of 64 we would hit if
we stuck with an enum class), to be specific :^)
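To illustrate the limitation: a bitmask built directly on an enum class
can only hold as many flags as its underlying integer type has bits, so
a u32/u64-backed enum tops out at 32/64 features. A sketch of one way
around that (illustrative only, not the kernel's actual replacement
type):

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// A u64-backed enum class used as a bitmask caps out at 64 distinct flags:
enum class OldCPUFeature : uint64_t {
    SSE = 1ull << 0,
    LastPossibleFlag = 1ull << 63, // nothing beyond this fits
};

// Indexing features by position and storing them in a wider set removes
// the limit (the real kernel type may look different):
enum class CPUFeatureIndex : std::size_t {
    SSE,
    SSE2,
    AVX512_VBMI2,
    Count,
};

using CPUFeatureSet = std::bitset<256>;

inline void set_feature(CPUFeatureSet& set, CPUFeatureIndex index)
{
    set.set(static_cast<std::size_t>(index));
}
```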
|
|
Checks of ECX go before EDX, and the bit indices are now ordered
properly. Additionally, handling of the EDX[11] bit has been moved into
a lambda function to keep the series of if statements neatly together.
All of this makes it *a lot* easier to follow along and compare the
implementation to the tables in the Intel manual, e.g. to find missing
checks.
|
|
|
|
This works around issue #10382 until it is fixed on QEMU's side.
Patch from Anonymous.
|
|
Function-local `static constexpr` variables can simply be `constexpr`. This
can reduce memory consumption, binary size, and offer additional
compiler optimizations.
These changes result in a stripped x86_64 kernel binary size reduction
of 592 bytes.
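An illustrative sketch of the change (the function here is made up, not
the exact kernel code): a function-local constant that is already
`constexpr` does not need to be `static`, and dropping it lets the
compiler fold the value instead of emitting a distinct object into the
binary.

```cpp
#include <array>
#include <cstddef>

std::size_t scancode_to_index(std::size_t scancode)
{
    // Before: static constexpr std::array<std::size_t, 4> map { 3, 2, 1, 0 };
    constexpr std::array<std::size_t, 4> map { 3, 2, 1, 0 };
    return map[scancode % map.size()];
}
```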
|
|
Move this architecture-specific sanity check (IOPL must be 0) out of
Scheduler and into the x86 enter_thread_context(). Also do this for
every thread and not just userspace ones.
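A hedged sketch of what such a check can look like (names are
illustrative, not the exact kernel code): IOPL occupies bits 12 and 13
of EFLAGS/RFLAGS, so masking with 0x3000 must yield zero for every
thread we switch to.

```cpp
#include <cstdint>

constexpr uint64_t eflags_iopl_mask = 0x3000; // IOPL is bits 12..13

inline void verify_iopl_is_zero(uint64_t thread_eflags)
{
    if ((thread_eflags & eflags_iopl_mask) != 0) {
        // In the kernel this would be a VERIFY()/panic: no thread may ever
        // run with a non-zero IOPL.
        __builtin_trap();
    }
}
```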
|
|
It was annoyingly hard to spot these when we were using them with
different amounts of qualification everywhere.
This patch uses Thread::State::Foo everywhere instead of Thread::Foo
or just Foo.
|
|
Signal dispatch is already taken care of elsewhere, so there appears to
be no need for the hack in enter_current().
This also allows us to remove the Thread::m_in_block flag, simplifying
thread blocking logic somewhat.
Verified with the original repro for #4336 which this was meant to fix.
|
|
...and deal with the fallout by adding missing includes everywhere.
|
|
This allows us to enable Write-Combine on e.g. framebuffers,
significantly improving performance on bare metal.
To keep things simple, we currently only use one of the up to three bits
(bit 7 in the PTE), which maps to the PA4 entry in the PAT MSR, which
we set to the Write-Combine mode on each CPU at boot time.
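A hedged sketch of that mechanism (illustrative constants and names, not
the exact kernel code): for 4 KiB pages the PAT index is formed from PTE
bits PAT (7), PCD (4) and PWT (3); with only bit 7 set the index is 4,
which selects the PA4 field (bits 32..39) of the IA32_PAT MSR (0x277),
so writing the Write-Combining type (0x01) into PA4 gives those mappings
WC semantics.

```cpp
#include <cstdint>

constexpr uint32_t msr_ia32_pat = 0x277;       // written back via wrmsr on each CPU
constexpr uint64_t pat_write_combining = 0x01; // memory type encoding for WC
constexpr uint64_t pte_pat_bit = 1ull << 7;    // PAT bit in a 4 KiB page PTE

inline uint64_t pat_value_with_wc_in_pa4(uint64_t current_pat)
{
    // Replace the PA4 entry (bits 32..39) with the Write-Combining type.
    current_pat &= ~(0xffull << 32);
    current_pat |= pat_write_combining << 32;
    return current_pat;
}
```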
|
|
|
|
Since the inline capacity of the Vector return type was not specified
explicitly, the vector was automatically copied to a 0-length inline
capacity one, essentially eliminating the optimization.
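A sketch of the pitfall using AK::Vector-style signatures (the function
names are made up):

```cpp
#include <AK/Vector.h>

// The helper fills a vector that can hold 32 entries without heap allocation.
static Vector<int, 32> make_samples()
{
    Vector<int, 32> samples;
    for (int i = 0; i < 8; ++i)
        samples.append(i);
    return samples;
}

// Pitfall: leaving the inline capacity out of the return type converts the
// result into a Vector<int> (inline capacity 0), copying the elements and
// losing the optimization. Spelling out the capacity, as above, avoids this.
static Vector<int> make_samples_lossy()
{
    Vector<int, 32> samples;
    for (int i = 0; i < 8; ++i)
        samples.append(i);
    return samples;
}
```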
|
|
|
|
Contrary to the comment above it, this while loop was actually clearing
the selectors at or above the edited one (instead of the selectors that
were skipped when the GDT was extended). This wasn't really an issue so
far, as all calls to this function did extend the GDT, which meant the
condition was always false, but future calls that try to edit an
existing entry would fail.
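A sketch of the intended behavior (made-up names, not the kernel code):
when writing an entry past the current length, clear only the entries
that were skipped while extending the table.

```cpp
#include <cstddef>
#include <cstdint>

struct Descriptor {
    uint64_t raw { 0 };
};

struct GDT {
    Descriptor entries[256] {};
    std::size_t length { 0 };

    // Bounds checking elided for brevity.
    void write_entry(std::size_t index, Descriptor descriptor)
    {
        entries[index] = descriptor;
        if (index >= length) {
            // Clear the entries we skipped over, not the ones at or above
            // the edited index.
            for (std::size_t i = length; i < index; ++i)
                entries[i] = Descriptor {};
            length = index + 1;
        }
    }
};
```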
|
|
Some of the enum members were also renamed to reflect the fact that the
segment sizes are not necessarily 32-bit (they are 64-bit on x86_64).
|
|
|
|
Also improve the comments around that initialisation code.
|
|
|
|
Add a kernel data segment and make the user code segment come after
the data segment. We need the GDT to be in a certain order to support
the syscall and sysret instruction pair.
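The reason for the ordering, sketched with illustrative selector values
(not necessarily the kernel's actual ones): the syscall/sysret pair
derives SS and CS from fixed offsets relative to the selectors
programmed into the STAR MSR, so the GDT layout has to match.

```cpp
#include <cstdint>

// syscall loads CS from STAR[47:32] and SS from STAR[47:32] + 8,
// so the kernel data segment must directly follow the kernel code segment.
constexpr uint16_t kernel_code_selector = 0x08;
constexpr uint16_t kernel_data_selector = 0x10;

// sysret (to 64-bit mode) loads SS from STAR[63:48] + 8 and CS from
// STAR[63:48] + 16, so the user code segment must come after user data.
constexpr uint16_t sysret_selector_base = 0x18;
constexpr uint16_t user_data_selector = sysret_selector_base + 8;  // 0x20
constexpr uint16_t user_code_selector = sysret_selector_base + 16; // 0x28
```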
|
|
In order to reduce our reliance on __builtin_{ffs, clz, ctz, popcount},
this commit removes all calls to these functions and replaces them with
the equivalent functions in AK/BuiltinWrappers.h.
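A sketch of the substitution; the wrapper names used here
(count_trailing_zeroes, popcount) are my reading of AK/BuiltinWrappers.h
and should be checked against the header.

```cpp
#include <AK/BuiltinWrappers.h>

unsigned first_set_bit_index(unsigned value)
{
    // Before: __builtin_ctz(value)
    return count_trailing_zeroes(value);
}

unsigned set_bit_count(unsigned value)
{
    // Before: __builtin_popcount(value)
    return popcount(value);
}
```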
|
|
Unsurprisingly, the /proc/PID/stacks/TID stack walk had the same
arbitrary memory read problem as the perf event stack walk.
It would be nice if the kernel had a single stack walk implementation,
but that's outside the scope of this commit.
|
|
A new header file has been created in the Arch/ folder while the
implementation has been moved into a .cpp file living in the x86 folder.
|
|
The platform independent Processor.h file includes the shared processor
code and includes the specific platform header file.
All references to the Arch/x86/Processor.h file have been replaced with
a reference to Arch/Processor.h.
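Roughly, the dispatch header looks like this (an illustrative sketch;
the real Arch/Processor.h may differ in detail):

```cpp
#pragma once

#include <AK/Platform.h>

// ...platform-independent processor declarations shared by all targets...

#if ARCH(X86_64) || ARCH(I386)
#    include <Kernel/Arch/x86/Processor.h>
#else
#    error "Unknown architecture"
#endif
```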
|
|
|
|
|
|
|
|
|
|
This fixes a triple fault that occurs when compiling serenity with
the i686 clang toolchain. (The underlying issue is that the old inline
assembly did not specify that it clobbered the eax/ecx/edx registers
and as such the compiler assumed they were not changed and used their
values across it)
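A minimal illustration of the general problem and fix (not the kernel's
actual assembly): tie the registers an instruction uses to explicit
operands or clobbers so the compiler knows not to keep live values in
them.

```cpp
#include <cstdint>

inline uint64_t read_msr(uint32_t msr)
{
    uint32_t low = 0, high = 0;
    // rdmsr takes the MSR index in ecx and returns the value in edx:eax.
    // Binding the operands to those registers tells the compiler exactly
    // which registers this statement reads and writes.
    asm volatile("rdmsr"
                 : "=a"(low), "=d"(high)
                 : "c"(msr));
    return (static_cast<uint64_t>(high) << 32) | low;
}
```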
Co-authored-by: Brian Gianforcaro <bgianf@serenityos.org>
|
|
|
|
This makes EFAULT propagation flow much more naturally. :^)
|
|
This allows addressing all cores on more modern processors. For now,
we still have a hardcoded limit of 64 due to s_processors being a
static array.
|
|
Initializing the variable this way fixes a kernel panic in Clang where
the object was zero-initialized, so the `m_in_scheduler` contained the
wrong value. GCC got it right, but we're better off making this change,
as leaving uninitialized fields in constant-initialized objects can
cause other weird situations like this. Also, initializing only a single
field to a non-zero value isn't worth the cost of no longer fitting in
`.bss`.
Another two variables suffer from the same problem, even though their
values are supposed to be zero. Removing these causes the
`_GLOBAL__sub_I_` function to no longer be generated and the (not
handled) `.init_array` section to be omitted.
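For illustration (made-up example, not the scheduler's actual code), the
difference between a global that needs dynamic initialization and one
that is constant-initialized:

```cpp
#include <cstdint>

uint32_t read_hardware_counter(); // assumed external helper

// Needs dynamic initialization: the compiler emits a _GLOBAL__sub_I_ helper
// plus an .init_array entry, and the value is wrong until that code runs.
// static uint32_t s_boot_ticks = read_hardware_counter();

// Constant-initialized: the object lives in .bss (all zero bits) and is
// valid from the moment the kernel image is mapped, with no .init_array
// entry needed.
static uint32_t s_boot_ticks = 0;
```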
|
|
We can use ThreadRegisters::set_flags() to avoid the #ifdef's here.
|
|
|
|
And let id() be the non-static version that gives you the ID of a
Processor object.
|
|
This matches MutexLocker, and doesn't sound like it's a lock itself.
|
|
|
|
This has several benefits:
1) We no longer just blindly dereference a null pointer in various places
(see the sketch after this list)
2) We will get nicer runtime error messages if the current process does
turn out to be null in the call location
3) GCC no longer complains about possible nullptr dereferences when
compiling without KUBSAN
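A hedged sketch of the accessor pattern behind points 1 and 2 (not the
exact kernel code; the lookup helper is assumed):

```cpp
#include <AK/Assertions.h>

class Process;

Process* current_process_if_any(); // assumed per-CPU lookup, may return null

Process& current_process()
{
    auto* process = current_process_if_any();
    // VERIFY() produces a clear runtime error message instead of letting
    // callers blindly dereference a null pointer.
    VERIFY(process);
    return *process;
}
```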
|
|
Leave interrupts enabled so that we can still process IRQs. Critical
sections should only prevent preemption by another thread.
Co-authored-by: Tom <tomut@yahoo.com>
|
|
By making these functions static we close a window where we could get
preempted after calling Processor::current() and move to another
processor.
Co-authored-by: Tom <tomut@yahoo.com>
|
|
Processing SMP messages outside of SMP mode is a waste of time,
and now that we don't rely on the side effects of calling the message
processing function, let's stop calling it entirely. :^)
|
|
We were previously relying on a side effect of the critical section in
smp_process_pending_messages(): when exiting that section, it would
process any pending deferred calls.
Instead of relying on that, make the deferred invocations explicit by
calling deferred_call_execute_pending() in exit_trap().
This ensures that deferred calls get processed before entering the
scheduler at the end of exit_trap(). Since thread unblocking happens
via deferred calls, the threads don't have to wait until the next
scheduling opportunity when they could be ready *now*. :^)
This was the main reason Tom's SMP branch ran slowly in non-SMP mode.
|
|
|
|
Enter a critical section in Processor::exit_trap so that processing
SMP messages doesn't enable interrupts upon leaving. We need to delay
this until the end, where we call into the Scheduler if exiting the
trap leaves us outside of a critical section and outside of an IRQ
handler.
Co-authored-by: Tom <tomut@yahoo.com>
|
|
We can't enter/leave SMP mode once the kernel is up and running.
|
|
- Use the receiver's per-CPU entry in the message, instead of the
sender's. (Using the sender's entry wasn't safe for broadcast
messages since the same entry ended up on multiple message queues.)
- Retry the CAS until it *succeeds* instead of *fails*. This closes a
race window, and also ensures a correct return value. The return value
is used by the caller to decide whether to broadcast an IPI.
This was the main reason smp=on was so slow. We had CPUs busy-waiting
until someone else triggered an IPI and moved things along.
- Add a CPU pause hint to the spin loop. :^)
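A hedged sketch of the enqueue side of that pattern, using std::atomic
instead of the kernel's own atomics (names are illustrative): push onto
the receiver's lock-free message list, retry the CAS until it succeeds,
and report whether the queue was previously empty so the caller knows an
IPI is needed. The pause hint belongs in the busy-wait loop on the
waiting side.

```cpp
#include <atomic>

struct Message {
    Message* next { nullptr };
};

// One of these per receiving CPU.
struct PerCPUMessageQueue {
    std::atomic<Message*> head { nullptr };
};

// Returns true if the queue was previously empty, i.e. the receiver
// needs an IPI to start processing.
bool enqueue_message(PerCPUMessageQueue& receiver_queue, Message& message)
{
    Message* expected = receiver_queue.head.load(std::memory_order_relaxed);
    do {
        message.next = expected;
        // compare_exchange_weak updates `expected` on failure; keep retrying
        // until the exchange succeeds (not until it fails).
    } while (!receiver_queue.head.compare_exchange_weak(expected, &message,
                                                        std::memory_order_release,
                                                        std::memory_order_relaxed));
    return expected == nullptr;
}
```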
|