summaryrefslogtreecommitdiff
path: root/AK/HashTable.h
AgeCommit message (Collapse)Author
2022-06-23AK: Zero previous pointer *after* fixing the insertion list in HashTableHendiadyoin1
2022-06-22AK: Clear the previous and next pointers of deleted HashTable bucketsIdan Horowitz
Usually the values of the previous and next pointers of deleted buckets are never used, as they're not part of the main ordered bucket chain, but if an in-place rehashing is done, which results in the bucket being turned into a free bucket, the stale pointers will remain, at which point any item that is inserted into said free-bucket will have either a stale previous pointer if the HashTable was empty on insertion, or a stale next pointer, resulting in undefined behaviour. This commit also includes a new HashMap test that reproduces this issue
2022-05-08AK+LibGUI: Pass predicate to *_matching() methods by const referenceVitaly Dyachkov
2022-04-01Everywhere: Run clang-formatIdan Horowitz
2022-03-31AK: Use bucket states with special bit patterns in HashTablekleines Filmröllchen
This simplifies some of the bucket state handling code, as there's now an easy way of checking the basic category of bucket state.
2022-03-31AK: Rehash HashTable in-place instead of shrinkingkleines Filmröllchen
As seen on TV, HashTable can get "thrashed", i.e. it has a bunch of deleted buckets that count towards the load factor. This means that hash tables which are large enough for their contents need to be resized. This was fixed in 9d8da16 with a workaround that shrinks the HashTable back down in these cases, as after the resize and re-hash the load factor is very low again. However, that's not a good solution. If you insert and remove repeatedly around a size boundary, you might get frequent resizes, which involve frequent re-allocations. The new solution is an in-place rehashing algorithm that I came up with. (Do complain to me, I'm at fault.) Basically, it iterates the buckets and re-hashes the used buckets while marking the deleted slots empty. The issue arises with collisions in the re-hash. For this reason, there are two kinds of used buckets during the re-hashing: the normal "used" buckets, which are old and are treated as free space, and the "re-hashed" buckets, which are new and treated as used space, i.e. they trigger probing. Therefore, the procedure for relocating a bucket's contents is as follows: - Locate the "real" bucket of the contents with the hash. That bucket is the starting point for the target bucket, and the current (old) bucket is the bucket we want to move. - While we still need to move the bucket: - If we're the target, something strange happened last iteration or we just re-hashed to the same location. We're done. - If the target is empty or deleted, just move the bucket. We're done. - If the target is a re-hashed full bucket, we probe by double-hashing our hash as usual. Henceforth, we move our target for the next iteration. - If the target is an old full bucket, we swap the target and to-move buckets. Therefore, the bucket to move is a the correct location and the former target, which still needs to find a new place, is now in the bucket to move. So we can just continue with the loop; the target is re-obtained from the bucket to move. This happens for each and every bucket, though some buckets are "coincidentally" moved before their point of iteration is reached. Either way, this guarantees full in-place movement (even without stack storage) and therefore space complexity of O(1). Time complexity is amortized O(2n) asssuming a good hashing function. This leads to a performance improvement of ~30% on the benchmark introduced with the last commit. Co-authored-by: Hendiadyoin1 <leon.a@serenityos.org>
2022-03-31AK: Merge HashTable bucket state into one enumkleines Filmröllchen
The hash table buckets had three different state booleans that are in fact exclusive. In preparation for further states, this commit consolidates them into one enum. This has the added benefit on not relying on the compiler's boolean packing anymore; we definitely now only need one byte for the bucket state.
2022-03-15AK+Kernel: Avoid double memory clearing of HashTable bucketsDaniel Bertalan
Since the allocated memory is going to be zeroed immediately anyway, let's avoid redundantly scrubbing it with MALLOC_SCRUB_BYTE just before that. The latest versions of gcc and Clang can automatically do this malloc + memset -> calloc optimization, but I've seen a couple of places where it failed to be done. This commit also adds a naive kcalloc function to the kernel that doesn't (yet) eliminate the redundancy like the userland does.
2022-03-07AK: Automatically shrink HashTable when removing entriesAndreas Kling
If the utilization of a HashTable (size vs capacity) goes below 20%, we'll now shrink the table down to capacity = (size * 2). This fixes an issue where tables would grow infinitely when inserting and removing keys repeatedly. Basically, we would accumulate deleted buckets with nothing reclaiming them, and eventually deciding that we needed to grow the table (because we grow if used+deleted > limit!) I found this because HashTable iteration was taking a suspicious amount of time in Core::EventLoop::get_next_timer_expiration(). Turns out the timer table kept growing in capacity over time. That made iteration slower and slower since HashTable iterators visit every bucket.
2022-03-07AK: Remove return value from HashTable::remove() and HashMap::remove()Andreas Kling
This was only used by remove_all_matching(), where it's no longer used.
2022-03-07AK: Simplify HashTable::remove_all_matching()Andreas Kling
Just walk the table from start to finish, deleting buckets as we go. This removes the need for remove() to return an iterator, which is preventing me from implementing hash table auto-shrinking.
2022-01-29AK: Support using custom comparison operations for hash compatible keysIdan Horowitz
2022-01-25AK: Implement `HashTable::try_ensure_capacity`, as used in `HashMap`James Puleo
This was used in `HashMap::try_ensure_capacity`, but was missing from `HashTable`s implementation. No one had used `HashMap::try_ensure_capacity` before so it went unnoticed!
2022-01-05AK: Make Hash{Map,Table}::remove_all_matching() return removal successAndreas Kling
These functions now return whether one or more entries were removed.
2022-01-05AK: Add HashTable::remove_all_matching(predicate)Andreas Kling
This removes all matching entries from a table in a single pass.
2021-12-15AK: Enable fast path for removal by hash-compatible key in HashMap/TableHendiadyoin1
2021-12-15AK: Allow hash-compatible key types in Hash[Table|Map] lookupHendiadyoin1
This will allow us to avoid some potentially expensive type conversion during lookup, like form String to StringView, which would allocate memory otherwise.
2021-11-14AK: Resolve clang-tidy readability-qualified-auto warningsAndrew Kaster
... In files included by Kernel/Process.cpp and Kernel/Thread.cpp
2021-11-14AK: Resolve clang-tidy readability-bool-conversion warningsAndrew Kaster
... In files included by Kernel/Process.cpp and Kernel/Thread.cpp
2021-11-11AK: Allow to clear HashTables/Maps with capacityHendiadyoin1
2021-11-11AK: Make HashTable and HashMap try_* functions return ErrorOr<T>Andreas Kling
This allows us to use TRY() and MUST() with them.
2021-10-06AK: Add missing headersBen Wiederhake
Example failure: IDAllocator.h only pulls in AK/Hashtable.h, so any compilation unit that includes AK/IDAllocator.h without including AK/Traits.h before it used to be doomed to fail with the cryptic error message "In instantiation of 'AK::HashTable<T, TraitsForT, IsOrdered>::Iterator AK::HashTable<T, TraitsForT, IsOrdered>::find(const T&) [with T = int; TraitsForT = AK::Traits: incomplete type 'AK::Traits<int>' used in nested name specifier".
2021-09-10AK: Mark HashTable::size_in_bytes() as constexprHendiadyoin1
2021-09-10AK: Add OOM safe interface to HashTable/MapHediadyoin1
This adds a new HashSetResult only returned by try_set, to signal allocation failure during setting.
2021-09-07Everywhere: Behaviour => BehaviorAndreas Kling
2021-07-21AK: Remove unused private HashTable::lookup_for_reading()Andreas Kling
2021-07-21AK: Sprinkle [[nodiscard]] on HashMap and HashTableAndreas Kling
2021-07-13HashTable: Rename finders with a more accurate and self-descripting namengc6302h
2021-07-11AK: Use kfree_sized() in AK::HashTableAndreas Kling
2021-06-15AK: Add Ordering support to HashTable and HashMapHediadyoin1
Adds a IsOrdered flag to Hashtable and HashMap, which allows iteration in insertion order
2021-06-09AK: Allow changing the HashTable behaviour for sets on existing entriesIdan Horowitz
Specifically, replacing the existing entry or just keeping it and canceling the set.
2021-05-30AK: Make HashTable::operator=(HashTable&&) clear the moved-from tableAndreas Kling
This is consistent with how other AK containers behave when moved from.
2021-05-15AK+LibC: Implement malloc_good_size() and use it for Vector/HashTableGunnar Beutner
This implements the macOS API malloc_good_size() which returns the true allocation size for a given requested allocation size. This allows us to make use of all the available memory in a malloc chunk. For example, for a malloc request of 35 bytes our malloc would internally use a chunk of size 64, however the remaining 29 bytes would be unused. Knowing the true allocation size allows us to request more usable memory that would otherwise be wasted and make that available for Vector, HashTable and potentially other callers in the future.
2021-04-22Everything: Move to SPDX license identifiers in all files.Brian Gianforcaro
SPDX License Identifiers are a more compact / standardized way of representing file license information. See: https://spdx.dev/resources/use/#identifiers This was done with the `ambr` search and replace tool. ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-11AK: Annotate HashTable functions as [[nodiscard]]Brian Gianforcaro
2021-04-11AK: Make HashTable with capacity constructor explicitBrian Gianforcaro
2021-04-02AK: Inline HashTable writing bucket lookupthislooksfun
The old approach was more complex and also had a very bad edge case with lots of collisions. This approach eliminates that possiblility. It also makes both reading and writing lookups a little bit faster.
2021-04-02AK: Inline the bucket index calculationthislooksfun
The result of the modulo is only used in the array index, so why make the code more complex by calculating it in two different places?
2021-03-12Everywhere: Remove klog(), dbg() and purge all LogStream usage :^)Andreas Kling
Good-bye LogStream. Long live AK::Format!
2021-02-23Everywhere: Rename ASSERT => VERIFYAndreas Kling
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED) Since all of these checks are done in release builds as well, let's rename them to VERIFY to prevent confusion, as everyone is used to assertions being compiled out in release. We can introduce a new ASSERT macro that is specifically for debug checks, but I'm doing this wholesale conversion first since we've accumulated thousands of these already, and it's not immediately obvious which ones are suitable for ASSERT.
2021-01-31HashTable: Correctly pass args to setLenny Maiorani
Problem: - Using regular functions rather than function templates results in the arguments not being deduced. This then requires the same function to be written multiple times and for `move` to be used rather than `forward`. Solution: - Collapse multiple function overloads to a single function template with a deduced argument. This allows the argument to be a forwarding reference and bind to either an l-value or r-value and forward the value.
2021-01-12AK: Simplify constructors and conversions from nullptr_tLenny Maiorani
Problem: - Many constructors are defined as `{}` rather than using the ` = default` compiler-provided constructor. - Some types provide an implicit conversion operator from `nullptr_t` instead of requiring the caller to default construct. This violates the C++ Core Guidelines suggestion to declare single-argument constructors explicit (https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#c46-by-default-declare-single-argument-constructors-explicit). Solution: - Change default constructors to use the compiler-provided default constructor. - Remove implicit conversion operators from `nullptr_t` and change usage to enforce type consistency without conversion.
2020-10-18AK: Reduce memory writes in HashTable destructorDano Perniš
2020-10-18AK: Implement HashTable assignment in terms of swapDano Perniš
2020-10-18AK: Provide swap() for HashTableDano Perniš
2020-10-16AK: Tune HashTable load factorAndreas Kling
Double the capacity when used+deleted buckets crosses 60% of capacity. This appears to be a sweet spot for performance based on some ad-hoc testing with test-js. :^)
2020-10-15AK: Redesign HashTable to use closed hashingAndreas Kling
Instead of each hash bucket being a SinglyLinkedList, switch to using closed hashing (open addressing). Buckets are chained together via double hashing (hashing the hash until we find an unused bucket.) This greatly reduces malloc traffic, since each added element no longer allocates a new linked list node. Appears performance neutral on test-js. Can definitely be tuned and could use proper management of load factor, etc.
2020-08-16AK: HashTable add a constructor that allows preallocation of capacity + Use ↵Muhammad Zahalqa
in CppLexer. (#3147) 1. Add general utility to get array number of elements. 2. Add Needed API to AK::HashTable 3. Refactor CppLexer initialization
2020-07-09AK: HashTable/HashMap return whether action was performed for set/removeTom
This allows performing an action based on whether something was actually added or removed without having to look it up prior to calling set() or remove().
2020-02-27AK: Expose SinglyLinkedListIterator constructorWilliam McPherson
This commit replaces SinglyLinkedListIterator::universal_end() with an empty SinglyLinkedListIterator(). Piano needs this in order to initialize a member array of iterators without 84 lines of universal_end().