Description
Today I started exploring the stdlib (which I think is a great effort!) and I did a performance test of the hash map implementation. The test I did is rather simple: generate keys (with duplicates), and then add a key if it is not yet present, and remove a key if it is already present. The code is copied below. I compared performance against another Fortran hash map implementation I once made (https://github.com/jannisteunissen/ffhash), which was based on khash from https://github.com/attractivechaos/klib
With standard compile flags (-O2) and a test size of 5 million, on an Intel i7-1185G7 laptop CPU, I got the following numbers:
open_hashmap_type from stdlib, using fnv_1_hasher:
Elapsed time (s) 0.2749E+01
Entries/s 0.1819E+07
while the other code (ffhash, murmur3 hasher)
Elapsed time (s) 0.2541E+00
Entries/s 0.1968E+08
A factor 10 seems a bit much. I would expect some overhead due to the class-type interface, but not this much. Perhaps this benchmark can be useful in improving the performance.
program open_hashmap_benchmark
use stdlib_kinds, only: dp, int8, int32
use stdlib_hashmaps, only : open_hashmap_type
use stdlib_hashmap_wrappers
implicit none
type(open_hashmap_type) :: map
integer, parameter :: n_max = 5*1000*1000
integer :: n
integer(int32), allocatable :: keys(:)
integer, allocatable :: key_counts(:)
real(dp) :: t_start, t_end
real(dp), allocatable :: r_uniform(:)
type(key_type) :: key
logical :: present
integer(int8) :: int32_as_int8(4)
call map%init(fnv_1_hasher, slots_bits=10)
allocate(keys(n_max), r_uniform(n_max))
call random_number(r_uniform)
keys = nint(r_uniform * n_max * 0.25_dp)
call cpu_time(t_start)
do n = 1, n_max
int32_as_int8 = transfer(keys(n), int32_as_int8)
call set(key, int32_as_int8)
call map%key_test(key, present)
if (present) then
call map%remove(key)
else
call map%map_entry(key)
end if
end do
call cpu_time(t_end)
write(*, "(A,E12.4)") "Elapsed time (s) ", t_end - t_start
write(*, "(A,E12.4)") "Entries/s ", n_max/(t_end - t_start)
write(*, "(A,I12)") "n_occupied ", map%entries()
write(*, "(A,I12)") "n_buckets ", map%num_slots()
! Count number of keys that occur an odd number of times
allocate(key_counts(minval(keys):maxval(keys)))
key_counts = 0
do n = 1, n_max
key_counts(keys(n)) = key_counts(keys(n)) + 1
end do
n = sum(iand(key_counts, 1))
if (n /= map%entries()) then
error stop "FAILED"
else
print *, "PASSED"
end if
! Clean up allocated storage
deallocate(key_counts)
end program open_hashmap_benchmarka
The benchmark I compared against is https://github.com/jannisteunissen/ffhash/blob/master/example_benchmark.f90
Activity
jvdp1 commentedon Apr 2, 2024
Thank you @jannisteunissen for this report. I tested your code on my machine, and I got similar results with
gfortran
and thefpm
options used with the profilerelease
. I tested other hashers and got similar times.gareth-nx commentedon Apr 3, 2024
@wclodius2
gareth-nx commentedon Apr 4, 2024
I have limited knowledge of hash maps, but am puzzled by this small modification of @jannisteunissen's program.
If replace the
open_hashmap_type
with achaining_hashmap_type
then the speed is significantly faster, but the program fails at theerror stop
.From inspection, it seems that for
chaining_hashmap_type
a call tomap%remove()
does not always lead tomap%entries()
being reduced by 1. Whereas that does happen for theopen_hashmap_type
.map%remove
.map%remove
is here for the open hashmap and here for the chaining hashmapDoes anyone know if this is expected? @wclodius2
jvdp1 commentedon Apr 4, 2024
@gareth-nx I spotted the same issue, and plan to submit a PR soon to fix it.
gareth-nx commentedon Apr 4, 2024
@jvdp1 I was just looking at this, and found that if we append the line
right at the end of
remove_chaining_entry
, then the test passes, while the speed remains much better than foropen_hashmap_type
.I guess we need to add some additional tests of the hashmaps!
jannisteunissen commentedon Apr 4, 2024
Glad to see that this test was also useful in other ways! I'd be happy to contribute it to to the test suite. Let me also mention that the code I compared with used open addressing (and quadratic probing), so in principle better performance should be possible for
open_hashmap_type
.wclodius2 commentedon Apr 10, 2024
gareth-nx commentedon Apr 10, 2024
No problem at all @wclodius2 -- enjoy your vacation -- and please don't feel obliged to respond unless you feel like it.