
Slow deserialization compared to stock HashMap #14

Closed
@hniksic


I tried using FxHashMap to improve the performance of our internal hash maps, but I noticed that it takes several times longer to deserialize a large hash map that we need to persist. I couldn't find anything about this on Google, so I created a minimal example and am reporting it here.

The behavior can be reproduced with a simple set of random u64 values. For example, on my machine the following code takes 0.08s to serialize and 1.2s to deserialize a HashSet of 10 million elements, as measured with cargo run --release:

```rust
use std::collections::HashSet;
use std::time::Instant;
use rand::Rng;

fn main() {
    let mut rnd = rand::thread_rng();
    let h: HashSet<u64> = (0..10_000_000).map(|_| rnd.gen::<u64>()).collect();

    let t0 = Instant::now();
    let mut out = vec![];
    bincode::serialize_into(&mut out, &h).unwrap();
    let t1 = Instant::now();
    println!("serialize: {}", (t1 - t0).as_secs_f64());

    let h2: HashSet<u64> = bincode::deserialize_from(&*out).unwrap();
    let t2 = Instant::now();
    println!("deserialize: {}", (t2 - t1).as_secs_f64());

    println!("{}", h2.len());
}
```

Simply changing HashSet to rustc_hash::FxHashSet increases deserialization time to 5.5s (more than 4.5x slower), while serialization time is unchanged. In our actual use case, deserialization with the stock HashSet already takes on the order of 2 minutes, so the slowdown noticeably affects our total processing time. Code:

```rust
use rustc_hash::FxHashSet;
use std::time::Instant;
use rand::Rng;

fn main() {
    let mut rnd = rand::thread_rng();
    let h: FxHashSet<u64> = (0..10_000_000).map(|_| rnd.gen::<u64>()).collect();

    let t0 = Instant::now();
    let mut out = vec![];
    bincode::serialize_into(&mut out, &h).unwrap();
    let t1 = Instant::now();
    println!("serialize: {}", (t1 - t0).as_secs_f64());

    let h2: FxHashSet<u64> = bincode::deserialize_from(&*out).unwrap();
    let t2 = Instant::now();
    println!("deserialize: {}", (t2 - t1).as_secs_f64());

    println!("{}", h2.len());
}
```
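The two programs differ only in the hasher. As far as I understand, the stock HashSet uses RandomState, which picks fresh random SipHash keys for every map, whereas FxHashSet uses a fixed, unseeded hash function, so every FxHashSet hashes a given key identically. The sketch below illustrates that property with a hypothetical FxLikeHasher modeled on the rotate-xor-multiply scheme of rustc-hash 1.x. It is not the crate's code (the byte-wise write path is simplified), and the SEED constant is assumed to match the crate's 64-bit multiplier.

```rust
use std::collections::HashSet;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

// Multiplier assumed to match rustc-hash 1.x's 64-bit constant.
const SEED: u64 = 0x51_7c_c1_b7_27_22_0a_95;

// Hypothetical Fx-style hasher (illustration only, not rustc-hash's code).
// Note the absence of any per-map random key.
#[derive(Default)]
struct FxLikeHasher {
    hash: u64,
}

impl FxLikeHasher {
    // Mix one word into the state: rotate, xor, multiply.
    fn add_to_hash(&mut self, word: u64) {
        self.hash = (self.hash.rotate_left(5) ^ word).wrapping_mul(SEED);
    }
}

impl Hasher for FxLikeHasher {
    // Simplified byte path; the real crate consumes whole words at a time.
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.add_to_hash(u64::from(b));
        }
    }
    // u64 keys take this path: a single rotate-xor-multiply round.
    fn write_u64(&mut self, i: u64) {
        self.add_to_hash(i);
    }
    fn finish(&self) -> u64 {
        self.hash
    }
}

type FxLikeHashSet<T> = HashSet<T, BuildHasherDefault<FxLikeHasher>>;

// Hash a single u64 exactly as an FxLikeHashSet would.
fn fx_like_hash(x: u64) -> u64 {
    let mut h = BuildHasherDefault::<FxLikeHasher>::default().build_hasher();
    x.hash(&mut h);
    h.finish()
}

// Build a set of n distinct keys (an odd multiplier is a bijection on u64).
fn build_set(n: u64) -> FxLikeHashSet<u64> {
    (0..n).map(|i| i.wrapping_mul(0x9E37_79B9_7F4A_7C15)).collect()
}

fn main() {
    // No per-map randomness: a key hashes the same in every set and every run...
    println!("hash(42) = {:#x}", fx_like_hash(42));
    // ...so two sets built the same way also iterate in the same order.
    let same_order = build_set(1_000).iter().eq(build_set(1_000).iter());
    println!("same iteration order: {same_order}");
}
```

With RandomState, the iteration order of one set says nothing about where its elements land in another set; with a fixed hasher the two are correlated, which is the kind of correlation a same-order rebuild could trip over.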

I suspected that the issue might be pathological behavior when rebuilding a hash map from elements extracted in the original map's iteration order, so I modified the code to serialize a Vec and deserialize by building an FxHashSet out of the Vec. Serialization becomes slightly slower (0.12s), but deserialization takes just 0.15s, which includes both deserializing the vector and collecting it into a new FxHashSet. (Applying the same trick to an ordinary HashSet didn't speed it up: it takes 0.11s to serialize and 1.3s to deserialize.) Code:

```rust
use rustc_hash::FxHashSet;
use std::time::Instant;
use rand::Rng;

fn main() {
    let mut rnd = rand::thread_rng();
    let h: FxHashSet<u64> = (0..10_000_000).map(|_| rnd.gen::<u64>()).collect();

    let t0 = Instant::now();
    let mut out = vec![];
    let hack = h.iter().copied().collect::<Vec<u64>>();
    bincode::serialize_into(&mut out, &hack).unwrap();
    let t1 = Instant::now();
    println!("serialize: {}", (t1 - t0).as_secs_f64());

    let hack2: Vec<u64> = bincode::deserialize_from(&*out).unwrap();
    let h2: FxHashSet<u64> = hack2.into_iter().collect();
    let t2 = Instant::now();
    println!("deserialize: {}", (t2 - t1).as_secs_f64());

    println!("{}", h2.len());
}
```
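The rebuild hypothesis can be checked without serde or bincode. The std-only sketch below rests on a few assumptions: it uses BuildHasherDefault<DefaultHasher> (unseeded SipHash) as a stand-in for any deterministic hasher such as Fx, a hypothetical CAPPED_CAPACITY of 4096 to mimic a deserializer that only pre-allocates a bounded amount, and a splitmix64 generator in place of rand. rebuild_growing re-inserts a set's elements, in that set's own iteration order, into a small set that must grow; rebuild_presized reserves the final size first, much as collecting from the Vec in the workaround above does. If the hypothesis is right, only the "deterministic, growing" case should be markedly slower; actual timings will vary by machine.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashSet;
use std::hash::{BuildHasher, BuildHasherDefault};
use std::time::Instant;

// Unseeded SipHash: every set using this hasher hashes a key identically,
// the property FxHashSet has and RandomState deliberately avoids.
type Deterministic = BuildHasherDefault<DefaultHasher>;

// Hypothetical bounded pre-allocation, mimicking a deserializer that does
// not fully trust the length prefix (the value 4096 is an assumption).
const CAPPED_CAPACITY: usize = 4096;

// splitmix64 stands in for rand; distinct states give distinct outputs.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

fn make_set<S: BuildHasher + Default>(n: usize) -> HashSet<u64, S> {
    let mut state = 1;
    (0..n).map(|_| splitmix64(&mut state)).collect()
}

// Re-insert `src` in its own iteration order into a set that starts small
// and must grow repeatedly along the way.
fn rebuild_growing<S: BuildHasher + Default>(src: &HashSet<u64, S>) -> HashSet<u64, S> {
    let mut dst = HashSet::with_capacity_and_hasher(CAPPED_CAPACITY, S::default());
    for &x in src {
        dst.insert(x);
    }
    dst
}

// Same order, but the final size is reserved up front, so no resize happens.
fn rebuild_presized<S: BuildHasher + Default>(src: &HashSet<u64, S>) -> HashSet<u64, S> {
    let mut dst = HashSet::with_capacity_and_hasher(src.len(), S::default());
    dst.extend(src.iter().copied());
    dst
}

fn timed<T>(label: &str, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let out = f();
    println!("{label}: {:.3}s", start.elapsed().as_secs_f64());
    out
}

fn main() {
    let n = 2_000_000;
    let seeded: HashSet<u64, RandomState> = make_set(n);
    let fixed: HashSet<u64, Deterministic> = make_set(n);

    let a = timed("RandomState, growing", || rebuild_growing(&seeded));
    let b = timed("deterministic, growing", || rebuild_growing(&fixed));
    let c = timed("deterministic, presized", || rebuild_presized(&fixed));
    // All three strategies must yield the same contents.
    assert!(a == seeded && b == fixed && c == fixed);
}
```

Run it with cargo run --release; the final assertion only checks that the three rebuilds produce the same set, not how fast they are.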

Is this expected behavior for an FxHashMap? Is there a way to fix it without resorting to custom (and space-inefficient) serialization and deserialization?

Note: I have also reported essentially the same issue to the fxhash crate, which provides its own FxHashMap with the same deserialization behavior.
