I suggest doing speed/size comparison with other dedup. solutions #114

Open
@safinaskar

I suggest comparing your solution with other deduplication solutions, such as borg, casync, desync, and rdedup. Compare not only the size of the deduplicated and compressed data, but also the speed of creating the deduplicated image and the speed of reading the data back. Here is my own benchmark, which compares many solutions: borgbackup/borg#7674 (comment). I recommend reading the whole discussion; it contains many ideas. My conclusion is this: the existing solutions are very inefficient, and every one of them has at least one flaw. Either it is not parallel, or it is written in Go instead of C/C++/Rust and is therefore slow, or it has some other problem. Based on my experience, I conclude that one can easily create a deduplicating solution that beats all the others in both size and speed.

Here is a summary of my critique of other solutions: https://lobste.rs/s/0itosu/look_at_rapidcdc_quickcdc#c_ygqxsl .

I haven't studied puzzlefs closely, but I already see one inefficiency: puzzlefs uses sha256, which on many machines (including mine) is slower than blake2 and blake3.
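To check the hash-speed claim on a given machine, a rough timing sketch like the one below could be used. It relies only on Python's standard `hashlib`, so it covers sha256 and blake2b but not blake3, which is not in the standard library. The function name `time_hash` and the 16 MiB buffer are illustrative assumptions, and the numbers reflect the implementations that Python links against, not necessarily the ones puzzlefs would use.

```python
import hashlib
import time


def time_hash(name, data, repeats=3):
    """Return best-of-`repeats` throughput in MiB/s for the hashlib algorithm `name`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(name, data).digest()
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from a coarse timer on tiny inputs.
    return len(data) / (1024 * 1024) / max(best, 1e-9)


if __name__ == "__main__":
    data = bytes(16 * 1024 * 1024)  # 16 MiB of zeros; size chosen arbitrarily
    for name in ("sha256", "blake2b"):
        print(f"{name}: {time_hash(name, data):.0f} MiB/s")
```

The result depends heavily on the CPU: machines with SHA hardware extensions may well show sha256 ahead, which is why measuring on the target machine matters.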

I don't plan to add puzzlefs to my comparison borgbackup/borg#7674 (comment). I have already deleted the test data and test programs.

You may say that all this (i.e. speed) is not important. I disagree. I had one particular use case: "docker, but for VMs". I compared the existing solutions, and all of them were unsatisfactory. So I developed my own: azwyon (in Rust): borgbackup/borg#7674 (comment). Azwyon can store and extract 10 GiB virtual machines in several seconds on my machine. This is satisfactory for me, and this result is unachievable with the other solutions. (Keep in mind that azwyon uses fixed-size chunking.) Similarly, it is possible that someone will evaluate puzzlefs, conclude that it is slow, and reject it.
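For readers unfamiliar with fixed-size chunking, here is a minimal in-memory sketch of the general idea. It is not azwyon's actual code (see the linked comment for that); the names `store` and `extract`, the 4 MiB default chunk size, and the choice of blake2b are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical default, not taken from azwyon


def store(data, chunks, chunk_size=CHUNK_SIZE):
    """Split `data` into fixed-size chunks and keep each unique chunk once.

    `chunks` maps digest -> chunk bytes and is shared between images.
    Returns the ordered list of digests needed to rebuild `data`.
    """
    index = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.blake2b(chunk, digest_size=32).digest()
        chunks.setdefault(digest, chunk)  # an already-seen chunk is not stored again
        index.append(digest)
    return index


def extract(index, chunks):
    """Rebuild the original data by concatenating chunks in index order."""
    return b"".join(chunks[digest] for digest in index)
```

Fixed-size chunking only deduplicates data that stays aligned to chunk boundaries, which tends to hold for VM disk images and is one reason it can be much faster than content-defined chunking.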

Please don't misunderstand me. I am not saying that puzzlefs is slow! I haven't tested it, so I don't know. I just want to share my experience. My message is this: "Many other solutions are slower than they could be, so it is possible that puzzlefs is slow too. I therefore suggest testing it properly and, if it does turn out to be slow, fixing it using ideas from these links."
