Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
@aklomp @mayeut
Again a draft. Please ignore the Benchmarks patch, I was to far to drop that and rebase against HEAD.
The interesting one is codec: add ssse3_atom.
My experience with CRC32C with Silvermont Atom (SLM) processors is that in 64b certain combinations of instructions incur a penalty (see Intel manuals) making the advantage of running in 64b mode negative in some cases. In later Atoms (Goldmont, Airmont) this penalty likely does not occur, but I don't have the hardware to test. Running base64 on SLM shows strange performance regressions while core i7 shows improvement.
So, I revived the best ssse3 codec as ssse3_atom and tested on Intel Edison (dual core 500MHz) in 64b/32b mode (because that is easy to do) and on Intel NUC with Baytrail Atom in 64b (to show the relevancy on main stream CPU).
Improvement by going back to the revived codec in bold, degradation in italic.
We see that on i7 the latest version is indeed the fastest, on SLM 32 bit there is no difference. But on SLM 64b SSSE3_ATOM is 25% faster.
Now, having a fast algorithm has a much more noticable effect on a slow Atom then on a fast i7... So what do you guys think, should we add a specialized SSSE3 for SLM?