Description
One of the compiler performance issues I found in a codebase involved a big session of type inference + LUBbing in code creating a Map[A, Seq[Class[_ <: B]]], in which there was enough of a lattice in the actual subtypes of B to make LUB computation expensive.
I distilled this into a benchmark corpus:
That takes about 2s to compile. The original example was 10x that.
Here’s how the profiler sees things:
https://scala-ci.typesafe.com/view/scala-bench/job/compiler-benchmark/2101/artifact/target/profile-jfr/flame-graph-cpu.svg
https://scala-ci.typesafe.com/view/scala-bench/job/compiler-benchmark/2101/artifact/target/profile-jfr/flame-graph-cpu-reverse.svg
I didn’t go as far as characterizing the big-O complexity w.r.t. different N in the example. My working assumption is that the primary N is the length of the Seq.apply argument list, and I imagine it’s O(N^2).
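The benchmark corpus itself isn't reproduced here, but a minimal sketch of the shape described above (A, B, and the subclasses below are placeholders, not the original code) would look something like:

```scala
// Hypothetical reduction: a small lattice of traits under B means the
// compiler must compute non-trivial LUBs when inferring the element type
// of each Seq(...) call; cost grows with the argument count N.
trait B
trait B1 extends B
trait B2 extends B
trait C1 extends B1 with B2 // diamond-shaped lattice makes LUBs costly
trait C2 extends B1 with B2

object Repro {
  // Inferred type: Map[String, Seq[Class[_ <: B]]]. Each Seq.apply
  // argument list triggers LUBbing over the Class[_ <: ...] arguments.
  val m = Map(
    "a" -> Seq(classOf[C1], classOf[C2], classOf[B1]),
    "b" -> Seq(classOf[B2], classOf[C1], classOf[C2])
  )
}
```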
The constant factor multiplied by that complexity is existential subtyping, which creates temporary types by substituting type vars in place of existential quantifiers. These create some pressure on the uniques cache of types.
I once tried the Dotty approach of excluding types we know to be temporary from the uniques cache: scala/scala@2.13.x...retronym:faster/strong-selective-unique
We still have to allocate to create the type, but by avoiding keeping the weak reference from the uniques map to the short-lived type, we make it eligible for GC almost immediately. I never saw enough benefit from this in big benchmarks to make me follow through with it, but maybe with a focussed benchmark like this one we could measure the improvement and justify the change.
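A minimal sketch of that idea (illustrative only, not the compiler's actual uniques implementation; UniquesCache and isTemporary are invented names):

```scala
import java.lang.ref.WeakReference
import java.util.WeakHashMap

// Sketch of a hash-consing cache that declines to intern values flagged
// as short-lived. Uninterned values keep no weak reference in the map,
// so they become eligible for GC as soon as the caller drops them.
final class UniquesCache[T <: AnyRef] {
  // Values are held via WeakReference so the map itself never strongly
  // retains an interned value (the key is already weakly held).
  private val cache = new WeakHashMap[T, WeakReference[T]]()

  def unique(tp: T, isTemporary: T => Boolean): T =
    if (isTemporary(tp)) tp // skip interning entirely
    else {
      val ref = cache.get(tp)
      val hit = if (ref == null) null else ref.get()
      if (hit != null) hit
      else {
        cache.put(tp, new WeakReference(tp))
        tp
      }
    }
}
```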
Activity
retronym commented on Nov 13, 2018
@mkeskells noticed that the performance is "less bad" in 2.13.x.
I expanded the example a little to: https://gist.github.com/6d4cfa02b0b170e4a6c3a73db75167cc
And then git bisected between 2.12.7 and 2.13.x. At each step, I ran:
Taking "bad" == "fast" and "good" == "slow", so that the bisect converges on the commit that sped compilation up.
Cold performance changed from 25s to 12s with this library change: