Closed
Description
Recently we started using spotless in openHAB (http://github.com/openhab/openhab-addons). We are currently not able to do a full build (around 260 sub-projects) with checks enabled due to resource issues. It seems that this does not depend on the OS as it shows on our Unix-based CI and also on local builds under Windows.
- [3.6.1] maven version
- [1.30.0] spotless version
- [4.13.0] eclipse version
A first analysis can be found here: openhab/openhab-addons#7449 (comment). Looking at a heap dump it seems that the SpotlessCache
is instantiated with a new FeatureClassloader
each time a check is called but never cleared.
The pom including the configuration can be found here: https://github.com/openhab/openhab-addons/blob/2.5.x/pom.xml
Activity
nedtwigg commentedon Apr 28, 2020
Interesting, thanks for the detailed profiling! It should be the case that the classloaders are cached across projects, so there should only be a few created across the entire build. You get a big speedup from the JIT by reusing these classloaders as much as you can - for the gradle plugin we only ever clear them when someone runs
clean
, otherwise they get reused by the daemon.That is really surprising to me!! To my knowledge we don't manually spawn a single thread. My best guess at what's happening is:
Independent of that, I'm also suspicious that if a maven plugin declares a
static
variable, it might not be shared across maven subprojects. If not, we have nothing to gain from our cache, and can easily disable it for maven builds.I know it's been a long time since you worked on this, but any thoughts @lutovich?
wborn commentedon Apr 28, 2020
Maven will create different classloaders and reuse them depending on the (plugin) configuration used in Maven projects. In parallel Maven builds (
-T 1C
) it may also create several classloaders for the same configuration so threads can use them in parallel. So using static variables for caching and locking usually doesn't work well with Maven.lutovich commentedon Apr 28, 2020
Hey!
Yeah, it feels like with parallel builds and multiple classloaders
SpotlessCache.instance
can be instantiated more than once and cause problems. I can investigate this issue closer to the end of this week if that's not too late :)J-N-K commentedon Apr 28, 2020
Thanks alot. Since it took us several month to get to the point where we are now, a week or two doesn't matter. We can always apply the formatter to single projects only which is not a problem.
lutovich commentedon May 3, 2020
Seems like this problem is caused by the way
SpotlessCache
keys are created and is specific to Eclipse-based steps. Each key a serialized instance ofEclipseBasedStepBuilder.State
and includes a list of settings files. This list is set by every Ecpilse-based step usingEclipseBasedStepBuilder#setPreferences()
. Steps locate settings files usingFileLocator
. Inconveniently,FileLocator
does not just return the same file for every Maven module (e.g.openhab_codestyle.xml
). Instead, it copies the configured file into thetarget
directory of every module and returns this copied temporary file. This makes allSpotlessCache
keys constructed for every module unique and results in zero reuse ofFeatureClassLoader
s. For every Eclipse-based step the createdFeatureClassLoader
loads a newSpotlessEclipseFramework
with its ownorg.eclipse.core.internal.jobs.JobManager
thread pool and a bunch of other helper threads. It looks like the number of active threads increases with every processed module and eventually brings the whole build to a halt.Running spotless with Google code formatter works fine and the problem manifests once I re-add at least one Eclipse-based formatter.
Here is my investigation branch https://github.com/lutovich/spotless/tree/fix-spotless-cache-leak with some printouts and a hack that makes Eclipse Java formatter step pass when building openhab-addons. The hack is a very dirty one and a real fix would require some more effort.
An idea for a fix: instead of using the "physical" settings files for cache keys use "logical" files. For example, every cache key for Eclipse Java format step would use the configured
openhab_codestyle.xml
instead of a copied temporary per-module XML file.Any ideas or thoughts about this are greatly appreciated!
nedtwigg commentedon May 4, 2020
Aha! We had a similar problem (very different symptom though) when adding tsfmt support to maven-spotless (#553). The problem there is that
package.json
andtsconfig.json
need to be in the project root, and we expected them to be there, but they were getting copied into a temp directory with a random name. We fixed it by addingpublic File locateLocal(String path)
, which doesn't copy the file, to go along withpublic File locateFile(String path)
, which does copy the file.https://github.com/diffplug/spotless/pull/553/files#diff-fb2ef4a51d4ea619ed63830522709278
I see now in my notes that I left myself a todo, which was:
File locateFile(String path)
, which copies the fileFile locateLocal(String path)
which doesn't copy itBut it never got to the top of my todo :). So whaddya think - can we replace every call to
locateFile
withlocateLocal
? Why not? Fwiw, the gradle plugin never copies settings files.lutovich commentedon May 4, 2020
FileLocator#locateFile()
is quite powerful and can load the given file from the local file system, URL, or extract from a JAR file. I think JAR file could be especially useful for multi-module Maven projects where Spotless settings file lives in a dedicated module. URL could be useful when the settings file lives on GitHub, even though it is probably a bit weird to download setting files during every build.Spotbugs and Checkstyle Maven plugins seem to also use
ResourceManager
for loading configuration files:I remember looking at them for inspiration :)
Do you think it is possible to use actual file content in
FileSignature
instead of the combination of file names, sizes, and last modified timestamps? POC commit link. I think this could solve the given issue, though I haven't verified it :)lutovich commentedon May 4, 2020
Added another commit to that branch to handle hashing for files and directories differently: diff. Input is much appreciated!
nedtwigg commentedon May 4, 2020
We can definitely change
FileSignature
. We actually have an open issue to make it machine-independent for the gradle build cache (no absolute paths or file timestamps) #566With that in mind, a few points on the diff:
signAsList
vssignAsSet
, as you noted)FileSignature.signAsSet
uses the natural ordering ofFile
, which sorts on path. If the files are going to have random filenames, then we need to sort on their hash or something else.lutovich commentedon May 10, 2020
Making Maven plugin use human-readable names for output files would be nice. I'm not sure how to do this nicely though. Creating a name for the output file is easy when a step is configured with a local file:
then we could just name the output file
eclipse-fmt.xml
.However, it could be a bit tricky when a step is configured with a URL:
where URLs might have redirects or query params.
I might be overthinking it, any ideas are appreciated!
Please take a look at the attached PR. It makes output file names predictable but not human-readable. Thus it is possible to sort on file names. This PR makes it possible to run
mvn spotless:apply -T 2C
locally onopenhab-addons
project and the number of threads remains constant.11 remaining items