Skip to content

Resource leak when using Maven plugin #559

Closed
@J-N-K

Description

@J-N-K
Contributor

Recently we started using spotless in openHAB (http://github.com/openhab/openhab-addons). We are currently not able to do a full build (around 260 sub-projects) with checks enabled due to resource issues. It seems that this does not depend on the OS as it shows on our Unix-based CI and also on local builds under Windows.

  • [3.6.1] maven version
  • [1.30.0] spotless version
  • [4.13.0] eclipse version

A first analysis can be found here: openhab/openhab-addons#7449 (comment). Looking at a heap dump it seems that the SpotlessCache is instantiated with a new FeatureClassloader each time a check is called but never cleared.

The pom including the configuration can be found here: https://github.com/openhab/openhab-addons/blob/2.5.x/pom.xml

Activity

nedtwigg

nedtwigg commented on Apr 28, 2020

@nedtwigg
Member

Interesting, thanks for the detailed profiling! It should be the case that the classloaders are cached across projects, so there should only be a few created across the entire build. You get a big speedup from the JIT by reusing these classloaders as much as you can - for the gradle plugin we only ever clear them when someone runs clean, otherwise they get reused by the daemon.

the spotless Plugin created around 18 daemon threads per project

That is really surprising to me!! To my knowledge we don't manually spawn a single thread. My best guess at what's happening is:

  • maven spawns threads which it uses to call plugins from
  • spotless is doing something which prevents those threads from shutting down

Independent of that, I'm also suspicious that if a maven plugin declares a static variable, it might not be shared across maven subprojects. If not, we have nothing to gain from our cache, and can easily disable it for maven builds.

I know it's been a long time since you worked on this, but any thoughts @lutovich?

wborn

wborn commented on Apr 28, 2020

@wborn

I'm also suspicious that if a maven plugin declares a static variable, it might not be shared across maven subprojects. If not, we have nothing to gain from our cache, and can easily disable it for maven builds.

Maven will create different classloaders and reuse them depending on the (plugin) configuration used in Maven projects. In parallel Maven builds (-T 1C) it may also create several classloaders for the same configuration so threads can use them in parallel. So using static variables for caching and locking usually doesn't work well with Maven.

lutovich

lutovich commented on Apr 28, 2020

@lutovich
Contributor

Hey!

Yeah, it feels like with parallel builds and multiple classloaders SpotlessCache.instance can be instantiated more than once and cause problems. I can investigate this issue closer to the end of this week if that's not too late :)

J-N-K

J-N-K commented on Apr 28, 2020

@J-N-K
ContributorAuthor

I can investigate this issue closer to the end of this week if that's not too late :)

Thanks alot. Since it took us several month to get to the point where we are now, a week or two doesn't matter. We can always apply the formatter to single projects only which is not a problem.

lutovich

lutovich commented on May 3, 2020

@lutovich
Contributor

Seems like this problem is caused by the way SpotlessCache keys are created and is specific to Eclipse-based steps. Each key a serialized instance of EclipseBasedStepBuilder.State and includes a list of settings files. This list is set by every Ecpilse-based step using EclipseBasedStepBuilder#setPreferences(). Steps locate settings files using FileLocator. Inconveniently, FileLocator does not just return the same file for every Maven module (e.g. openhab_codestyle.xml). Instead, it copies the configured file into the target directory of every module and returns this copied temporary file. This makes all SpotlessCache keys constructed for every module unique and results in zero reuse of FeatureClassLoaders. For every Eclipse-based step the created FeatureClassLoader loads a new SpotlessEclipseFramework with its own org.eclipse.core.internal.jobs.JobManager thread pool and a bunch of other helper threads. It looks like the number of active threads increases with every processed module and eventually brings the whole build to a halt.

Running spotless with Google code formatter works fine and the problem manifests once I re-add at least one Eclipse-based formatter.

Here is my investigation branch https://github.com/lutovich/spotless/tree/fix-spotless-cache-leak with some printouts and a hack that makes Eclipse Java formatter step pass when building openhab-addons. The hack is a very dirty one and a real fix would require some more effort.

An idea for a fix: instead of using the "physical" settings files for cache keys use "logical" files. For example, every cache key for Eclipse Java format step would use the configured openhab_codestyle.xml instead of a copied temporary per-module XML file.

Any ideas or thoughts about this are greatly appreciated!

nedtwigg

nedtwigg commented on May 4, 2020

@nedtwigg
Member

Aha! We had a similar problem (very different symptom though) when adding tsfmt support to maven-spotless (#553). The problem there is that package.json and tsconfig.json need to be in the project root, and we expected them to be there, but they were getting copied into a temp directory with a random name. We fixed it by adding public File locateLocal(String path), which doesn't copy the file, to go along with public File locateFile(String path), which does copy the file.

https://github.com/diffplug/spotless/pull/553/files#diff-fb2ef4a51d4ea619ed63830522709278

I see now in my notes that I left myself a todo, which was:

  • I don't see why we can't just delete File locateFile(String path), which copies the file
  • and replace every invocation of it with File locateLocal(String path) which doesn't copy it
  • but there must be some reason we're copying, so I should ask first

But it never got to the top of my todo :). So whaddya think - can we replace every call to locateFile with locateLocal? Why not? Fwiw, the gradle plugin never copies settings files.

lutovich

lutovich commented on May 4, 2020

@lutovich
Contributor

FileLocator#locateFile() is quite powerful and can load the given file from the local file system, URL, or extract from a JAR file. I think JAR file could be especially useful for multi-module Maven projects where Spotless settings file lives in a dedicated module. URL could be useful when the settings file lives on GitHub, even though it is probably a bit weird to download setting files during every build.

Spotbugs and Checkstyle Maven plugins seem to also use ResourceManager for loading configuration files:

I remember looking at them for inspiration :)

Do you think it is possible to use actual file content in FileSignature instead of the combination of file names, sizes, and last modified timestamps? POC commit link. I think this could solve the given issue, though I haven't verified it :)

lutovich

lutovich commented on May 4, 2020

@lutovich
Contributor

Added another commit to that branch to handle hashing for files and directories differently: diff. Input is much appreciated!

nedtwigg

nedtwigg commented on May 4, 2020

@nedtwigg
Member

We can definitely change FileSignature. We actually have an open issue to make it machine-independent for the gradle build cache (no absolute paths or file timestamps) #566

With that in mind, a few points on the diff:

  1. The FileSignature constructor does care about order (e.g. classpath order matters, thus signAsList vs signAsSet, as you noted)
  2. FileSignature.signAsSet uses the natural ordering of File, which sorts on path. If the files are going to have random filenames, then we need to sort on their hash or something else.
  3. Copying resources to build dir is okay, but can they at least have the same filename? Debugging is easier if the filenames stay human-readable. Also would allow us to sort by filename rather than hash.
lutovich

lutovich commented on May 10, 2020

@lutovich
Contributor

Making Maven plugin use human-readable names for output files would be nice. I'm not sure how to do this nicely though. Creating a name for the output file is easy when a step is configured with a local file:

<eclipse>
   <file>${basedir}/eclipse-fmt.xml</file>
</eclipse>

then we could just name the output file eclipse-fmt.xml.

However, it could be a bit tricky when a step is configured with a URL:

<eclipse>
   <file>http://my-site.com/eclipse-fmt.xml</file>
</eclipse>

where URLs might have redirects or query params.

I might be overthinking it, any ideas are appreciated!

Please take a look at the attached PR. It makes output file names predictable but not human-readable. Thus it is possible to sort on file names. This PR makes it possible to run mvn spotless:apply -T 2C locally on openhab-addons project and the number of threads remains constant.

11 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      Participants

      @lutovich@nedtwigg@J-N-K@wborn

      Issue actions

        Resource leak when using Maven plugin · Issue #559 · diffplug/spotless