Skip to content

Migrate LST outputs to Portable SoAs #48409

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ariostas
Copy link
Contributor

This PR refactors the LST outputs so that Portable SoAs are used as much as possible. The output of the LST Producer is now a device collection, and the framework takes care of copying it to the host.

This continues the work from #47793, now on the outputs side, and completes the tasks related to CMSSW-LST interfacing in #46746.

c.c. @slava77

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 25, 2025

cms-bot internal usage

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48409/45307

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @ariostas for master.

It involves the following packages:

  • DataFormats/Common (core)
  • RecoTracker/LST (reconstruction)
  • RecoTracker/LSTCore (reconstruction)

@Dr15Jones, @cmsbuild, @jfernan2, @makortel, @mandrenguyen, @smuzaffar can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @dgulhan, @felicepantaleo, @gpetruc, @makortel, @missirol, @mmusich, @mtosi, @rovere, @wddgit this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@slava77
Copy link
Contributor

slava77 commented Jun 25, 2025

test parameters:

  • enable_tests = gpu
  • workflows_gpu = 29634.704,29834.704
  • workflows = 29634.703,29834.703,29834.755,29634.757,29834.757
  • relvals_opt = -w upgrade,standard
  • relvals_opt_gpu = -w upgrade,standard

@slava77
Copy link
Contributor

slava77 commented Jun 25, 2025

@cmsbuild please test

@cmsbuild
Copy link
Contributor

-1

Failed Tests: Build HeaderConsistency
Size: This PR adds an extra 64KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3c9102/46917/summary.html
COMMIT: 21a525e
CMSSW: CMSSW_15_1_X_2025-06-25-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46917/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found compilation error when building:

Copying tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreROCmAsync/libRecoTrackerLSTCoreROCmAsync_rocm.a to productstore area:
Copying tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/libRecoTrackerLSTCoreCudaAsync_nv.a to productstore area:
cp: cannot stat 'tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreROCmAsync/libRecoTrackerLSTCoreROCmAsync_rocm.a': No such file or directory
cp: cannot stat 'tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/libRecoTrackerLSTCoreCudaAsync_nv.a': No such file or directory
>> Deleted: tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/libRecoTrackerLSTCoreCudaAsync_nv.a
gmake: *** [config/SCRAM/GMake/Makefile.rules:1920: tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/libRecoTrackerLSTCoreCudaAsync_nv.a] Error 1
>> Deleted: tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreROCmAsync/libRecoTrackerLSTCoreROCmAsync_rocm.a
gmake: *** [config/SCRAM/GMake/Makefile.rules:1920: tmp/el8_amd64_gcc12/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreROCmAsync/libRecoTrackerLSTCoreROCmAsync_rocm.a] Error 1
@@@@ Checking for missing symbols was SKIPPED due to NO_LIB_CHECKING flag in BuildFile: libUtilitiesStaticAnalyzers.so
Unknow target lib/el8_amd64_gcc12/RecoTrackerLSTCore_xr.rootmap
Unknow target lib/el8_amd64_gcc12/RecoTrackerLST_xr.rootmap


@ariostas
Copy link
Contributor Author

Sorry about that, that header was deleted by mistake.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48409/45311

@slava77
Copy link
Contributor

slava77 commented Jun 26, 2025

@slava77 Matti is on vacation and will not be back till the 2nd week of July.

OK, summer time.

In the meantime (while we wait for the main reviewer(s)) it would be nice to get advice/clarification on the plugin issue
#48409 (comment)

@Dr15Jones
Copy link
Contributor

For the other failures, could they be false positives? The LSTOutputConverter plugin was moved from plugins to plugins/alpaka, but kept the same name. So could it be finding the poisoned one first? Otherwise, I'm not sure what would need to be updated.

The plugin system used the LD_LIBRARY_PATH environment variable to decide the order in which to look for plugins (this follows how the OS looks for shared libraries). The local work environment's lib directory should be in LD_LIBRARY_PATH before the local work environment's poison directory therefore the new plugin should be found first.

Are you certain the new plugin is actually being built?

@smuzaffar any thoughts?

@Dr15Jones
Copy link
Contributor

So I see that the following libraries are built

  • libRecoTrackerLSTPluginsPortableCudaAsync.so
  • libRecoTrackerLSTPluginsPortableROCmAsync.so
  • libRecoTrackerLSTPluginsPortableSerialSync.so

neither of these are an exact match for plugin-poisoned-RecoTrackerLSTPlugins.so as previously, I believe, the LSTOutputConverter was just in the plugins directory which would match the _poisoned` name.

In the conversion to an alpaka module, the module's type name would have been changed as well. It would no longer just be LSTOutputConverter. So in the configuration for these jobs, how is the type of the module specified?

@Dr15Jones
Copy link
Contributor

So you need to change

_hltInitialStepTrackCandidatesLST = cms.EDProducer('LSTOutputConverter',

to

_hltInitialStepTrackCandidatesLST = cms.EDProducer('LSTOutputConverter@alpaka', 

and do that for all configurations.

@slava77
Copy link
Contributor

slava77 commented Jun 26, 2025

So you need to change

Thank you, Chris. The issue is on our side after all.

Do you think there is a way to improve the error message to indicate that the plugin LSTOutputConverter does not exist? (I can't recall what the error message is if it never existed; if that one is more clear, perhaps it can be added as a possibility)

@ariostas
Copy link
Contributor Author

Thanks for the help, Chris! Should be all working now.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48409/45328

@cmsbuild
Copy link
Contributor

Pull request #48409 was updated. @Dr15Jones, @Martin-Grunewald, @cmsbuild, @fwyzard, @jfernan2, @makortel, @mandrenguyen, @mmusich, @smuzaffar can you please check and sign again.

@slava77
Copy link
Contributor

slava77 commented Jun 26, 2025

@cmsbuild please test

@cmsbuild
Copy link
Contributor

+1

Size: This PR adds an extra 40KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3c9102/46936/summary.html
COMMIT: 83731f0
CMSSW: CMSSW_15_1_X_2025-06-26-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/48409/46936/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

CUDA Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 243 differences found in the comparisons
  • DQMHistoTests: Total files compared: 9
  • DQMHistoTests: Total histograms compared: 117626
  • DQMHistoTests: Total failures: 4307
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 113319
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 8 files compared)
  • Checked 32 log files, 36 edm output root files, 9 DQM output files
  • TriggerResults: no differences found

ROCM Comparison Summary

Summary:

  • You potentially added 6 lines to the logs
  • Reco comparison results: 92 differences found in the comparisons
  • DQMHistoTests: Total files compared: 9
  • DQMHistoTests: Total histograms compared: 117626
  • DQMHistoTests: Total failures: 8904
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 108722
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 8 files compared)
  • Checked 32 log files, 36 edm output root files, 9 DQM output files

@mmusich
Copy link
Contributor

mmusich commented Jun 26, 2025

+hlt

  • changes in HLTrigger packages are anyway minimal and linked to the discussion above.

@jfernan2
Copy link
Contributor

+1


namespace ALPAKA_ACCELERATOR_NAMESPACE {

class LSTOutputConverter : public global::EDProducer<> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What motivated moving the LSTOutputConverter from a "regular module" (in plugins) to an "alpaka module" (in plugins/alpaka and in ALPAKA_ACCELERATOR_NAMESPACE)?

By quick look I didn't see any backend-specific behavior, but maybe I just missed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants