
Integrating PyTorch in Alpaka heterogeneous core #47984

Open: wants to merge 1 commit into master

lukaszmichalskii

PR description:

This PR integrates PyTorch with the Alpaka-based heterogeneous computing backend, enabling inference workflows that use the PyTorch library directly on PortableCollection objects. It provides:

  • Compatibility with Alpaka device/queue abstractions.
  • Automatic conversion of optimized SoA layouts to torch tensors, reusing the existing memory blobs instead of copying them.
  • Support for both just-in-time (JIT) and ahead-of-time (AOT) model execution (AOT support is in beta).
  • Single-threaded execution and CUDA stream management, handled by Guard objects specialized for each supported backend.
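The last point relies on scope-based (RAII) guards. The sketch below is a minimal, backend-agnostic illustration of that pattern under stated assumptions, not the code from this PR: ExecutionState, currentState and ScopedExecutionGuard are hypothetical names, and the thread-local struct stands in for the real per-thread state a backend guard would manage (in libtorch, presumably things like the current CUDA stream, handled by c10::cuda::CUDAStreamGuard, and the intra-op thread count set with at::set_num_threads).

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical per-thread backend state. In a real backend this would be, e.g.,
// the current CUDA stream and the intra-op thread count; here it is a plain struct.
struct ExecutionState {
  std::string stream = "default";
  int numThreads = 8;
};

thread_local ExecutionState currentState;

// RAII guard: while alive, the calling thread runs single-threaded on the given
// stream; on destruction the previous state is restored, so guards can be nested.
class ScopedExecutionGuard {
public:
  explicit ScopedExecutionGuard(std::string stream) : saved_(currentState) {
    assert(!stream.empty());
    currentState.stream = std::move(stream);
    currentState.numThreads = 1;
  }
  ~ScopedExecutionGuard() { currentState = saved_; }

  // Copying a guard would restore the saved state twice, so forbid it.
  ScopedExecutionGuard(const ScopedExecutionGuard&) = delete;
  ScopedExecutionGuard& operator=(const ScopedExecutionGuard&) = delete;

private:
  ExecutionState saved_;  // state to restore when the guard goes out of scope
};
```

Declaring such a guard at the top of the function that launches inference restores the previous state on every exit path, including exceptions.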

This implementation was presented and discussed at the Core Software Meeting: https://indico.cern.ch/event/1538634/

PR validation:

The plugins and test packages include demonstration code showing interoperability between SoA constructs, the PyTorch C++ API, and the CMSSW environment.
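Reusing SoA memory blobs is possible because a tensor is essentially a pointer plus sizes and strides. The sketch below illustrates the idea with the standard library only; SoABlob, StridedView2D and asBatchMajor are hypothetical names, and the column padding (128 bytes in the test) is an assumption for the example, not the actual CMSSW SoA layout. A feature-major SoA with padded columns can be viewed as an [n, ncols] batch-by-feature matrix, without copying, by using strides {1, pitch}.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical feature-major SoA: `ncols` float columns of `n` elements stored back
// to back in ONE buffer, each column padded to a multiple of `alignmentBytes`.
struct SoABlob {
  std::size_t n, ncols, pitch;  // pitch = padded column length, in elements
  std::vector<float> storage;   // the single memory blob shared by all columns

  SoABlob(std::size_t n_, std::size_t ncols_, std::size_t alignmentBytes)
      : n(n_), ncols(ncols_), pitch(paddedLength(n_, alignmentBytes)), storage(pitch * ncols_, 0.f) {}

  float* column(std::size_t c) { return storage.data() + c * pitch; }

  // Round the column size in bytes up to the alignment, then convert back to elements.
  static std::size_t paddedLength(std::size_t count, std::size_t alignmentBytes) {
    assert(alignmentBytes % sizeof(float) == 0);
    const std::size_t bytes = count * sizeof(float);
    return (bytes + alignmentBytes - 1) / alignmentBytes * alignmentBytes / sizeof(float);
  }
};

// Non-owning 2-D view described by sizes and strides (in elements): the same
// information a tensor library needs to wrap existing memory.
struct StridedView2D {
  float* data;
  std::size_t rows, cols;
  std::size_t rowStride, colStride;
  float& operator()(std::size_t r, std::size_t c) const { return data[r * rowStride + c * colStride]; }
};

// [n, ncols] (batch x features) view over the SoA without copying:
// the next object is +1 element away, the next feature is +pitch elements away.
inline StridedView2D asBatchMajor(SoABlob& soa) {
  return {soa.column(0), soa.n, soa.ncols, 1, soa.pitch};
}
```

With libtorch, the same pointer, sizes {n, ncols} and strides {1, pitch} could be passed to torch::from_blob together with a float32 dtype, which wraps the memory without taking ownership. This is one plausible way to realize the conversion described above, not necessarily the one the PR uses.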

PyTorch ahead-of-time compilation

This pull request also investigates an AOT compilation strategy. It is in beta (a proof of concept) and not yet ready for production use.

GPU support

The CUDA backend is supported and tested; ROCm is not yet supported (see cms-sw/cmsdist#9786).

FYI @valsdav @ericcano @felicepantaleo @chrisizeh @leobeltra

@cmsbuild
Contributor

cmsbuild commented Apr 30, 2025

cms-bot internal usage

@cmsbuild
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47984/44654

Code check has found code style and quality issues which could be resolved by applying the following patch(es)

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47984/44655

@cmsbuild
Contributor

A new Pull Request was created by @lukaszmichalskii for master.

It involves the following packages:

  • DataFormats/PyTorchTest (****)
  • PhysicsTools/PyTorch (ml)

The following packages do not have a category, yet:

DataFormats/PyTorchTest
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@cmsbuild, @valsdav, @y19y19 can you please review it and eventually sign? Thanks.
@missirol, @mmusich, @rovere this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

lukaszmichalskii changed the title from "Integrating PyTorch in Alpakac heterogeneous core" to "Integrating PyTorch in Alpaka heterogeneous core" on Apr 30, 2025
@valsdav
Contributor

valsdav commented May 1, 2025

enable gpu

@valsdav
Contributor

valsdav commented May 7, 2025

please test

@cmsbuild
Contributor

cmsbuild commented May 7, 2025

-1

Failed Tests: Build ClangBuild
Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c68a5d/45909/summary.html
COMMIT: c01d07f
CMSSW: CMSSW_15_1_X_2025-05-06-2300/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47984/45909/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found compilation error when building:

>> Compiling  src/PhysicsTools/PyTorch/test/testModel.cc
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02888/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/bin/c++ -c -DCMS_MICRO_ARCH='x86-64-v3' -DGNU_GCC -D_GNU_SOURCE -DCMSSW_GIT_HASH='CMSSW_15_1_X_2025-05-06-2300' -DPROJECT_NAME='CMSSW' -DPROJECT_VERSION='CMSSW_15_1_X_2025-05-06-2300' -Isrc -Ipoison -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02888/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-06-2300/src -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02888/el8_amd64_gcc12/external/pytorch/2.6.0-a6d0e4413a9e766b40a2b79f83b4b176/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02888/el8_amd64_gcc12/external/pytorch/2.6.0-a6d0e4413a9e766b40a2b79f83b4b176/include/torch/csrc/api/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02888/el8_amd64_gcc12/external/cppunit/1.15.x-25a760f1303b0fca73df75b14e1358bc/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02888/el8_amd64_gcc12/external/cuda/12.8.1-f1c01abd08373a07ceeffab8d5f1930a/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02888/el8_amd64_gcc12/external/protobuf/3.21.9-1126508a53768c90e66f6bf1821ac03a/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02888/el8_amd64_gcc12/external/zlib/1.2.13-d217cdbdd8d586e845e05946de2796be/include -O3 -pthread -pipe -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -std=c++20 -ftree-vectorize -Werror=array-bounds -Werror=format-contains-nul -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -Wno-error=array-bounds -Warray-bounds -fuse-ld=bfd -march=x86-64-v3 -felide-constructors -fmessage-length=0 -Wall -Wno-non-template-friend -Wno-long-long -Wreturn-type -Wextra -Wpessimizing-move -Wclass-memaccess -Wno-cast-function-type -Wno-unused-but-set-parameter -Wno-ignored-qualifiers -Wno-unused-parameter -Wunused -Wparentheses -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=unused-label -Werror=address -Werror=format 
-Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing -Werror=unused-but-set-variable -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Werror=return-local-addr -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes -Wno-psabi -DBOOST_DISABLE_ASSERTS -flto=auto -fipa-icf -flto-odr-type-merging -fno-fat-lto-objects -Wodr -fPIC -MMD -MF tmp/el8_amd64_gcc12/src/PhysicsTools/PyTorch/test/testModel/testModel.cc.d src/PhysicsTools/PyTorch/test/testModel.cc -o tmp/el8_amd64_gcc12/src/PhysicsTools/PyTorch/test/testModel/testModel.cc.o
>> Compiling  src/PhysicsTools/PyTorch/test/testRunner.cc
/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02888/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/bin/c++ -c -DCMS_MICRO_ARCH='x86-64-v3' -DGNU_GCC -D_GNU_SOURCE -DCMSSW_GIT_HASH='CMSSW_15_1_X_2025-05-06-2300' -DPROJECT_NAME='CMSSW' -DPROJECT_VERSION='CMSSW_15_1_X_2025-05-06-2300' -Isrc -Ipoison -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02888/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-05-06-2300/src -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02888/el8_amd64_gcc12/external/pytorch/2.6.0-a6d0e4413a9e766b40a2b79f83b4b176/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02888/el8_amd64_gcc12/external/pytorch/2.6.0-a6d0e4413a9e766b40a2b79f83b4b176/include/torch/csrc/api/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02888/el8_amd64_gcc12/external/cppunit/1.15.x-25a760f1303b0fca73df75b14e1358bc/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02888/el8_amd64_gcc12/external/cuda/12.8.1-f1c01abd08373a07ceeffab8d5f1930a/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02888/el8_amd64_gcc12/external/protobuf/3.21.9-1126508a53768c90e66f6bf1821ac03a/include -I/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02888/el8_amd64_gcc12/external/zlib/1.2.13-d217cdbdd8d586e845e05946de2796be/include -O3 -pthread -pipe -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -std=c++20 -ftree-vectorize -Werror=array-bounds -Werror=format-contains-nul -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -Wno-error=array-bounds -Warray-bounds -fuse-ld=bfd -march=x86-64-v3 -felide-constructors -fmessage-length=0 -Wall -Wno-non-template-friend -Wno-long-long -Wreturn-type -Wextra -Wpessimizing-move -Wclass-memaccess -Wno-cast-function-type -Wno-unused-but-set-parameter -Wno-ignored-qualifiers -Wno-unused-parameter -Wunused -Wparentheses -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=unused-label -Werror=address -Werror=format 
-Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing -Werror=unused-but-set-variable -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Werror=return-local-addr -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes -Wno-psabi -DBOOST_DISABLE_ASSERTS -flto=auto -fipa-icf -flto-odr-type-merging -fno-fat-lto-objects -Wodr -fPIC -MMD -MF tmp/el8_amd64_gcc12/src/PhysicsTools/PyTorch/test/testModel/testRunner.cc.d src/PhysicsTools/PyTorch/test/testRunner.cc -o tmp/el8_amd64_gcc12/src/PhysicsTools/PyTorch/test/testModel/testRunner.cc.o
In file included from src/PhysicsTools/PyTorch/test/testModel.cc:5:
src/PhysicsTools/PyTorch/test/testUtilities.h:4:10: fatal error: boost/filesystem.hpp: No such file or directory
    4 | #include <boost/filesystem.hpp>
      |          ^~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
gmake: *** [tmp/el8_amd64_gcc12/src/PhysicsTools/PyTorch/test/testModel/testModel.cc.o] Error 1
>> Building binary testModel
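The failure is a missing header: testUtilities.h includes boost/filesystem.hpp, which is not found in this build. Either the BuildFile has to declare the Boost.Filesystem dependency, or the helper can switch to std::filesystem, which is available because the code is compiled with -std=c++20. A minimal sketch of the second option follows; findUnder is a hypothetical helper (the actual contents of testUtilities.h are not shown here), and the search order in its comment is only an example.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: return the first existing `relative` path under one of `bases`
// (e.g. $CMSSW_BASE/src, then $CMSSW_RELEASE_BASE/src), or std::nullopt if none exists.
// It only needs <filesystem>, so no Boost header or extra BuildFile dependency is involved.
std::optional<fs::path> findUnder(const std::vector<fs::path>& bases, const fs::path& relative) {
  assert(relative.is_relative());
  for (const fs::path& base : bases) {
    const fs::path candidate = base / relative;
    if (fs::exists(candidate))
      return candidate;
  }
  return std::nullopt;
}
```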


Clang Build

I found compilation error while trying to compile with clang. Command used:

USER_CUDA_FLAGS='--expt-relaxed-constexpr' USER_CXXFLAGS='-Wno-register -fsyntax-only' /usr/bin/time -v scram build -k -j 32 COMPILER='llvm compile'

>> Local Products Rules ..... done
>> Creating project symlinks
>> Entering Package PhysicsTools/PyTorch
>> Entering Package DataFormats/PyTorchTest
>> Compile sequence completed for CMSSW CMSSW_15_1_X_2025-05-06-2300
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 1
Command exited with non-zero status 1
	Command being timed: "scram build -k -j 32 COMPILER=llvm compile BUILD_LOG=yes"
	User time (seconds): 907.23
	System time (seconds): 91.81
	Percent of CPU this job got: 655%


@cmsbuild
Contributor

cmsbuild commented Jun 6, 2025

@cmsbuild
Contributor

cmsbuild commented Jun 6, 2025

Pull request #47984 was updated. @cmsbuild, @valsdav, @y19y19 can you please check and sign again.

@valsdav
Contributor

valsdav commented Jun 6, 2025

please test

@valsdav
Contributor

valsdav commented Jun 20, 2025

please abort

@valsdav
Contributor

valsdav commented Jun 20, 2025

please test

@cmsbuild
Contributor

+1

Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c68a5d/46856/summary.html
COMMIT: cd3feb9
CMSSW: CMSSW_15_1_X_2025-06-20-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47984/46856/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 1 lines from the logs
  • Reco comparison results: 6 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 4058157
  • DQMHistoTests: Total failures: 5
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4058132
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 215 log files, 184 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

CUDA Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 53212
  • DQMHistoTests: Total failures: 28
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 53184
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 24 log files, 30 edm output root files, 7 DQM output files
  • TriggerResults: no differences found

ROCM Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 47 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 53212
  • DQMHistoTests: Total failures: 4059
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 49153
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 24 log files, 30 edm output root files, 7 DQM output files

@valsdav
Contributor

valsdav commented Jun 23, 2025

+ml

Signing for ML, but we would like to get some feedback about this from @cms-sw/core-team. Thanks in advance!

@makortel
Contributor

makortel commented Jul 2, 2025

> but we would like to get some feedback about this from @cms-sw/core-team. Thanks in advance!

I'll take a look in the coming days. In the future, please use the @cms-sw/core-l2 team.

@makortel
Contributor

makortel commented Jul 3, 2025

FYI @cms-sw/heterogeneous-l2

@cmsbuild
Contributor

cmsbuild commented Jul 3, 2025

Pull request has been put on hold by @fwyzard
They need to issue an unhold command to remove the hold state or L1 can unhold it for all

cmsbuild added the hold label Jul 3, 2025
@fwyzard
Contributor

fwyzard commented Jul 3, 2025

I would like to give feedback from the @cms-sw/heterogeneous-l2 and alpaka point of view before this proceeds.

In fact, I would suggest presenting the work at one of the upcoming GPU development meetings?

@fwyzard
Contributor

fwyzard commented Jul 3, 2025

assign heterogeneous

cmsbuild removed the hold label Jul 3, 2025
@cmsbuild
Contributor

cmsbuild commented Jul 3, 2025

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

Contributor

makortel left a comment

From a first read-through.

Contributor

The files in this new package do not seem to depend technically on PyTorch. Their dependencies are thus the same as those of DataFormats/PortableTestObjects (except that PortableTestObjects also depends on Eigen).

I'd suggest considering moving the test data types to DataFormats/PortableTestObjects. (I'm not against a separate package, but I'd want the rationale for it to be documented in the PR discussion.)

@@ -0,0 +1,20 @@
#ifndef DATA_FORMATS__PYTORCH_TEST__INTERFACE__DEVICE_H_
Contributor

The preferred format would be

Suggested change
- #ifndef DATA_FORMATS__PYTORCH_TEST__INTERFACE__DEVICE_H_
+ #ifndef DataFormats_PyTorchTest_interface_Device_h

(see 4.1 in https://cms-sw.github.io/cms_coding_rules.html#4--technical-coding-rules-1)

(also in other files)
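Judging from the suggested change, the guard is the header's path relative to src/ with every '/' and '.' replaced by '_'. As a small illustration (a hypothetical helper, not part of CMSSW or its code-checks tooling), the mapping can be written as:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: map a header path relative to src/ to the include-guard macro
// used in the suggested change, by replacing every '/' and '.' with '_'.
std::string includeGuardFor(std::string path) {
  assert(!path.empty());
  for (char& ch : path) {
    if (ch == '/' || ch == '.')
      ch = '_';
  }
  return path;
}
```

For DataFormats/PyTorchTest/interface/Device.h this yields DataFormats_PyTorchTest_interface_Device_h, which goes in the #ifndef/#define pair at the top of the header and is closed by the matching #endif at the bottom.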

#include "DataFormats/Portable/interface/PortableDeviceCollection.h"
#include "DataFormats/PyTorchTest/interface/Layout.h"

namespace torchportable {
Contributor

I'd suggest including "test" in the namespace name.

Contributor

I find the file name Device.h confusing, because the file does not define or declare any device. I'd suggest renaming it, e.g. along the lines of PyTorchTestDeviceCollections.h. The same applies to Host.h, Layout.h, and alpaka/Collections.h.

See also DataFormats/PortableTestObjects/interface (and DataFormats/TestObjects/interface) for established practice.

Comment on lines +20 to +25
using ::torchportable::ClassificationCollectionDevice;
using ::torchportable::ClassificationCollectionHost;
using ::torchportable::ParticleCollectionDevice;
using ::torchportable::ParticleCollectionHost;
using ::torchportable::RegressionCollectionDevice;
using ::torchportable::RegressionCollectionHost;
Contributor

These shouldn't be needed.

Suggested change
- using ::torchportable::ClassificationCollectionDevice;
- using ::torchportable::ClassificationCollectionHost;
- using ::torchportable::ParticleCollectionDevice;
- using ::torchportable::ParticleCollectionHost;
- using ::torchportable::RegressionCollectionDevice;
- using ::torchportable::RegressionCollectionHost;

Comment on lines +112 to +114
process.schedule = cms.Schedule(
process.path
)
Contributor

This is not necessary. In absence of process.schedule the framework will run all Paths.

Suggested change
- process.schedule = cms.Schedule(
- process.path
- )


function die { echo Failed $1: status $2 ; exit $2 ; }

SCRIPT="PhysicsTools/PyTorch/test/testPipeline.py"
Contributor

Is there any reason to have this testPipelineStandalone.sh other than avoiding the ${LOCALTOP}/src part in testPipeline.sh?

<bin name="testTorch" file="testTorch.cc">
<iftool name="cuda-gcc-support">
<bin name="testTensorStride" file="testRunner.cc,testTensorStride.cu">
<use name="catch2"/>
Contributor

catch2 doesn't seem to be used.

(same for the other test executables below)

Comment on lines +134 to +140
<use name="FWCore/ParameterSet"/>
<use name="FWCore/ParameterSetReader"/>
<use name="FWCore/PluginManager"/>
<use name="FWCore/ServiceRegistry"/>
<use name="FWCore/Utilities"/>
<use name="HeterogeneousCore/CUDAServices"/>
<use name="HeterogeneousCore/CUDAUtilities"/>
Contributor

Of these, only CUDAUtilities seems to be used.

Suggested change
- <use name="FWCore/ParameterSet"/>
- <use name="FWCore/ParameterSetReader"/>
- <use name="FWCore/PluginManager"/>
- <use name="FWCore/ServiceRegistry"/>
- <use name="FWCore/Utilities"/>
- <use name="HeterogeneousCore/CUDAServices"/>
- <use name="HeterogeneousCore/CUDAUtilities"/>
+ <use name="HeterogeneousCore/CUDAUtilities"/>

(same for the other test executables below)

Comment on lines +177 to +178
<test name="TestPipelineCpu" command="testPipeline.sh cpu"/>
<test name="TestPipelineCuda" command="testPipeline.sh cuda">
Contributor

I'd suggest more specific names for these tests, e.g.

Suggested change
- <test name="TestPipelineCpu" command="testPipeline.sh cpu"/>
- <test name="TestPipelineCuda" command="testPipeline.sh cuda">
+ <test name="TestPyTorchPipelineCpu" command="testPipeline.sh cpu"/>
+ <test name="TestPyTorchPipelineCuda" command="testPipeline.sh cuda">

5 participants