Empty commit, to trigger new RelVals #47708

Draft: fwyzard wants to merge 1 commit into base: master

@fwyzard
Contributor

fwyzard commented Mar 27, 2025

PR description:

Empty commit, to trigger new RelVals.

PR validation:

None.

@fwyzard
Contributor Author

fwyzard commented Mar 27, 2025

test parameters:

  • enable = gpu
  • workflows_gpu = 141.044407,141.044483,16850.403,16850.404,17034.403,17034.408,17034.422,17034.424,17050.406,17050.407,17050.408,29834.402,29834.403
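As a rough illustration of what the `test parameters:` comment above carries, the bullets can be read as plain `key = value` pairs. The sketch below is only an assumption about how such a comment might be parsed; `parse_test_parameters` is a hypothetical helper and is not taken from cms-bot.

```python
def parse_test_parameters(comment: str) -> dict:
    """Collect 'key = value' pairs from the bullets of a 'test parameters:' comment.

    Hypothetical helper for illustration; not the cms-bot implementation.
    """
    params = {}
    for raw in comment.splitlines():
        # Drop indentation and a leading bullet character, if any.
        line = raw.strip().lstrip("•-*").strip()
        if "=" not in line:
            continue
        # Split on the first '=' only, so values such as '-t 4' stay intact.
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


COMMENT = """test parameters:

  • enable = gpu
  • workflows_gpu = 141.044407,141.044483,16850.403,16850.404,17034.403,17034.408,17034.422,17034.424,17050.406,17050.407,17050.408,29834.402,29834.403
"""

params = parse_test_parameters(COMMENT)
workflows = params["workflows_gpu"].split(",")
print(params["enable"], len(workflows))
```

Parsed this way, the comment requests 13 GPU workflows.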

@cmsbuild
Contributor

cmsbuild commented Mar 27, 2025

cms-bot internal usage

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard for master.

It involves the following packages:

  • Empty/Package (****)

The following packages do not have a category, yet:

Empty/Package
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@cmsbuild can you please review it and, if appropriate, sign? Thanks.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release managers for this.

cms-bot commands are listed here

@fwyzard
Contributor Author

fwyzard commented Mar 27, 2025

please test

@cmsbuild
Contributor

+1

Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bf8465/45227/summary.html
COMMIT: 90adbe2
CMSSW: CMSSW_15_1_X_2025-03-26-2300/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47708/45227/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 10 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3909207
  • DQMHistoTests: Total failures: 70
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3909117
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 215 log files, 184 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

CUDA Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 1
  • DQMHistoTests: Total histograms compared: 0
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 0
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0 KiB( 0 files compared)
  • Checked 0 log files, 0 edm output root files, 1 DQM output files

ROCM Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 1
  • DQMHistoTests: Total histograms compared: 0
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 0
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0 KiB( 0 files compared)
  • Checked 0 log files, 0 edm output root files, 1 DQM output files

@fwyzard
Contributor Author

fwyzard commented Mar 27, 2025

@iarspider are there differences between how we run the ROCm tests on LUMI for pull requests and for the IBs?

The IB from last night showed that many ROCm tests failed:
[image not preserved]

while the tests in this PR all passed:

12834.402_TTbar_14TeV+2024_Patatrack_PixelOnlyAlpaka Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 27 09:17:14 2025-date Thu Mar 27 09:07:23 2025; exit: 0 0 0 0
12834.403_TTbar_14TeV+2024_Patatrack_PixelOnlyAlpaka_Validation Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 27 09:17:15 2025-date Thu Mar 27 09:07:23 2025; exit: 0 0 0 0
12834.406_TTbar_14TeV+2024_Patatrack_PixelOnlyTripletsAlpaka Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 27 09:17:15 2025-date Thu Mar 27 09:07:24 2025; exit: 0 0 0 0
12834.412_TTbar_14TeV+2024_Patatrack_ECALOnlyAlpaka Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 27 09:17:14 2025-date Thu Mar 27 09:07:24 2025; exit: 0 0 0 0
12834.422_TTbar_14TeV+2024_Patatrack_HCALOnlyAlpaka_Validation Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 27 09:17:12 2025-date Thu Mar 27 09:07:25 2025; exit: 0 0 0 0
12834.423_TTbar_14TeV+2024_Patatrack_HCALOnlyGPUandAlpaka_Validation Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 27 09:17:14 2025-date Thu Mar 27 09:07:25 2025; exit: 0 0 0 0
6 6 6 6 tests passed, 0 0 0 0 failed
141.044407_Run3-2023_JetMET2023D_RecoPixelOnlyTripletsGPU_Validation Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 27 09:22:48 2025-date Thu Mar 27 09:17:45 2025; exit: 0 0 0 0
141.044483_Run3-2023_JetMET2023D_GPUValidation Step0-PASSED Step1-PASSED Step2-PASSED  - time date Thu Mar 27 09:22:08 2025-date Thu Mar 27 09:17:46 2025; exit: 0 0 0
16850.403_ZMM_14+2025_Patatrack_PixelOnlyAlpaka_Validation Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 27 09:23:38 2025-date Thu Mar 27 09:17:46 2025; exit: 0 0 0 0
16850.404_ZMM_14+2025_Patatrack_PixelOnlyAlpaka_Profiling Step0-PASSED Step1-PASSED Step2-PASSED  - time date Thu Mar 27 09:22:42 2025-date Thu Mar 27 09:17:47 2025; exit: 0 0 0
17034.403_TTbar_14TeV+2025PU_Patatrack_PixelOnlyAlpaka_Validation Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 27 10:01:19 2025-date Thu Mar 27 09:17:47 2025; exit: 0 0 0 0
17034.408_TTbar_14TeV+2025PU_Patatrack_PixelOnlyTripletsAlpaka_Profiling Step0-PASSED Step1-PASSED Step2-PASSED  - time date Thu Mar 27 09:55:17 2025-date Thu Mar 27 09:17:48 2025; exit: 0 0 0
17034.422_TTbar_14TeV+2025PU_Patatrack_HCALOnlyAlpaka_Validation Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 27 10:00:21 2025-date Thu Mar 27 09:17:48 2025; exit: 0 0 0 0
17034.424_TTbar_14TeV+2025PU_Patatrack_HCALOnlyAlpaka_Profiling Step0-PASSED Step1-PASSED Step2-PASSED  - time date Thu Mar 27 09:58:13 2025-date Thu Mar 27 09:17:49 2025; exit: 0 0 0
17050.406_ZMM_14+2025PU_Patatrack_PixelOnlyTripletsAlpaka Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 27 10:00:38 2025-date Thu Mar 27 09:22:08 2025; exit: 0 0 0 0
17050.407_ZMM_14+2025PU_Patatrack_PixelOnlyTripletsAlpaka_Validation Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 27 09:57:19 2025-date Thu Mar 27 09:22:42 2025; exit: 0 0 0 0
17050.408_ZMM_14+2025PU_Patatrack_PixelOnlyTripletsAlpaka_Profiling Step0-PASSED Step1-PASSED Step2-PASSED  - time date Thu Mar 27 09:54:55 2025-date Thu Mar 27 09:22:49 2025; exit: 0 0 0
29834.402_TTbar_14TeV+Run4D110PU_Patatrack_PixelOnlyAlpaka Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 27 10:01:36 2025-date Thu Mar 27 09:23:38 2025; exit: 0 0 0 0
29834.403_TTbar_14TeV+Run4D110PU_Patatrack_PixelOnlyAlpaka_Validation Step0-PASSED Step1-PASSED Step2-PASSED Step3-PASSED  - time date Thu Mar 27 11:33:24 2025-date Thu Mar 27 09:54:56 2025; exit: 0 0 0 0
13 13 13 8 tests passed, 0 0 0 0 failed
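The per-step totals in the summary line above can be cross-checked against the individual workflow lines. The sketch below is a minimal illustration, not the tool that produced the log: `tally_steps` is a hypothetical helper, the lines are trimmed to the workflow name and step statuses, and the `FAILED` status token is an assumption, since only `PASSED` appears in this thread.

```python
import re
from collections import Counter

# Matches tokens such as "Step0-PASSED"; any upper-case status word is accepted.
STEP_RE = re.compile(r"Step(\d+)-([A-Z_]+)")


def tally_steps(lines):
    """Count passed and failed workflows per step index."""
    passed, failed = Counter(), Counter()
    for line in lines:
        for step, status in STEP_RE.findall(line):
            if status == "PASSED":
                passed[int(step)] += 1
            elif status == "FAILED":  # assumed token, not seen in this thread
                failed[int(step)] += 1
    return passed, failed


S3 = "Step0-PASSED Step1-PASSED Step2-PASSED"
S4 = S3 + " Step3-PASSED"
# The 13 requested workflows, trimmed to name and step statuses.
LINES = [
    "141.044407_Run3-2023_JetMET2023D_RecoPixelOnlyTripletsGPU_Validation " + S4,
    "141.044483_Run3-2023_JetMET2023D_GPUValidation " + S3,
    "16850.403_ZMM_14+2025_Patatrack_PixelOnlyAlpaka_Validation " + S4,
    "16850.404_ZMM_14+2025_Patatrack_PixelOnlyAlpaka_Profiling " + S3,
    "17034.403_TTbar_14TeV+2025PU_Patatrack_PixelOnlyAlpaka_Validation " + S4,
    "17034.408_TTbar_14TeV+2025PU_Patatrack_PixelOnlyTripletsAlpaka_Profiling " + S3,
    "17034.422_TTbar_14TeV+2025PU_Patatrack_HCALOnlyAlpaka_Validation " + S4,
    "17034.424_TTbar_14TeV+2025PU_Patatrack_HCALOnlyAlpaka_Profiling " + S3,
    "17050.406_ZMM_14+2025PU_Patatrack_PixelOnlyTripletsAlpaka " + S4,
    "17050.407_ZMM_14+2025PU_Patatrack_PixelOnlyTripletsAlpaka_Validation " + S4,
    "17050.408_ZMM_14+2025PU_Patatrack_PixelOnlyTripletsAlpaka_Profiling " + S3,
    "29834.402_TTbar_14TeV+Run4D110PU_Patatrack_PixelOnlyAlpaka " + S4,
    "29834.403_TTbar_14TeV+Run4D110PU_Patatrack_PixelOnlyAlpaka_Validation " + S4,
]

passed, failed = tally_steps(LINES)
n_steps = max(passed) + 1
summary_passed = [passed[i] for i in range(n_steps)]
summary_failed = [failed[i] for i in range(n_steps)]
print(summary_passed, summary_failed)
```

The last column is smaller because the Profiling and GPUValidation workflows in this list stop after Step2.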

@fwyzard
Contributor Author

fwyzard commented Mar 27, 2025

One difference that I see in the logs is that the IB tests still use all GPUs for each test:

%MSG-i ROCmService:  (NoModuleName) 27-Mar-2025 03:57:12 EET pre-events
ROCm runtime version 6.3.42134, driver version 6.3.42134, AMD driver version 6.3.6
ROCm device 0: AMD Instinct MI250X (gfx90a:sramecc+:xnack-)
ROCm device 1: AMD Instinct MI250X (gfx90a:sramecc+:xnack-)
ROCm device 2: AMD Instinct MI250X (gfx90a:sramecc+:xnack-)
ROCm device 3: AMD Instinct MI250X (gfx90a:sramecc+:xnack-)
ROCm device 4: AMD Instinct MI250X (gfx90a:sramecc+:xnack-)
ROCm device 5: AMD Instinct MI250X (gfx90a:sramecc+:xnack-)
ROCm device 6: AMD Instinct MI250X (gfx90a:sramecc+:xnack-)
ROCm device 7: AMD Instinct MI250X (gfx90a:sramecc+:xnack-)
%MSG

while the PR tests use a single GPU per job:

%MSG-i ROCmService:  (NoModuleName) 27-Mar-2025 09:21:07 EET pre-events
ROCm runtime version 6.3.42134, driver version 6.3.42134, AMD driver version 6.3.6
ROCm device 0: AMD Instinct MI250X (gfx90a:sramecc+:xnack-)
%MSG
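The comparison between the two excerpts can be made mechanical by counting the `ROCm device N:` lines that ROCmService prints. This is a small sketch under the assumption that the log format is exactly as quoted above; `rocm_devices` is a hypothetical helper name.

```python
import re

# One line per device, e.g. "ROCm device 0: AMD Instinct MI250X (...)".
DEVICE_RE = re.compile(r"^ROCm device (\d+): (.+)$", re.MULTILINE)


def rocm_devices(log: str):
    """Return (index, description) for every device listed in a log excerpt."""
    return [(int(idx), name) for idx, name in DEVICE_RE.findall(log)]


HEADER = "ROCm runtime version 6.3.42134, driver version 6.3.42134, AMD driver version 6.3.6\n"
DEVICE = "ROCm device {}: AMD Instinct MI250X (gfx90a:sramecc+:xnack-)\n"

# Excerpts rebuilt from the two log fragments quoted in this comment.
ib_log = (
    "%MSG-i ROCmService:  (NoModuleName) 27-Mar-2025 03:57:12 EET pre-events\n"
    + HEADER
    + "".join(DEVICE.format(i) for i in range(8))
    + "%MSG\n"
)
pr_log = (
    "%MSG-i ROCmService:  (NoModuleName) 27-Mar-2025 09:21:07 EET pre-events\n"
    + HEADER
    + DEVICE.format(0)
    + "%MSG\n"
)

ib_count = len(rocm_devices(ib_log))
pr_count = len(rocm_devices(pr_log))
print(ib_count, pr_count)
```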

@iarspider
Contributor

iarspider commented Mar 27, 2025

I think the main difference is that PR tests are run in single-threaded mode.

@fwyzard
Contributor Author

fwyzard commented Mar 27, 2025

> I think the main difference is that PR tests are run in single-threaded mode

But how does that explain why a job would see 1 GPU vs 8 GPUs?
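The thread does not answer this question. One thing that could be checked, purely as an assumption, is whether the job environment restricts the visible devices through the ROCm variables `ROCR_VISIBLE_DEVICES` or `HIP_VISIBLE_DEVICES`. The sketch below only reports what is set; `gpu_visibility` and `visible_count` are hypothetical helpers, and nothing here confirms that these variables are the cause of the difference.

```python
import os

# ROCm environment variables that can limit which GPUs a process sees.
GPU_VISIBILITY_VARS = ("ROCR_VISIBLE_DEVICES", "HIP_VISIBLE_DEVICES")


def gpu_visibility(env=None):
    """Return the device-visibility variables that are set in the environment."""
    env = os.environ if env is None else env
    return {name: env[name] for name in GPU_VISIBILITY_VARS if name in env}


def visible_count(value: str) -> int:
    """Count the entries of a comma-separated device list."""
    return len([d for d in value.split(",") if d.strip()])


# Illustrative environments, not taken from the actual jobs.
restricted = gpu_visibility({"ROCR_VISIBLE_DEVICES": "0", "PATH": "/usr/bin"})
unrestricted = gpu_visibility({"PATH": "/usr/bin"})
print(restricted, unrestricted)
```

Running `gpu_visibility()` with no argument inside a PR job and inside an IB job would show whether the two environments differ in this respect.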

@smuzaffar
Contributor

smuzaffar commented Apr 3, 2025

test parameters:

  • enable = rocm
  • workflow_rocm = 141.044406,141.044408,16850.402,17034.404,17034.408,17050.404,17050.408,29834.404
  • workflow_option_rocm = -t 4

@smuzaffar
Contributor

please test

@smuzaffar
Contributor

please test

@cmsbuild
Contributor

cmsbuild commented Apr 3, 2025

-1

Failed Tests: RelVals-ROCM
Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bf8465/45362/summary.html
COMMIT: 90adbe2
CMSSW: CMSSW_15_1_X_2025-04-03-1100/el8_amd64_gcc12
Additional Tests: ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/47708/45362/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 5 lines from the logs
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3913985
  • DQMHistoTests: Total failures: 52
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3913913
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 215 log files, 184 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@fwyzard
Contributor Author

fwyzard commented Jun 22, 2025

please test

@cmsbuild
Contributor

Pull request #47708 was updated.

@cmsbuild
Contributor

-1

Failed Tests: RelVals-ROCM
Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bf8465/46868/summary.html
COMMIT: 0756b8f
CMSSW: CMSSW_15_1_X_2025-06-22-0000/el8_amd64_gcc12
Additional Tests: ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/47708/46868/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-ROCM

  • 141.044408: 141.044408_Run3-2023_JetMET2023D_RecoPixelOnlyTripletsGPU_Profiling/step3_Run3-2023_JetMET2023D_RecoPixelOnlyTripletsGPU_Profiling.log
  • 16850.402: 16850.402_ZMM_14+2025_Patatrack_PixelOnlyAlpaka/step3_ZMM_14+2025_Patatrack_PixelOnlyAlpaka.log
  • 17034.404: 17034.404_TTbar_14TeV+2025PU_Patatrack_PixelOnlyAlpaka_Profiling/step3_TTbar_14TeV+2025PU_Patatrack_PixelOnlyAlpaka_Profiling.log
  • (further RelVal errors are collapsed in the original comment)

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 12 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 4058157
  • DQMHistoTests: Total failures: 9
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4058128
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 215 log files, 184 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@fwyzard
Contributor Author

fwyzard commented Jul 4, 2025

please test

@cmsbuild
Contributor

cmsbuild commented Jul 4, 2025

Pull request #47708 was updated.

@cmsbuild
Contributor

cmsbuild commented Jul 4, 2025

-1

Failed Tests: RelVals-ROCM
Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bf8465/47079/summary.html
COMMIT: 0c7fa9c
CMSSW: CMSSW_15_1_X_2025-07-03-2300/el8_amd64_gcc12
Additional Tests: ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47708/47079/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bf8465/47079/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bf8465/47079/git-merge-result

RelVals-ROCM

  • 17034.412: 17034.412_TTbar_14TeV+2025PU_Patatrack_ECALOnlyAlpaka/step2_TTbar_14TeV+2025PU_Patatrack_ECALOnlyAlpaka.log
  • 17034.406: 17034.406_TTbar_14TeV+2025PU_Patatrack_PixelOnlyTripletsAlpaka/step3_TTbar_14TeV+2025PU_Patatrack_PixelOnlyTripletsAlpaka.log
  • 17034.404: 17034.404_TTbar_14TeV+2025PU_Patatrack_PixelOnlyAlpaka_Profiling/step3_TTbar_14TeV+2025PU_Patatrack_PixelOnlyAlpaka_Profiling.log

Comparison Summary

Summary:

  • You potentially removed 98 lines from the logs
  • Reco comparison results: 62695 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 4067195
  • DQMHistoTests: Total failures: 561287
  • DQMHistoTests: Total nulls: 410
  • DQMHistoTests: Total successes: 3505478
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 6.096 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 10224.0 ): -0.117 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 13034.0 ): 2.651 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 17034.0 ): 2.583 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 250202.181 ): 0.133 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 25202.0 ): 0.846 KiB SiStrip/MechanicalView
  • Checked 215 log files, 184 edm output root files, 50 DQM output files
  • TriggerResults: found differences in 15 / 48 workflows
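The DQMHistoTests totals in the summary above appear to be additive: failures, nulls, successes, skipped and missing objects sum to the number of histograms compared. That relation is inferred from the numbers in this thread, not from documentation, and `parse_totals` below is a hypothetical helper that only illustrates the check.

```python
import re

# Captures e.g. "DQMHistoTests: Total failures: 561287" -> ("failures", "561287").
TOTAL_RE = re.compile(r"DQMHistoTests: Total ([A-Za-z ]+): (\d+)")


def parse_totals(summary: str) -> dict:
    """Map each 'Total <name>' label in a comparison summary to its integer value."""
    return {name.strip(): int(value) for name, value in TOTAL_RE.findall(summary)}


SUMMARY = """
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 4067195
  • DQMHistoTests: Total failures: 561287
  • DQMHistoTests: Total nulls: 410
  • DQMHistoTests: Total successes: 3505478
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
"""

totals = parse_totals(SUMMARY)
outcomes = ("failures", "nulls", "successes", "skipped", "Missing objects")
outcome_sum = sum(totals[name] for name in outcomes)
failure_fraction = totals["failures"] / totals["histograms compared"]
print(outcome_sum == totals["histograms compared"], round(failure_fraction, 3))
```

The same check holds for the three earlier comparison summaries in this thread, where the failure counts were 70, 52 and 9.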
