Edward-Knight commented on Dec 11, 2024
Second this. I'm currently working on manually writing a CycloneDX SBOM for a specific build of PBS we use downstream, following which I'll be working on some automated tooling. I'll share both as and when I can to hopefully get the ball rolling on upstreaming this 👍
zanieb commented on Dec 11, 2024
thank you!
Edward-Knight commented on Dec 17, 2024
I've attached an SBOM I've made for one of the past builds of PBS. It's the first SBOM I've written and I don't have particular expertise in this area, but I believe it is a good starting point.
Notable exclusions

The SBOM is mostly "complete", with a few exceptions I've outlined below. Of course one could always include more information and fill every possible field, but that way lies madness.

- The dependency graph (`/dependencies`) is not filled in.
- License and copyright information is not included (under `/metadata/component/{licenses,copyright}`, or for dependencies under `/components[]/{licenses,copyright}`).
- Details of the build environment are mostly omitted. The `/formulation` section would be most appropriate, but also it seems like this is excluded from most SBOMs, instead left to a different document (e.g. an MBOM). I have used `metadata/tools` and `metadata/properties` for this (incorrectly), e.g.:
  - `{"name": "MacOS Compiler", "version": "clang (clang/LLVM from Xcode 15.2)"}`
  - `{"name": "OS version", "value": "Darwin 23.6.0"}`
Guide to the format
Since a large blob of JSON can be intimidating at first, I'll quickly go over the general structure. At the top level there is some generic boilerplate to do with the SBOM itself:
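The attached SBOM itself isn't reproduced in this thread, but as a rough illustration, the CycloneDX top-level boilerplate looks something like the following (the `serialNumber` below is a made-up placeholder, not the value from the real SBOM):

```python
import json

# Illustrative CycloneDX top-level boilerplate. Field names follow the
# CycloneDX JSON schema; the serialNumber here is a placeholder.
sbom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "serialNumber": "urn:uuid:00000000-0000-0000-0000-000000000000",
    "version": 1,
}
print(json.dumps(sbom, indent=2))
```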
There is a "metadata component" and another list of "components". These all follow the same format. The "metadata component" is the one the SBOM is about, i.e. a particular build of PBS:
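As a hypothetical sketch of that shape (the `purl`, hash, and `mime-type` values below are placeholders for illustration, not taken from a real release):

```python
import json

# Hypothetical "metadata component" for a PBS release tarball. All the
# concrete values here are placeholders, not from the attached SBOM.
metadata_component = {
    "type": "application",
    "name": "python-build-standalone",
    "version": "20231002",
    "mime-type": "application/x-tar",
    "hashes": [{"alg": "SHA-256", "content": "0" * 64}],
    "purl": "pkg:generic/python-build-standalone@20231002",
}
print(json.dumps({"metadata": {"component": metadata_component}}, indent=2))
```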
This SBOM is for a specific release tarball and the `mime-type` and `hashes` reflect this. There are some self-explanatory fields like `name`, `version`, `description`, and `author` (which I imagine will change to `manufacturer` and point to "Astral Software Inc." for future releases). The `bom-ref`, `purl`, and "distribution" type external reference all have duplicate information pointing to the exact build (as `python-build-standalone@20231002` on its own does not uniquely define a release).

The array of components is then a list of the dependencies. I wrote the CPython one by hand, and have included a short bit of Python code that I used to generate the rest from the `downloads.py` list. As an example, here is the one for CPython (which has more detail than the other dependencies):

As noted above, all the "components" have a similar structure; the only difference is that I've set the `manufacturer` field instead of the `author` field.

Attachments and useful links
`python3 downloads_to_components.py $(cat deps.txt) > out.json`

where `deps.txt` included information from `cpython-unix/targets.yml` and there exists a local copy of `pythonbuild/downloads.py`.
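The script itself isn't attached in this transcript; a minimal sketch of the kind of conversion it performs might look like this, assuming a `DOWNLOADS`-style mapping of name to version/url/sha256 (the exact shape of `pythonbuild/downloads.py` may differ, and the sample entry below is hypothetical):

```python
import json

# Hypothetical sample in the assumed shape of pythonbuild/downloads.py.
DOWNLOADS = {
    "openssl": {
        "version": "3.0.11",
        "url": "https://www.openssl.org/source/openssl-3.0.11.tar.gz",
        "sha256": "0" * 64,  # placeholder digest
    },
}

def to_component(name: str, entry: dict) -> dict:
    """Turn one download entry into a CycloneDX component dict."""
    component = {
        "type": "library",
        "bom-ref": f"{name}@{entry['version']}",
        "name": name,
        "version": entry["version"],
        "purl": f"pkg:generic/{name}@{entry['version']}",
        "externalReferences": [{"type": "distribution", "url": entry["url"]}],
    }
    if entry.get("sha256"):
        component["hashes"] = [{"alg": "SHA-256", "content": entry["sha256"]}]
    return component

components = [to_component(n, e) for n, e in DOWNLOADS.items()]
print(json.dumps(components, indent=2))
```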
Specification for the PURL types (relevant to the `purl` fields): https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst

Edward-Knight commented on Dec 17, 2024
I've been looking at software that consumes SBOMs (in this case Dependency-Track), and it seems like the generic purls (package URLs) aren't used for matching against vulnerability databases. My testing shows that CPE (Common Platform Enumeration) IDs do work though: using a CPE of `cpe:2.3:a:openssl:openssl:3.0.11:*:*:*:*:*:*:*` for openssl does correctly link to some vulnerabilities. We can't automatically construct these as we can with generic purls, but adding them to the metadata for a component and using them where available is probably a good idea.

Edward-Knight commented on Jan 3, 2025
Charlie et al., a few implementation questions:
From the looks of it we could maybe do it:

- in `cpython-unix/build.py::build_cpython()` after `PYTHON.json` is created
- when `compress_python_archive()` is called in `cpython-unix/build-main.py::main()`
- in `pythonbuild validate-distribution` (or a new CLI action for `pythonbuild`)

It looks like we need information available to the Python script at build time (the versions of dependencies). For us to make the SBOM specifically about the tarball we'd need to at least update it with the tarball hash after it has been created.
I presume this would form an extra build artefact alongside each tarball
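The "update it with the tarball hash after it has been created" step could be sketched like this; the function name and the assumption that an SBOM JSON file sits alongside the tarball are hypothetical, not existing pythonbuild code:

```python
import hashlib
import json
from pathlib import Path

def record_tarball_hash(sbom_path: Path, tarball_path: Path) -> None:
    """Patch the SBOM's metadata component with the tarball's SHA-256.

    Sketch only: the name and file layout here are hypothetical.
    """
    sbom = json.loads(sbom_path.read_text())
    digest = hashlib.sha256(tarball_path.read_bytes()).hexdigest()
    component = sbom.setdefault("metadata", {}).setdefault("component", {})
    component["hashes"] = [{"alg": "SHA-256", "content": digest}]
    sbom_path.write_text(json.dumps(sbom, indent=2))
```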
Edward-Knight commented on Jan 30, 2025
@zanieb I've got some time coming up to work on this, would this be a contribution you would accept or should I implement this outside the project?
zanieb commented on Jan 30, 2025
I'm interested! Sorry for the lack of reply — got lost / I don't have great answers.
After the `PYTHON.json` is created seems reasonable. We need the tarball from `compress_python_archive` though, right? Probably makes less sense to add this during validation. A new action for `pythonbuild` could be sensible too. Do we need build-time information?

Note we create derived artifacts at release time. This was relevant, e.g., for #343, and I presume it would be relevant here if you need the archive hash? Are you going to create an SBOM for every release artifact? I worry about doubling the number of release artifacts; I'm not sure when GitHub will start enforcing a limit but we're already at more than 1000.
Also vaguely relevant, we were considering #284 to improve consumption of the download metadata outside of Python.
Edward-Knight commented on Feb 5, 2025
Thanks!
I'll keep in mind the "derived artifacts" 👍
I was planning on making one SBOM per release artifact tarball... which would increase the number of artifacts by another 50%. Would you be open to getting rid of the `.sha256` files, or perhaps having one big checksum file? I can't think of a better place to put SBOMs than adding them to the release artifacts.

Edward-Knight commented on Feb 7, 2025
RE: number of release assets, GitHub do specifically say there is "no limit" to the total size (although they don't talk about the number of assets directly). Of course I'm sure there is some limit, but we could "call their bluff" so to speak.
zanieb commented on Feb 7, 2025
I guess let's go for it and we can deal with it if there are problems.
We could collapse the checksums into a single file if we have to, yeah. They seem slightly easier to use as separate files, though.