Merged
49 commits
46f1b29
Add `Dicom` datatype and sniffer to datatypes_conf.xml.sample
kostrykin Dec 2, 2025
31f13c2
Add dicom format, example file
hexylena Sep 26, 2024
b588776
Implement `Dicom.set_meta`
kostrykin Dec 2, 2025
22319a1
Fix formatting
kostrykin Dec 2, 2025
09b9d08
Extend and fix `Dicom.set_meta`
kostrykin Dec 2, 2025
2d31ca7
Add test for `Dicom` metadata
kostrykin Dec 2, 2025
79f2544
Employ `pydicom` to identify DICOM files
kostrykin Dec 2, 2025
d8e1b2d
Fix `Dicom.set_meta`
kostrykin Dec 2, 2025
f08ae16
Add pydicom dependency
kostrykin Dec 2, 2025
24b2355
Fix images.py
kostrykin Dec 3, 2025
a156850
Make `import pydicom` optional
kostrykin Dec 3, 2025
8c812e1
Add `mimetype` and set `display_in_upload` to true for `Dicom`
kostrykin Dec 3, 2025
bcf27d7
Add `description_url` for `Dicom`
kostrykin Dec 3, 2025
1842d52
Fix `Dicom` and tests
kostrykin Dec 3, 2025
4c76831
Employ `build_sniff_from_prefix` and add test
kostrykin Dec 3, 2025
6e0d82a
Restrict pydicom to Python >=3.10 in pyproject.toml
kostrykin Dec 3, 2025
fadcf16
Update deps with `lib/galaxy/dependencies/update.sh -p pydicom`
kostrykin Dec 3, 2025
2f95de3
Fix linting
kostrykin Dec 3, 2025
259af30
Fix reference to TotalPixelMatrix in images.py
kostrykin Dec 3, 2025
4483514
Revert 3b96806556737039d7998c0271d2d534b81ad603 for unrelated packages
kostrykin Dec 3, 2025
7f297f0
Fix linting
kostrykin Dec 3, 2025
0b91971
Fix linting
kostrykin Dec 3, 2025
32e21b3
Fix linting
kostrykin Dec 3, 2025
017babf
Fix linting (now for real)
kostrykin Dec 3, 2025
09d6b2f
Add `pydicom` as a package dependency
kostrykin Dec 3, 2025
c0282e2
Fix bug in `Dicom.set_meta` that made integration tests fail
kostrykin Dec 4, 2025
e020ad9
Employ more diverse DICOM test data
kostrykin Dec 4, 2025
57970fc
Migrate tests to new test data and fix `Dicom.set_meta`
kostrykin Dec 4, 2025
b143d8f
Fix tests and `Dicom.set_meta` implementation for tiled data
kostrykin Dec 4, 2025
e917233
Add `is_tiled` metadata element for DICOM
kostrykin Dec 4, 2025
75df743
Refactor test/unit/data/datatypes/test_images.py
kostrykin Dec 4, 2025
21a1e08
Refactor test/unit/data/datatypes/test_images.py
kostrykin Dec 4, 2025
96db3d0
Fix type hints
kostrykin Dec 4, 2025
af80ed8
Fix linting
kostrykin Dec 4, 2025
b9f7d1f
Adopt formatting for black to pass
kostrykin Dec 4, 2025
2ef61f4
Revert import order to what isort likes but flake8 doesn't
kostrykin Dec 4, 2025
aced46f
Fix docstring example
kostrykin Dec 4, 2025
96a647c
Reproduce bug (integration test failure) as unit test
kostrykin Dec 4, 2025
2c50f81
Employ `build_sniff_from_prefix` for `Tiff` to solve ambiguity with `…
kostrykin Dec 4, 2025
569174b
Bring back `channels` metadata element (deleted by accident)
kostrykin Dec 4, 2025
9042ad8
Fix formatting for black
kostrykin Dec 4, 2025
3d60cd6
Clean up `OMETiff.sniff` and add tests
kostrykin Dec 4, 2025
11cf6fd
Fix formatting for black
kostrykin Dec 4, 2025
486e55e
Ignore `sniff_prefix` in `OMETiff` derived from `Tiff`
kostrykin Dec 4, 2025
c0f025b
Revert aa1731941eaba07b1028b266ca11075d280a4879 for sniff.py, reimple…
kostrykin Dec 4, 2025
28e3b1c
Update lib/galaxy/config/sample/datatypes_conf.xml.sample
kostrykin Jan 13, 2026
ca43aed
Fix `mimetype`
kostrykin Jan 19, 2026
26bece8
Clean up lib/galaxy/dependencies/pinned-requirements.txt
kostrykin Jan 19, 2026
1d7a944
Merge branch 'dev' into feat/dicom
kostrykin Jan 20, 2026
2 changes: 2 additions & 0 deletions lib/galaxy/config/sample/datatypes_conf.xml.sample
@@ -316,6 +316,7 @@
<datatype extension="ome.tiff" type="galaxy.datatypes.images:OMETiff" display_in_upload="true">
<display file="image/avivator.xml"/>
</datatype>
<datatype extension="dcm" type="galaxy.datatypes.images:Dicom" mimetype="application/dicom" subclass="true" display_in_upload="true" description_url="https://formats.kaitai.io/dicom"/>
<datatype extension="vms" type="galaxy.datatypes.images:Hamamatsu" mimetype="image/hamamatsu"/>
<datatype extension="vmu" type="galaxy.datatypes.images:Hamamatsu" subclass="true" display_in_upload="false"/>
<datatype extension="ndpi" type="galaxy.datatypes.images:Hamamatsu" subclass="true" display_in_upload="false"/>
@@ -1451,6 +1452,7 @@
<sniffer type="galaxy.datatypes.images:Png"/>
<sniffer type="galaxy.datatypes.images:OMETiff"/>
<sniffer type="galaxy.datatypes.images:Tiff"/>
<sniffer type="galaxy.datatypes.images:Dicom"/>
<sniffer type="galaxy.datatypes.images:Bmp"/>
<sniffer type="galaxy.datatypes.images:Gif"/>
<sniffer type="galaxy.datatypes.images:Im"/>
149 changes: 140 additions & 9 deletions lib/galaxy/datatypes/images.py
@@ -3,6 +3,7 @@
"""

import base64
import io
import json
import logging
import math
@@ -18,6 +19,7 @@
import mrcfile
import numpy as np
import png
import pydicom
import tifffile

try:
@@ -225,10 +227,12 @@ def set_meta(
dataset.metadata.num_unique_values = len(unique_values)


@build_sniff_from_prefix
class Tiff(Image):
    edam_format = "format_3591"
    file_ext = "tiff"
    display_behavior = "download"  # TIFF files trigger browser downloads

    MetadataElement(
        name="offsets",
        desc="Offsets File",
@@ -239,6 +243,27 @@ class Tiff(Image):
        optional=True,
    )

    def sniff_prefix(self, file_prefix: FilePrefix) -> bool:
        """
        Determine if the file is in TIFF format by checking the file header.

        For a successful check, the first 4 bytes must be the TIFF magic number. See [1] for a list of magic numbers.

        Manual checking of the file header, as opposed to trying to read the file with tifffile, is required due to an
        ambiguity with DICOM files. The DICOM standard allows *any content* in the 128-byte preamble that precedes
        the DICOM prefix (see §7.1 in [2] for details), so a DICOM file may itself start with a valid TIFF header.

        [1] https://gist.github.com/leommoore/f9e57ba2aa4bf197ebc5
        [2] https://dicom.nema.org/medical/dicom/current/output/html/part10.html
        """
        return file_prefix.contents_header_bytes[:4] in (
            b"\x4d\x4d\x00\x2a",  # TIFF format (Motorola - big endian)
            b"\x49\x49\x2a\x00",  # TIFF format (Intel - little endian)
        ) and (
            len(file_prefix.contents_header_bytes) < 132  # file is too short to be a DICOM file
            or file_prefix.contents_header_bytes[128:132] != b"DICM"  # file does not contain the DICOM prefix
        )

    def set_meta(
        self, dataset: DatasetProtocol, overwrite: bool = True, metadata_tmp_files_dir: Optional[str] = None, **kwd
    ) -> None:
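To see why the extra `DICM` check matters, here is a minimal, self-contained sketch that runs both header checks on raw bytes. It assumes, as a simplification of Galaxy's sniffing, that the first sniffer in the configured order (where `Tiff` is listed before `Dicom`) to accept a file determines its extension. The helper names `is_tiff`, `is_dicom` and `first_match` are hypothetical and not part of Galaxy.

```python
from typing import Callable, List, Optional, Tuple

# TIFF magic numbers: big endian ("MM") and little endian ("II")
TIFF_MAGIC = (b"\x4d\x4d\x00\x2a", b"\x49\x49\x2a\x00")

Sniffer = Tuple[str, Callable[[bytes], bool]]


def is_dicom(head: bytes) -> bool:
    # A 128-byte preamble with arbitrary content, followed by the "DICM" prefix
    return len(head) >= 132 and head[128:132] == b"DICM"


def is_tiff(head: bytes) -> bool:
    # TIFF magic number, unless the same bytes also carry the DICOM prefix
    return head[:4] in TIFF_MAGIC and not is_dicom(head)


def first_match(head: bytes, sniffers: List[Sniffer]) -> Optional[str]:
    # Simplified ordered sniffing: the first sniffer that accepts the bytes wins
    for ext, sniff in sniffers:
        if sniff(head):
            return ext
    return None


# A "dual-personality" file: TIFF magic in the preamble, "DICM" at offset 128
dual = b"\x49\x49\x2a\x00" + bytes(124) + b"DICM" + bytes(16)
plain_tiff = b"\x49\x49\x2a\x00" + bytes(200)

order = [("tiff", is_tiff), ("dcm", is_dicom)]
naive_order = [("tiff", lambda h: h[:4] in TIFF_MAGIC), ("dcm", is_dicom)]

print(first_match(dual, order))  # dcm
print(first_match(plain_tiff, order))  # tiff
print(first_match(dual, naive_order))  # tiff
```

With a magic-number-only check, the dual file is claimed by `tiff` before the DICOM sniffer ever runs; rejecting files that carry `DICM` at offset 128 lets them fall through to `dcm`.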
@@ -385,19 +410,14 @@ def _read_segments(page: Union[tifffile.TiffPage, tifffile.TiffFrame]) -> Iterat

yield segment

    def sniff(self, filename: str) -> bool:
        with tifffile.TiffFile(filename):
            return True


class OMETiff(Tiff):
    file_ext = "ome.tiff"

    def sniff(self, filename: str) -> bool:
        with tifffile.TiffFile(filename) as tif:
            if tif.is_ome:
                return True
        return False
    def sniff_prefix(self, file_prefix: FilePrefix) -> bool:
        buf = io.BytesIO(file_prefix.contents_header_bytes)
        with tifffile.TiffFile(buf) as tif:
            return tif.is_ome
Comment on lines +418 to +420
Member

I have a feeling this won't work for larger files where the image is larger than the contents_header_bytes.

Maybe you can use the @disable_parent_class_sniffing decorator from lib/galaxy/datatypes/sniff.py and restore the previous sniff method?

Contributor Author

kostrykin Dec 4, 2025

I tried it (see kostrykin#4), but when I put the @disable_parent_class_sniffing decorator on the OMETiff class, the sniff method does not get called. The decorator seems to disable sniffing.

The test/unit/data/datatypes/test_images.py::test_ome_tiff_sniff fails with

>       assert OMETiff().sniff(fname)
E       AssertionError: assert False
E        +  where False = <lambda>('lib/galaxy/datatypes/test/1.ome.tiff')
E        +    where <lambda> = <galaxy.datatypes.images.OMETiff object at 0x7f99946c3190>.sniff
E        +      where <galaxy.datatypes.images.OMETiff object at 0x7f99946c3190> = OMETiff()

This happens even though OMETiff.sniff is implemented as

    def sniff(self, filename: str) -> bool:
        raise ValueError('OMETiff.sniff called')  # <-- !!! this error is *not* reported !!!
        with tifffile.TiffFile(filename) as tif:
            return tif.is_ome

Thus, OMETiff.sniff is not called when the @disable_parent_class_sniffing decorator is in place.

Here is the error from running the test: https://github.com/kostrykin/galaxy/actions/runs/19942920690/job/57185121231?pr=4#step:8:11238

That said, I think the current solution should work, because tifffile.TiffFile only reads the headers/metadata of the file. The pixel content is only read when the corresponding methods are used (they are not used here).

On the other hand, since we do not know how large the metadata can be or how it is organized, I'd go for a solution without sniff_prefix if one can be made to work.

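The reviewer's concern can be made concrete for classic (non-BigTIFF) files: the 8-byte TIFF header only stores the offset of the first image file directory (IFD), and nothing forces that IFD, let alone the OME-XML `ImageDescription` value it references, to lie within the first bytes of the file. The following sketch, with hypothetical helper names and no BigTIFF support, checks whether the first IFD table fits into a given prefix.

```python
import struct
from typing import Optional

LITTLE_ENDIAN_MAGIC = b"\x49\x49\x2a\x00"  # "II*\0"
BIG_ENDIAN_MAGIC = b"\x4d\x4d\x00\x2a"  # "MM\0*"


def byte_order(header: bytes) -> Optional[str]:
    """Return the struct byte-order prefix for a classic TIFF header, or None."""
    if header[:4] == LITTLE_ENDIAN_MAGIC:
        return "<"
    if header[:4] == BIG_ENDIAN_MAGIC:
        return ">"
    return None


def first_ifd_offset(header: bytes) -> Optional[int]:
    """Return the offset of the first IFD, stored in bytes 4-7 of a classic TIFF header."""
    order = byte_order(header)
    if order is None or len(header) < 8:
        return None
    return struct.unpack(order + "I", header[4:8])[0]


def first_ifd_fits_in_prefix(header: bytes) -> bool:
    """Check whether the complete first IFD table lies within ``header``."""
    order = byte_order(header)
    offset = first_ifd_offset(header)
    if order is None or offset is None or offset + 2 > len(header):
        return False
    (num_entries,) = struct.unpack(order + "H", header[offset : offset + 2])
    # 2-byte entry count + 12 bytes per entry + 4-byte offset of the next IFD
    return offset + 2 + 12 * num_entries + 4 <= len(header)


# First IFD (with no entries) starts right after the 8-byte header
compact = LITTLE_ENDIAN_MAGIC + struct.pack("<I", 8) + struct.pack("<H", 0) + bytes(4)
# Header pointing to an IFD far beyond the available bytes
far = LITTLE_ENDIAN_MAGIC + struct.pack("<I", 10_000_000)

print(first_ifd_fits_in_prefix(compact))  # True
print(first_ifd_fits_in_prefix(far))  # False
```

Even when the IFD table fits, tag values longer than 4 bytes (such as the OME-XML string) are stored at separate offsets, so passing this check is necessary but not sufficient for the OME metadata to be contained in the prefix.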


class OMEZarr(data.ZarrDirectory):
@@ -519,6 +539,117 @@ def sniff(self, filename: str) -> bool:
return fh.read(4) == b"%PDF"


@build_sniff_from_prefix
class Dicom(Image):
    """
    DICOM medical imaging format (.dcm)

    >>> from galaxy.datatypes.sniff import get_test_fname
    >>> fname = get_test_fname('ct_image.dcm')
    >>> Dicom().sniff(fname)
    True
    """

    MetadataElement(
        name="is_tiled",
        desc="Is this a WSI DICOM?",
        readonly=True,
        visible=True,
        optional=True,
    )

    edam_format = "format_3548"
    file_ext = "dcm"

    def sniff_prefix(self, file_prefix: FilePrefix) -> bool:
        """
        Determine if the file is in DICOM format according to §7.1 in [1].

        [1] https://dicom.nema.org/medical/dicom/current/output/html/part10.html
        """
        return len(file_prefix.contents_header_bytes) >= 132 and file_prefix.contents_header_bytes[128:132] == b"DICM"

    def get_mime(self) -> str:
        """
        Returns the mime type of the datatype.
        """
        return "application/dicom"

    def set_meta(
        self, dataset: DatasetProtocol, overwrite: bool = True, metadata_tmp_files_dir: Optional[str] = None, **kwd
    ) -> None:
        """
        Populate the metadata of the DICOM file using the pydicom library.

        The following metadata fields are populated, if possible:
        - `width`
        - `height`
        - `channels`
        - `dtype`
        - `num_unique_values` (only for DICOM segmentation objects)
        - `is_tiled`

        Currently, `frames` and `depth` are not populated. This is because "frames" in DICOM are a generic entity
        that can be used for different purposes, including slices in 3-D images, frames in temporal sequences, and
        tiles of a mosaic or pyramid (WSI DICOM). Distinguishing these cases is not straightforward (and, as a
        consequence, neither is determining the `axes` of the image). This can be implemented in the future.
        """
        try:
            dcm = pydicom.dcmread(dataset.get_file_name(), stop_before_pixels=True)
        except pydicom.errors.InvalidDicomError:
            return  # Ignore errors if metadata cannot be read

        # Determine the number of channels (0 if no channel info is present)
        dataset.metadata.channels = dcm.get("SamplesPerPixel", 0)

        # Determine if the DICOM file is tiled (likely WSI DICOM)
        dataset.metadata.is_tiled = hasattr(dcm, "TotalPixelMatrixColumns") and hasattr(dcm, "TotalPixelMatrixRows")

        # Determine the width and height of the image. If the DICOM file is not tiled, `Columns` and `Rows` give the
        # width and height directly. For tiled DICOM, these attributes correspond to the size of the tiles, so the
        # size of the total pixel matrix is used instead.
        if dataset.metadata.is_tiled:
            dataset.metadata.width = dcm.TotalPixelMatrixColumns
            dataset.metadata.height = dcm.TotalPixelMatrixRows
        else:
            dataset.metadata.width = dcm.get("Columns")
            dataset.metadata.height = dcm.get("Rows")

        # Try to infer the `dtype` from metadata
        if dcm.BitsAllocated == 1:
            dataset.metadata.dtype = "bool"  # 1bit
        else:
            dtype_lut = [
                ["uint8", "int8"],
                ["uint16", "int16"],
                ["uint32", "int32"],
            ]
            dtype_lut_pos = (
                round(math.log2(dcm.BitsAllocated) - 3),  # 8bit -> 0, 16bit -> 1, 32bit -> 2
                dcm.PixelRepresentation,
            )
            if 0 <= dtype_lut_pos[0] < len(dtype_lut):
                dataset.metadata.dtype = dtype_lut[dtype_lut_pos[0]][dtype_lut_pos[1]]
            else:
                dataset.metadata.dtype = None  # unknown `dtype`

        # Try to infer `num_unique_values` from metadata
        try:
            if dcm.SOPClassUID == "1.2.840.10008.5.1.4.1.1.66.4":  # https://www.dicomlibrary.com/dicom/sop

                # The DICOM file is a segmentation object: one value per segment, plus one for the background
                dataset.metadata.num_unique_values = 1 + len(dcm.SegmentSequence)

            else:

                # Otherwise, `num_unique_values` is not available from metadata
                dataset.metadata.num_unique_values = None

        except AttributeError:

            # Ignore errors if metadata cannot be read
            dataset.metadata.num_unique_values = None


@build_sniff_from_prefix
class Tck(Binary):
"""
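The metadata logic of `Dicom.set_meta` can be exercised without pydicom or actual DICOM files. The sketch below mirrors its steps on a plain `dict` that stands in (as an assumption) for the attributes pydicom reads. The names `dtype_from_bits` and `summarize` are hypothetical, and the guards against invalid `BitsAllocated`/`PixelRepresentation` values, as well as the defaults for missing attributes, are additions not present in the PR.

```python
import math
from typing import Any, Dict, Optional

# SOP Class UID for DICOM Segmentation Storage, as checked in the PR
SEGMENTATION_SOP_CLASS_UID = "1.2.840.10008.5.1.4.1.1.66.4"


def dtype_from_bits(bits_allocated: Optional[int], pixel_representation: int) -> Optional[str]:
    """Map BitsAllocated/PixelRepresentation to a dtype name (0 = unsigned, 1 = signed)."""
    if bits_allocated == 1:
        return "bool"
    # Added guards: log2 is undefined for values < 1, and only 0/1 are valid representations
    if bits_allocated is None or bits_allocated < 1 or pixel_representation not in (0, 1):
        return None
    dtype_lut = [["uint8", "int8"], ["uint16", "int16"], ["uint32", "int32"]]
    row = round(math.log2(bits_allocated) - 3)  # 8bit -> 0, 16bit -> 1, 32bit -> 2
    if 0 <= row < len(dtype_lut):
        return dtype_lut[row][pixel_representation]
    return None  # unknown dtype


def summarize(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Derive image metadata from a plain mapping of DICOM keywords to values."""
    is_tiled = "TotalPixelMatrixColumns" in attrs and "TotalPixelMatrixRows" in attrs
    if is_tiled:
        # Columns/Rows describe a single tile; the total pixel matrix describes the whole image
        width, height = attrs["TotalPixelMatrixColumns"], attrs["TotalPixelMatrixRows"]
    else:
        width, height = attrs.get("Columns"), attrs.get("Rows")
    num_unique_values = None
    if attrs.get("SOPClassUID") == SEGMENTATION_SOP_CLASS_UID:
        # One value per segment, plus one for the background
        num_unique_values = 1 + len(attrs.get("SegmentSequence", []))
    return {
        "channels": attrs.get("SamplesPerPixel", 0),
        "is_tiled": is_tiled,
        "width": width,
        "height": height,
        "dtype": dtype_from_bits(attrs.get("BitsAllocated"), attrs.get("PixelRepresentation", 0)),
        "num_unique_values": num_unique_values,
    }


wsi = summarize(
    {
        "Columns": 256,
        "Rows": 256,
        "TotalPixelMatrixColumns": 2000,
        "TotalPixelMatrixRows": 1000,
        "SamplesPerPixel": 3,
        "BitsAllocated": 8,
        "PixelRepresentation": 0,
    }
)
print(wsi["width"], wsi["height"], wsi["dtype"])  # 2000 1000 uint8
```

For tiled (WSI) input, the 256 x 256 tile size is ignored in favor of the total pixel matrix, which matches the branch on `is_tiled` in the PR.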
1 change: 1 addition & 0 deletions lib/galaxy/datatypes/test/ct_image.dcm
1 change: 1 addition & 0 deletions lib/galaxy/datatypes/test/seg_image_ct_binary.dcm
1 change: 1 addition & 0 deletions lib/galaxy/datatypes/test/sm_image.dcm
2 changes: 2 additions & 0 deletions lib/galaxy/dependencies/pinned-requirements.txt
@@ -215,6 +215,8 @@ pydantic-graph==1.44.0
pydantic-settings==2.12.0
pydantic-tes==0.2.0
pydocket==0.16.6
pydicom==2.4.4 ; python_full_version < '3.10'
pydicom==3.0.1 ; python_full_version >= '3.10'
pydot==4.0.1
pyeventsystem==0.1.0
pyfaidx==0.9.0.3
1 change: 1 addition & 0 deletions packages/data/setup.cfg
@@ -55,6 +55,7 @@ install_requires =
parsley
pycryptodome
pydantic[email]>=2.7.4
pydicom
pylibmagic
pypng
python-magic
1 change: 1 addition & 0 deletions pyproject.toml
@@ -77,6 +77,7 @@ dependencies = [
"pycryptodome",
"pydantic[email]>=2.7.4", # https://github.com/pydantic/pydantic/pull/9639
"pydantic-ai>=0.1.16", # AI agent framework
"pydicom; python_version >= '3.10'",
"PyJWT",
"pykwalify",
"pylibmagic",
7 changes: 7 additions & 0 deletions test-data/highdicom/LICENSE
Member

Maybe move the dataset into https://github.com/galaxyproject/galaxy-test-data? Then we can keep the license file there and not pollute the repo here.

Contributor Author

Unfortunately, test data from that repo is not accessible within unit tests.

@@ -0,0 +1,7 @@
Copyright 2020 MGH Computational Pathology

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Binary file added test-data/highdicom/ct_image.dcm
Binary file not shown.
Binary file added test-data/highdicom/seg_image_ct_binary.dcm
Binary file not shown.
Binary file added test-data/highdicom/sm_image.dcm
Binary file not shown.