Generalize GridPatchDataset and PatchDataset. Potentially introduce Persistent and/or Cache IterableDatasets. #6904

Closed
@ibro45

Is your feature request related to a problem? Please describe.

  • GridPatchDataset's data argument must be a sequence of already loaded data. For consistency with the other datasets, it should also accept other kinds of input and load or manipulate them through a transform. For example, you might want to pass dicts of image and label paths and load them with LoadImaged().
  • PatchDataset should follow the same approach. Moreover, its transform argument does the same thing as patch_transform in GridPatchDataset, so it should use that name.
  • Finally, it would be useful to have Persistent or Cache versions of these (or of any IterableDataset). Is that something you would want to have?
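
To make the third point more concrete, here is a rough sketch of what caching an iterable could look like. CachedIterable and expensive_patches are hypothetical names used only for illustration, not existing MONAI classes, and the cache is a plain in-memory list rather than anything persistent.

```python
from typing import Any, Callable, Iterable, Iterator, List, Optional


class CachedIterable:
    """Hypothetical wrapper: computes items on the first pass, replays them afterwards."""

    def __init__(self, source_factory: Callable[[], Iterable[Any]]) -> None:
        # A factory is used so that a fresh iterator can be created on demand.
        self._source_factory = source_factory
        self._cache: Optional[List[Any]] = None

    def __iter__(self) -> Iterator[Any]:
        if self._cache is not None:
            # Later passes: replay stored items without recomputing anything.
            yield from self._cache
            return
        buffer: List[Any] = []
        for item in self._source_factory():
            buffer.append(item)
            yield item
        # Commit the cache only after a complete pass, so a partially
        # consumed iterator never leaves a truncated cache behind.
        self._cache = buffer


calls = {"n": 0}


def expensive_patches() -> Iterator[int]:
    """Toy stand-in for loading, transforming and patching images."""
    for i in range(3):
        calls["n"] += 1
        yield i * i


cached = CachedIterable(expensive_patches)
first = list(cached)   # computes and stores the items
second = list(cached)  # replays the stored items
```

With a multi-worker DataLoader each worker process would hold its own copy of such a cache, so a real implementation would need to decide how to handle that. This is one of the things I would like to discuss.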

Describe the solution you'd like

  • GridPatchDataset currently has only patch_transform, but it should have two transforms: transform and patch_transform. transform would be used the same way as in IterableDataset and applied before patching, while patch_transform would be applied to each patch.
  • PatchDataset should follow the same.
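
A minimal sketch of the proposed two-stage behaviour, assuming a plain Python iterable rather than the actual MONAI classes. TwoStagePatchDataset, FAKE_STORAGE, load, halves and scale are all hypothetical names made up for this example. Lists stand in for images and a dict lookup stands in for LoadImaged().

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Sequence


class TwoStagePatchDataset:
    """Hypothetical stand-in for the proposal, not MONAI's GridPatchDataset."""

    def __init__(
        self,
        data: Sequence[Any],
        patch_iter: Callable[[Any], Iterable[Any]],
        transform: Optional[Callable[[Any], Any]] = None,
        patch_transform: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self.data = data
        self.patch_iter = patch_iter
        self.transform = transform
        self.patch_transform = patch_transform

    def __iter__(self) -> Iterator[Any]:
        for item in self.data:
            # Item-level transform: runs once per item, before patching
            # (for example, loading an image from a path).
            if self.transform is not None:
                item = self.transform(item)
            for patch in self.patch_iter(item):
                # Patch-level transform: runs once per extracted patch.
                if self.patch_transform is not None:
                    patch = self.patch_transform(patch)
                yield patch


# Toy data: "paths" that map to 1D "images" held in memory.
FAKE_STORAGE: Dict[str, List[int]] = {"a.nii": [1, 2, 3, 4], "b.nii": [5, 6, 7, 8]}


def load(path: str) -> List[int]:
    """Stand-in for a loading transform."""
    return list(FAKE_STORAGE[path])


def halves(image: List[int]) -> Iterator[List[int]]:
    """Stand-in for a patch iterator: split the image into two patches."""
    mid = len(image) // 2
    yield image[:mid]
    yield image[mid:]


def scale(patch: List[int]) -> List[int]:
    """Stand-in for a patch-level transform."""
    return [value * 10 for value in patch]


dataset = TwoStagePatchDataset(
    data=["a.nii", "b.nii"], patch_iter=halves, transform=load, patch_transform=scale
)
patches = list(dataset)
```

Here data holds only paths, transform does the loading, and patch_transform touches only the individual patches, which is the split I am suggesting.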

Describe alternatives you've considered
None

Additional context
I started looking into these patch-based datasets because I want to validate on large images during training. Specifically, the images are sometimes too big to run SlidingWindowInferer on the GPU, and using cpu_thresh just results in a very long computation. I hope to use these datasets as a replacement, and it should be much faster if a Persistent/Cache mode is available too. The approach basically emulates sliding window inference: you can still split the image into overlapping patches and pass them through the network to get predictions, with the caveat that the predicted patches are never merged and are instead evaluated individually against the corresponding label patches.
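
A rough 1D illustration of the patch-wise validation described above, written in plain Python so it does not depend on any MONAI API. overlapping_windows, accuracy, patchwise_validate and threshold_model are hypothetical helpers, and the "model" is just a threshold.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def overlapping_windows(length: int, patch_size: int, overlap: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs that cover [0, length) with the given overlap."""
    if not 0 <= overlap < patch_size:
        raise ValueError("overlap must satisfy 0 <= overlap < patch_size")
    step = patch_size - overlap
    start = 0
    while True:
        stop = min(start + patch_size, length)
        # Shift the last window back so it keeps the full patch size when possible.
        yield max(stop - patch_size, 0), stop
        if stop >= length:
            break
        start += step


def accuracy(pred: Sequence[int], label: Sequence[int]) -> float:
    """Fraction of positions where the prediction matches the label."""
    matches = sum(p == l for p, l in zip(pred, label))
    return matches / len(label)


def patchwise_validate(
    image: Sequence[float],
    label: Sequence[int],
    model: Callable[[Sequence[float]], List[int]],
    patch_size: int,
    overlap: int,
) -> List[float]:
    """Score every predicted patch against its label patch, without merging predictions."""
    scores = []
    for start, stop in overlapping_windows(len(image), patch_size, overlap):
        pred_patch = model(image[start:stop])
        scores.append(accuracy(pred_patch, label[start:stop]))
    return scores


def threshold_model(patch: Sequence[float]) -> List[int]:
    """Toy stand-in for a network: predict 1 where the value exceeds 0.5."""
    return [1 if value > 0.5 else 0 for value in patch]


image = [0.1, 0.9, 0.8, 0.2, 0.7, 0.3]
label = [0, 1, 1, 0, 0, 0]
scores = patchwise_validate(image, label, threshold_model, patch_size=4, overlap=2)
mean_score = sum(scores) / len(scores)
```

Note that positions inside the overlap contribute to more than one patch score, so the averaged metric is not identical to one computed on a merged sliding window prediction.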

I am happy to make a PR for the first two points. I am also interested in implementing the third point if you agree with it and once we have discussed how to approach it.
