monai.bundle run tests OOM

**Describe the bug**
```
2022-03-14T15:49:57.6853812Z 2022-03-14 15:49:57,684 - INFO - 
2022-03-14T15:49:57.6854284Z --- input summary of monai.bundle.scripts.run ---
2022-03-14T15:49:57.6854826Z 2022-03-14 15:49:57,684 - INFO - > config_file: '/__w/MONAI/MONAI/tests/testing_data/inference.json'
2022-03-14T15:49:57.6855287Z 2022-03-14 15:49:57,685 - INFO - > runner_id: 'evaluator'
2022-03-14T15:49:57.6856056Z 2022-03-14 15:49:57,685 - INFO - > meta_file: '/tmp/tmp5l91p7s9/meta.yaml'
2022-03-14T15:49:57.6856589Z 2022-03-14 15:49:57,685 - INFO - > postprocessing#transforms#2#output_postfix: 'seg'
2022-03-14T15:49:57.6857092Z 2022-03-14 15:49:57,685 - INFO - > network: '%/tmp/tmp5l91p7s9/override1.JSON#move_net'
2022-03-14T15:49:57.6857614Z 2022-03-14 15:49:57,685 - INFO - > dataset#_target_: '%/tmp/tmp5l91p7s9/jsons/override2.JSON'
2022-03-14T15:49:57.6858016Z 2022-03-14 15:49:57,685 - INFO - ---
2022-03-14T15:49:57.6858694Z 
2022-03-14T15:49:57.6858700Z 
2022-03-14T15:50:02.1080152Z .Current run is terminating due to exception: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 15.75 GiB total capacity; 235.40 MiB already allocated; 9.88 MiB free; 296.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
2022-03-14T15:50:02.1629767Z Engine run is terminating due to exception: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 15.75 GiB total capacity; 235.40 MiB already allocated; 9.88 MiB free; 296.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
2022-03-14T15:50:02.1707910Z Traceback (most recent call last):
2022-03-14T15:50:02.1708299Z   File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
2022-03-14T15:50:02.1708789Z     "__main__", mod_spec)
2022-03-14T15:50:02.1709136Z   File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
2022-03-14T15:50:02.1709412Z     exec(code, run_globals)
2022-03-14T15:50:02.1709726Z   File "/__w/MONAI/MONAI/monai/bundle/__main__.py", line 19, in <module>
2022-03-14T15:50:02.1710002Z     fire.Fire()
2022-03-14T15:50:02.1710710Z   File "/usr/local/lib/python3.7/dist-packages/fire/core.py", line 141, in Fire
2022-03-14T15:50:02.1711154Z     component_trace = _Fire(component, args, parsed_flag_args, context, name)
2022-03-14T15:50:02.1712081Z   File "/usr/local/lib/python3.7/dist-packages/fire/core.py", line 471, in _Fire
2022-03-14T15:50:02.1712782Z     target=component.__name__)
2022-03-14T15:50:02.1713640Z   File "/usr/local/lib/python3.7/dist-packages/fire/core.py", line 681, in _CallAndUpdateTrace
2022-03-14T15:50:02.1713990Z     component = fn(*varargs, **kwargs)
2022-03-14T15:50:02.1714319Z   File "/__w/MONAI/MONAI/monai/bundle/scripts.py", line 122, in run
2022-03-14T15:50:02.1714595Z     workflow.run()
2022-03-14T15:50:02.1714916Z   File "/__w/MONAI/MONAI/monai/engines/evaluator.py", line 137, in run
2022-03-14T15:50:02.1715209Z     super().run()
2022-03-14T15:50:02.1715488Z   File "/__w/MONAI/MONAI/monai/engines/workflow.py", line 282, in run
2022-03-14T15:50:02.1715870Z     super().run(data=self.data_loader, max_epochs=self.state.max_epochs)
2022-03-14T15:50:02.1716861Z   File "/usr/local/lib/python3.7/dist-packages/ignite/engine/engine.py", line 704, in run
2022-03-14T15:50:02.1717373Z     return self._internal_run()
2022-03-14T15:50:02.1717814Z   File "/usr/local/lib/python3.7/dist-packages/ignite/engine/engine.py", line 783, in _internal_run
2022-03-14T15:50:02.1718151Z     self._handle_exception(e)
2022-03-14T15:50:02.1718614Z   File "/usr/local/lib/python3.7/dist-packages/ignite/engine/engine.py", line 466, in _handle_exception
2022-03-14T15:50:02.1718911Z     raise e
2022-03-14T15:50:02.1719522Z   File "/usr/local/lib/python3.7/dist-packages/ignite/engine/engine.py", line 753, in _internal_run
2022-03-14T15:50:02.1720121Z     time_taken = self._run_once_on_dataset()
2022-03-14T15:50:02.1720787Z   File "/usr/local/lib/python3.7/dist-packages/ignite/engine/engine.py", line 854, in _run_once_on_dataset
2022-03-14T15:50:02.1721122Z     self._handle_exception(e)
2022-03-14T15:50:02.1721543Z   File "/usr/local/lib/python3.7/dist-packages/ignite/engine/engine.py", line 466, in _handle_exception
2022-03-14T15:50:02.1721975Z     raise e
2022-03-14T15:50:02.1722390Z   File "/usr/local/lib/python3.7/dist-packages/ignite/engine/engine.py", line 840, in _run_once_on_dataset
2022-03-14T15:50:02.1722788Z     self.state.output = self._process_function(self, self.state.batch)
2022-03-14T15:50:02.1723157Z   File "/__w/MONAI/MONAI/monai/engines/evaluator.py", line 265, in _iteration
2022-03-14T15:50:02.1723698Z     engine.state.output[Keys.PRED] = self.inferer(inputs, self.network, *args, **kwargs)  # type: ignore
2022-03-14T15:50:02.1724081Z   File "/__w/MONAI/MONAI/monai/inferers/inferer.py", line 182, in __call__
2022-03-14T15:50:02.1724320Z     **kwargs,
2022-03-14T15:50:02.1724612Z   File "/__w/MONAI/MONAI/monai/inferers/utils.py", line 134, in sliding_window_inference
2022-03-14T15:50:02.1724993Z     seg_prob = predictor(window_data, *args, **kwargs).to(device)  # batched patch segmentation
2022-03-14T15:50:02.1725484Z   File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
2022-03-14T15:50:02.1725821Z     return forward_call(*input, **kwargs)
2022-03-14T15:50:02.1726098Z   File "/__w/MONAI/MONAI/monai/networks/nets/unet.py", line 281, in forward
2022-03-14T15:50:02.1726368Z     x = self.model(x)
2022-03-14T15:50:02.1726796Z   File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
2022-03-14T15:50:02.1727102Z     return forward_call(*input, **kwargs)
2022-03-14T15:50:02.1727538Z   File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 141, in forward
2022-03-14T15:50:02.1727827Z     input = module(input)
2022-03-14T15:50:02.1728252Z   File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
2022-03-14T15:50:02.1728578Z     return forward_call(*input, **kwargs)
2022-03-14T15:50:02.1728874Z   File "/__w/MONAI/MONAI/monai/networks/layers/simplelayers.py", line 128, in forward
2022-03-14T15:50:02.1729161Z     y = self.submodule(x)
2022-03-14T15:50:02.1729566Z   File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
2022-03-14T15:50:02.1730260Z     return forward_call(*input, **kwargs)
2022-03-14T15:50:02.1730752Z   File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 141, in forward
2022-03-14T15:50:02.1731106Z     input = module(input)
2022-03-14T15:50:02.1731581Z   File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
2022-03-14T15:50:02.1731929Z     return forward_call(*input, **kwargs)
2022-03-14T15:50:02.1732421Z   File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 141, in forward
2022-03-14T15:50:02.1732755Z     input = module(input)
2022-03-14T15:50:02.1733545Z   File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
2022-03-14T15:50:02.1733875Z     return forward_call(*input, **kwargs)
2022-03-14T15:50:02.1734178Z   File "/__w/MONAI/MONAI/monai/networks/blocks/convolutions.py", line 325, in forward
2022-03-14T15:50:02.1734543Z     cx: torch.Tensor = self.conv(x)  # apply x to sequence of operations
2022-03-14T15:50:02.1735014Z   File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
2022-03-14T15:50:02.1735353Z     return forward_call(*input, **kwargs)
2022-03-14T15:50:02.1735797Z   File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 141, in forward
2022-03-14T15:50:02.1736097Z     input = module(input)
2022-03-14T15:50:02.1736531Z   File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
2022-03-14T15:50:02.1737093Z     return forward_call(*input, **kwargs)
2022-03-14T15:50:02.1737548Z   File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 141, in forward
2022-03-14T15:50:02.1737857Z     input = module(input)
2022-03-14T15:50:02.1738254Z   File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
2022-03-14T15:50:02.1738648Z     return forward_call(*input, **kwargs)
2022-03-14T15:50:02.1739068Z   File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 141, in forward
2022-03-14T15:50:02.1739373Z     input = module(input)
2022-03-14T15:50:02.1739790Z   File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
2022-03-14T15:50:02.1740094Z     return forward_call(*input, **kwargs)
2022-03-14T15:50:02.1740531Z   File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/activation.py", line 1111, in forward
2022-03-14T15:50:02.1740842Z     return F.prelu(input, self.weight)
2022-03-14T15:50:02.1741437Z RuntimeError: CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 15.75 GiB total capacity; 235.40 MiB already allocated; 9.88 MiB free; 296.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
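
The numbers in the OOM message can be read more closely. The sketch below is an illustration, not part of the original report: it parses the message and estimates how much GPU memory is neither free nor reserved by this process's PyTorch allocator. The helper name `parse_oom_mib` and the regular expression are assumptions made for this example. Reading the remainder as memory held by other processes or CUDA contexts on the same GPU is an inference from the figures, not something the log confirms.

```python
import re

# OOM message copied from the log above (numeric part only).
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 28.00 MiB (GPU 0; 15.75 GiB total capacity; "
    "235.40 MiB already allocated; 9.88 MiB free; 296.00 MiB reserved in total by PyTorch)"
)

# Conversion factors to MiB for the units PyTorch prints.
UNIT_TO_MIB = {"KiB": 1.0 / 1024.0, "MiB": 1.0, "GiB": 1024.0}


def parse_oom_mib(message):
    """Hypothetical helper: return the memory figures of an OOM message in MiB."""
    pattern = r"([\d.]+) (KiB|MiB|GiB) (total capacity|already allocated|free|reserved)"
    return {
        label: float(value) * UNIT_TO_MIB[unit]
        for value, unit, label in re.findall(pattern, message)
    }


stats = parse_oom_mib(OOM_MESSAGE)

# Memory that is neither free nor reserved by this process's caching allocator.
outside_mib = stats["total capacity"] - stats["free"] - stats["reserved"]

print(f"reserved by this process: {stats['reserved']:.2f} MiB")
print(f"not free and not reserved here: {outside_mib / 1024.0:.2f} GiB")
```

Under this reading, the process itself holds only about 296 MiB, while roughly 15 GiB of the 15.75 GiB device is unavailable for other reasons. Reserved memory (296.00 MiB) is also not much larger than allocated memory (235.40 MiB), so the `max_split_size_mb` hint in the message seems unlikely to apply here.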

monai.bundle run tests OOM #3934
