Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
- [x] I am reporting the issue to the correct repository. (Model Garden official or research directory)
- [x] I checked to make sure that this issue has not been filed already.
1. The entire URL of the file you are using
2. Describe the bug
The 'predict' method of SSDMetaArch has a memory leak: it leaks 100-200 MB for each batch of 32 320x320 images during my training loop. A standalone sketch that isolates 'predict' is included below.
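A minimal sketch of how I would isolate the leak outside the training loop. It assumes `detection_model` has already been built the same way as in the tutorial (via model_builder from a pipeline config); the random input, batch size of 32, and 320x320 resolution mirror my setup, and psutil is used here only for measurement and is not part of the original notebook.

```python
# Sketch: call predict() repeatedly on a fixed random batch and watch the
# process resident set size (RSS) grow between calls.
# Assumes `detection_model` is the SSDMetaArch instance built as in the tutorial.
import os

import psutil
import tensorflow as tf


def rss_mb():
    # RSS of the current process, in MiB.
    return psutil.Process(os.getpid()).memory_info().rss / 2**20


images = tf.random.uniform((32, 320, 320, 3), dtype=tf.float32)
shapes = tf.constant(32 * [[320, 320, 3]], dtype=tf.int32)

for step in range(10):
    before = rss_mb()
    prediction_dict = detection_model.predict(images, shapes)  # memory grows here
    print(f"step {step}: {rss_mb() - before:+.1f} MiB")
```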
3. Steps to reproduce
The leak can be observed in this official tutorial:
To see it happening, just add memory-profiler and decorate train_step_fn with @profile (a sketch of that instrumentation is at the end of this section).
The leak happens regardless of whether execution occurs inside a GradientTape.
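For reference, this is all the instrumentation involved; the body of train_step_fn stays exactly as in the tutorial, and memory-profiler prints a per-line memory table after each call, which is where the dump under "Additional context" below comes from.

```python
from memory_profiler import profile  # pip install memory-profiler


@profile
def train_step_fn(image_tensors,
                  groundtruth_boxes_list,
                  groundtruth_classes_list):
    # Body unchanged from the tutorial (predict -> loss -> gradients).
    ...
```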
4. Expected behavior
RAM usage should not increase between calls to predict.
5. Additional context
Here's a sample memory-profiler dump from my own version of the notebook (not identical to the one linked above):
Line # Mem usage Increment Occurences Line Contents
============================================================
142 1549.4 MiB 1549.4 MiB 1 @profile
143 def train_step_fn(image_tensors,
144 groundtruth_boxes_list,
145 groundtruth_classes_list):
146 """A single training iteration.
147
148 Args:
149 image_tensors: A list of [1, height, width, 3] Tensor of type tf.float32.
150 Note that the height and width can vary across images, as they are
151 reshaped within this function to be 320x320.
152 groundtruth_boxes_list: A list of Tensors of shape [N_i, 4] with type
153 tf.float32 representing groundtruth boxes for each image in the batch.
154 groundtruth_classes_list: A list of Tensors of shape [N_i, num_classes]
155 with type tf.float32 representing groundtruth boxes for each image in
156 the batch.
157
158 Returns:
159 A scalar tensor representing the total loss for the input batch.
160 """
161 1549.4 MiB 0.0 MiB 1 shapes = tf.constant(len(image_tensors) * [[320, 320, 3]], dtype=tf.int32)
162 1549.4 MiB 0.0 MiB 1 model.provide_groundtruth(
163 1549.4 MiB 0.0 MiB 1 groundtruth_boxes_list=groundtruth_boxes_list,
164 1549.4 MiB 0.0 MiB 1 groundtruth_classes_list=groundtruth_classes_list)
165 # The images each have a pointless batch dimension of 1, so do a reshape
166 # to remove this from the result of concatenation
167 1549.4 MiB 0.0 MiB 1 concatted = tf.reshape(tf.concat(image_tensors, axis=0), (len(image_tensors), 320, 320, 3))
168 1737.8 MiB 188.4 MiB 1 prediction_dict = model.predict(concatted, shapes)
169 1737.8 MiB 0.0 MiB 1 losses_dict = model.loss(prediction_dict, shapes)
170 1737.8 MiB 0.0 MiB 1 total_loss = losses_dict['Loss/localization_loss'] + losses_dict['Loss/classification_loss']
180 1737.8 MiB 0.0 MiB 1 return total_loss
6. System information
Dockerfile based on the tensorflow/tensorflow:2.4.1-gpu image