[Potential NaN bug] Loss may become NaN during training #383

@Justobe


Hello~

Thank you very much for sharing the code!

I tried using my own dataset (with the same shape as MNIST) in the code. After some iterations, the training loss becomes NaN. After carefully checking the code, I found that the following line may produce NaN in the loss:

In TensorFlow-Examples/examples/2_BasicModels/logistic_regression.py:

cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))

If pred contains a 0 (softmax output can underflow to exactly 0), tf.log(pred) returns -inf because log(0) is undefined. The product y * tf.log(pred) then contains either -inf (where y is 1) or 0 * (-inf) = NaN (where y is 0), so the loss can become NaN.
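
Below is a minimal sketch that reproduces the failure (TF 1.x, as used in this repo; the y and pred values are made up to force the underflow case):

import tensorflow as tf

# A one-hot label and a softmax output that underflowed to exactly 0.
y = tf.constant([[0.0, 1.0]])
pred = tf.constant([[0.0, 1.0]])

cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred), reduction_indices=1))

with tf.Session() as sess:
    print(sess.run(cost))  # prints nan: 0 * log(0) = 0 * (-inf) = NaN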

It could be fixed with either of the following changes:

cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred + 1e-10), reduction_indices=1))

or

cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(tf.clip_by_value(pred, 1e-10, 1.0)), reduction_indices=1))
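
As a quick self-contained check (a sketch with the same made-up inputs as above), the clipped version stays finite:

import tensorflow as tf

y = tf.constant([[0.0, 1.0]])
pred = tf.constant([[0.0, 1.0]])

# The argument of log is now bounded away from 0, so no -inf appears.
cost = tf.reduce_mean(-tf.reduce_sum(
    y * tf.log(tf.clip_by_value(pred, 1e-10, 1.0)), reduction_indices=1))

with tf.Session() as sess:
    print(sess.run(cost))  # prints 0.0

Alternatively, if the pre-softmax logits are available, tf.nn.softmax_cross_entropy_with_logits computes the softmax and the cross-entropy together in a numerically stable way, which would sidestep the issue entirely.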

Hope to hear from you ~

Thanks in advance! : )
