Possibly wrong R1 penalty

Hi.
Why do you apply `softplus` before computing the gradient [here](https://github.com/rosinality/style-based-gan-pytorch/blob/master/train.py#L132)?
The original implementation doesn't do this. See [this line](https://github.com/NVlabs/stylegan/blob/master/training/loss.py#L163).
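
To make the difference concrete, here is a minimal sketch of the two variants as I understand them. It is not the code from either repository: the toy discriminator `disc`, the helper `r1_penalty`, and the tensor shapes are all made up for illustration.

```python
import torch
import torch.nn.functional as F
from torch import autograd


def r1_penalty(outputs, images):
    """Squared L2 norm of d(outputs)/d(images) per sample, averaged over the batch."""
    (grads,) = autograd.grad(outputs=outputs.sum(), inputs=images, create_graph=True)
    return grads.pow(2).reshape(grads.size(0), -1).sum(dim=1).mean()


torch.manual_seed(0)

# Hypothetical toy discriminator and fake "real" batch, for illustration only.
disc = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 1))
real = torch.randn(4, 3, 8, 8, requires_grad=True)

scores = disc(real)

# Variant A: gradient of the raw discriminator scores.
penalty_raw = r1_penalty(scores, real)

# Variant B: gradient taken after applying softplus to the negated scores.
penalty_softplus = r1_penalty(F.softplus(-scores), real)

print(penalty_raw.item(), penalty_softplus.item())
```

By the chain rule, the gradient of `softplus(-D(x))` is the gradient of `D(x)` scaled by `-sigmoid(-D(x))`, so variant B shrinks each sample's squared gradient norm by `sigmoid(-D(x))**2`. The two penalties are therefore not equivalent, and the effective regularization strength in variant B depends on how confident the discriminator is.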