Description
See #1137 (comment)
We now have multiple implementations of the activations on CPU and GPU, and it's not obvious which ones are faster, so it's worth benchmarking them. On CPU, for big tensors, pure Python in eager mode seems to be faster than the custom C++ ops.
See https://colab.research.google.com/drive/1LTx3vMpA1fLCESKl-_WrLIp0Fq1zYZL0 for benchmarking on CPU (when running the notebook, make sure the GPU is not available in Colab).
For both GPU and CPU, four implementations should be tested (see the sketch after this list):
- Custom kernel (C++ or CUDA)
- Pure Python in eager mode
- Pure Python with tf.function
- Pure Python with XLA
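
For concreteness, here is a minimal sketch of the four variants, using gelu purely as an example activation. It assumes the custom kernel is the one shipped in tensorflow_addons; note that on older TF releases the XLA keyword is `experimental_compile` rather than `jit_compile`:

```python
import tensorflow as tf
import tensorflow_addons as tfa  # assumed: ships the custom C++/CUDA kernels

# Pure-Python reference implementation (gelu chosen only as an example;
# substitute whichever activation is actually under test).
def gelu_python(x):
    return 0.5 * x * (1.0 + tf.math.erf(x / tf.sqrt(2.0)))

gelu_custom = tfa.activations.gelu                     # custom kernel (C++ or CUDA)
gelu_eager = gelu_python                               # pure Python, eager mode
gelu_graph = tf.function(gelu_python)                  # pure Python + tf.function
# On older TF versions, use experimental_compile=True instead of jit_compile=True.
gelu_xla = tf.function(gelu_python, jit_compile=True)  # pure Python + XLA
```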
We should also test big and small tensors. Ideally, produce plots with four curves, the x axis being the number of elements in the input tensor and the y axis being the throughput (number of elements processed per second). Don't forget to call .numpy() on the result to force execution (ops might otherwise run lazily).
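
One possible shape for the benchmark loop (a sketch only; the `gelu_*` names refer to the variants sketched above, and this version uses Python's timeit module rather than the %timeit magic):

```python
import timeit
import matplotlib.pyplot as plt
import tensorflow as tf

sizes = [10**k for k in range(2, 8)]
variants = {
    "custom kernel": gelu_custom,
    "eager": gelu_eager,
    "tf.function": gelu_graph,
    "XLA": gelu_xla,
}

for name, fn in variants.items():
    throughput = []
    for n in sizes:
        x = tf.random.uniform((n,))
        fn(x).numpy()  # warm-up: triggers tracing/compilation, forces execution
        seconds = timeit.timeit(lambda: fn(x).numpy(), number=10) / 10
        throughput.append(n / seconds)
    plt.plot(sizes, throughput, label=name)

plt.xscale("log")
plt.yscale("log")
plt.xlabel("number of elements in the input tensor")
plt.ylabel("elements processed per second")
plt.legend()
plt.show()
```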
To obtain the result of a %timeit run, use the -o flag:

```python
timeit_object_with_results = %timeit -o my_func(my_tensor)
```
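
The returned object is IPython's TimeitResult; its `average` and `stdev` attributes (seconds per loop) are what you'd feed into the throughput plots. For example (reusing the hypothetical `gelu_eager` and `x` from above):

```python
res = %timeit -o gelu_eager(x).numpy()
print(res.average, res.stdev)  # mean and standard deviation of seconds per loop
```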
If everything could be delivered in a pretty notebook on Colab, with the link posted in this issue (no pull request needed), it would be super duper awesome 😃