Skip to content

idx_to_str appears to significantly slow down scan numba code by applying np.array to scan_inner_func inputs #233

Closed
@jessegrabowski

Description

@jessegrabowski
Member

Description

@ricardoV94 was doing some benchmarking of numba code generated by scan, and we discovered that wrapping output outer_in values with np.array significantly slows down the scans. Some speed tests are here. Cliff's notes:

  • The numba code scan generates is 100x slower than a vectorized numba function
  • This speed difference is entirely due to wrapping outer_in with np.array in the idx_to_str function.

I'm going to test removing the np.array part entirely and run the test suite to see what happens, but in the meantime I'm opening this issue to ask if anyone knows why this exists.

Activity

ricardoV94

ricardoV94 commented on Mar 4, 2023

@ricardoV94
Member

I suspect it exists for tensor scalar inputs which our transpilation of numba has a tendency to "downcast" to floats here and there.

This is evidenced by the fact that helper function has an "allow_scalar" which is set to True just for nit-sot (outputs that don't feed back to the internal function).

This safeguard should not be needed for tensor inputs with ndim > 0 (which is know at compile time) and which was the case in the gist

This safeguard applies exactly to our case where indexing the vector sequence would return a scalar instead of a scalar array

jessegrabowski

jessegrabowski commented on Mar 4, 2023

@jessegrabowski
MemberAuthor

Right. In the case the speed test considers, the input is 1d. We also tested the 2d case, but there's no speedup to be had here.

As you said to me before, the danger in passing a scalar to the inner function would be if there is some kind of indexing operation that would error out if the input were just a floatX. But if scalars were handled somehow in optimization, it might end up being safe to remove this np.array?

ricardoV94

ricardoV94 commented on Mar 4, 2023

@ricardoV94
Member

Yup, I don't know exactly if indexing is allowed but there must be some Ops that work with scalar arrays but not floats.

We should invest in #107 so that we can properly support scalar graphs, also in Scans

ricardoV94

ricardoV94 commented on Mar 12, 2024

@ricardoV94
Member

Closing this for now. We avoided useless creation of arrays at every step. The scalar optimizations are more general than the Numba backend and should come for free in a rewritten Scan Op like #191

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @ricardoV94@jessegrabowski

        Issue actions

          `idx_to_str` appears to significantly slow down `scan` numba code by applying `np.array` to `scan_inner_func` inputs · Issue #233 · pymc-devs/pytensor