-
Notifications
You must be signed in to change notification settings - Fork 135
Speedup Numba Scans with vector inputs #235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9381b5b
to
5e2b5b5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, provided I understand what's going on correctly.
temp_storage = f"{storage_name}_temp_scalar_{tap_offset}" | ||
storage_dtype = outer_in_var.type.numpy_dtype.name | ||
temp_scalar_storage_alloc_stmts.append( | ||
f"{temp_storage} = np.empty((), dtype=np.{storage_dtype})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dumb question so I'm clear. is_vector
refers to the values passed by outputs_info
. This will be a vector if there are multiple taps requested. You're making this storage scalar as a place to break up this vector and store the individual values as they are fed into the inner function. Doing this prevents the need to call np.asarray
on the output, because it's already an array, because these storage values are 0d arrays. Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The is_vector
refers to any vector input (sequence or recurrent), which must be broken into scalar arrays in each iteration. This happens regardless of the number of taps (or you could say there's always at least one tap of -1
).
5e2b5b5
to
70ac892
Compare
Indexing vector inputs to create taps during scan, yields numeric variables which must be wrapped again into scalar arrays before passing into the inernal function. This commit pre-allocates such arrays and reuses them during looping.
70ac892
to
36b05f7
Compare
Related to #233
In talks with @jessegrabowski and @aseyboldt we found out that there is a large overhead of
asarray
when indexing vector inputs to create taps, as these become non-array numerical variables that must be wrapped again into a scalar tensor (at least until we allow proper scalar inputs in Scan).The new benchmark test runs 2.5x faster on my machine compared to main.
The scan function used to look like this
And now looks like this