Add type stubs to scipy/signal/_peak_finding.pyi
#87
Conversation
Namely: 1. `argrelmin` 2. `argrelmax` 3. `argrelextrema`
Namely: 1. `peak_prominences` 2. `peak_widths`
jorenham left a comment
Thanks for this!
In general this looks like a solid improvement.
I left a couple of minor suggestions that could help make this even better.
scipy-stubs/signal/_peak_finding.pyi

```python
class _FindPeaksResultsDict(TypedDict):
    peak_heights: NotRequired[npt.NDArray[np.float64]]
    left_thresholds: NotRequired[npt.NDArray[np.float64]]
    right_thresholds: NotRequired[npt.NDArray[np.float64]]
    prominences: NotRequired[npt.NDArray[np.float64]]
    left_bases: NotRequired[npt.NDArray[np.intp]]
    right_bases: NotRequired[npt.NDArray[np.intp]]
    width_heights: NotRequired[npt.NDArray[np.float64]]
    left_ips: NotRequired[npt.NDArray[np.float64]]
    right_ips: NotRequired[npt.NDArray[np.float64]]
    plateau_sizes: NotRequired[npt.NDArray[np.intp]]
    left_edges: NotRequired[npt.NDArray[np.intp]]
    right_edges: NotRequired[npt.NDArray[np.intp]]
```
if every key is NotRequired, then you could also use _FindPeaksResultsDict(TypedDict, total=False): ... instead. See https://docs.python.org/3/library/typing.html#typing.TypedDict for details
Thanks for the tip. Haven't used TypedDict before.
scipy-stubs/signal/_peak_finding.pyi

```python
    right_edges: NotRequired[npt.NDArray[np.intp]]

def argrelmin(
    data: npt.NDArray[np.generic], axis: int = 0, order: int = 1, mode: _Mode = "clip"
```
I usually tend to put a trailing comma in cases like this, which causes ruff format to place each parameters on a new line, improving readability. But I'm also fine if you leave it like this :)
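For illustration (a hypothetical signature, not the actual stub), the "magic trailing comma" after the last parameter is what makes ruff format keep the expanded one-parameter-per-line layout instead of collapsing everything onto one line:

```python
# With the trailing comma after `mode`, ruff format preserves this layout;
# without it, the formatter may join the parameters onto a single line.
def argrelmin_sketch(
    data: object,
    axis: int = 0,
    order: int = 1,
    mode: str = "clip",
) -> tuple: ...
```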
scipy-stubs/signal/_peak_finding.pyi

```python
    order: int = 1,
    mode: _Mode = "clip",
) -> tuple[npt.NDArray[np.intp], ...]: ...
def peak_prominences(x: npt.ArrayLike, peaks: npt.ArrayLike, wlen: int | None = None) -> _ProminencesResult: ...
```
I believe that peaks must always be an array-like of integers. So even though this is perfectly fine as-is, you could also consider using numpy._typing._ArrayLikeInt_co for this (no need to worry about the API stability here; at the moment I'm the only active numpy maintainer that works on typing ;) )
That's nice to know! I feel a bit odd using `ArrayLike` when the source assumes the dtype, but it's good to know that numpy has types like that, although they seem to live behind a private module.
scipy-stubs/signal/_peak_finding.pyi

```python
def argrelextrema(
    data: npt.NDArray[np.generic],
    comparator: Callable[[npt.NDArray[np.generic], npt.NDArray[np.generic]], npt.NDArray[np.bool_]],
    axis: int = 0,
```
The axis argument accepts everything that implements `__index__`.
So for example, instances of this Index type are also allowed.

```python
>>> class Index:
...     def __init__(self, i: int, /) -> None:
...         self._i = int(i)  # upcast `bool` or other `int` subtypes
...     def __index__(self, /) -> int:
...         return self._i
```

You could use the `typing.SupportsIndex` or `optype.CanIndex` protocols for this (the latter is optionally generic on the return type of `__index__`, which allows using it with e.g. `Literal`).
Note that there are some scipy functions that also require __lt__ or __add__ as well. But in any case, it's better to have overly-broad parameter types than overly-narrow ones.
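A runnable sketch of the `__index__` behavior described above (the `take_axis` helper is hypothetical, used only for illustration):

```python
import operator
from typing import SupportsIndex


class Index:
    def __init__(self, i: int, /) -> None:
        self._i = int(i)  # upcast `bool` or other `int` subtypes

    def __index__(self, /) -> int:
        return self._i


def take_axis(axis: SupportsIndex) -> int:
    # operator.index() calls __index__, which is effectively what
    # numpy does when it consumes an axis argument.
    return operator.index(axis)


print(take_axis(Index(2)))  # 2
print(take_axis(7))         # 7
```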
scipy-stubs/signal/_peak_finding.pyi

```python
    peaks: npt.ArrayLike,
    rel_height: float = 0.5,
```
stubgen was wrong in this case: rel_height also accepts e.g. np.float16(0.5) and np.bool_(True). The scipy._typing.AnyReal type alias could be used in this case.
This argument is one of the ones that gets passed straight into a Cython function; here is the signature:

```cython
def _peak_widths(const np.float64_t[::1] x not None,
                 const np.intp_t[::1] peaks not None,
                 np.float64_t rel_height,
                 const np.float64_t[::1] prominences not None,
                 const np.intp_t[::1] left_bases not None,
                 const np.intp_t[::1] right_bases not None):
```

I'm not familiar with Cython, so I didn't know how it handles type conversion from Python into Cython. I think the Cython docs say that Python objects are converted to their C types automatically for `def` functions, so `rel_height` can be any type that can be converted into `np.float64_t`, which I think is just an alias for `double`. I'm not sure which types are allowed in that case.
The AnyReal type alias is a nice suggestion though.
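To illustrate the conversion claim above (a sketch, not scipy code): anything Cython can coerce to a C `double` is, in practice, anything that supports `float()`, which includes the numpy scalars mentioned earlier:

```python
import numpy as np

# All of these coerce cleanly to a C double / Python float,
# so they are all valid rel_height values at runtime:
for value in (0.5, np.float16(0.5), np.bool_(True), np.int32(2)):
    print(type(value).__name__, float(value))
```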
scipy-stubs/signal/_peak_finding.pyi

```python
    x: npt.ArrayLike,
    height: float | npt.NDArray[np.float64] | tuple[float | None, float | None] | None = None,
    threshold: float | npt.NDArray[np.float64] | tuple[float | None, float | None] | None = None,
    distance: np.float64 | None = None,
```
I'm guessing that (at least) float is also allowed here at runtime
This is the same as `rel_height`: it gets passed straight into a Cython function, which (as I didn't know before) does automatic type conversion. This means using `AnyReal` should be fine, I think.
scipy-stubs/signal/_peak_finding.pyi

```python
    min_length: Untyped | None = None,
    vector: npt.ArrayLike,
    widths: npt.ArrayLike,
    wavelet: Callable[Concatenate[int, float, ...], npt.NDArray[np.float64]] | None = None,
```
Does this always have to return an array of float64 dtype, or can it also be e.g. float16?
I had it as a placeholder, but `wavelet` is the same wavelet that gets passed into `_cwt`:

```python
def _cwt(data, wavelet, widths, dtype=None, **kwargs):
    # Determine output type
    if dtype is None:
        if np.asarray(wavelet(1, widths[0], **kwargs)).dtype.char in 'FDG':
            dtype = np.complex128
        else:
            dtype = np.float64

    output = np.empty((len(widths), len(data)), dtype=dtype)
    for ind, width in enumerate(widths):
        N = np.min([10 * width, len(data)])
        wavelet_data = np.conj(wavelet(N, width, **kwargs)[::-1])
        output[ind] = convolve(data, wavelet_data, mode='same')
    return output
```

which means it can just be an `ArrayLike`?
scipy-stubs/signal/_peak_finding.pyi

```python
    vector: npt.ArrayLike,
    widths: npt.ArrayLike,
    wavelet: Callable[Concatenate[int, float, ...], npt.NDArray[np.float64]] | None = None,
    max_distances: Sequence[int] | None = None,
```
I know it's a bit weird, but an ndarray isn't assignable to a `Sequence`, which the docs say this should accept.
Oh, that's a bit weird/annoying. The reason I chose `Sequence` is that I didn't look at the docs for `find_peaks_cwt` (they're a bit long) but instead at the function that `max_distances` gets passed to, `_identify_ridge_lines`, which says that `max_distances` is a 1-D sequence.
The source code for that function only calls `len` on and indexes `max_distances`, which fits the definition of `Sequence`. The values from `max_distances` also get compared to `np.intp`.
Would the type then be something like `onpt.Array[tuple[int], AnyInt]`? That might still be too narrow, considering that `Sequence` is a relatively weak type constraint.
The problem with Sequence is that apart from __getitem__ it also has several other methods like .count() and .index() that aren't present in np.ndarray 🤷🏻
And yea I agree that onpt.Array (or some other direct np.ndarray alias) would be indeed too narrow. So you could use e.g. _ArrayLikeInt_co, which is compatible with both Sequence[int] and npt.NDArray[integer[Any]]. And as far as I'm concerned, the fact that it would also allow integer scalars isn't much of a problem, as we're talking about the input of a function.
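The runtime counterpart of the assignability issue discussed here: `np.ndarray` supports `len()` and `__getitem__`, but it is not registered as a `collections.abc.Sequence` and lacks the `Sequence`-only methods.

```python
from collections.abc import Sequence

import numpy as np

a = np.array([1, 2, 3])
# ndarray is indexable and sized, but is not a Sequence:
print(isinstance(a, Sequence))          # False
print(isinstance([1, 2, 3], Sequence))  # True
# The Sequence-mixin methods that ndarray is missing:
print(hasattr(a, "count"), hasattr(a, "index"))  # False False
```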
scipy-stubs/signal/_peak_finding.pyi

```python
    window_size: Untyped | None = None,
) -> Untyped: ...
    window_size: int | None = None,
) -> npt.NDArray[np.intp]: ...
```
If you're feeling up to it, you could narrow this to a 1-D array as e.g. optype.numpy.Array[tuple[int], np.intp] (but it's also perfectly fine if you leave it like this).
scipy-stubs/signal/_peak_finding.pyi

```python
_Mode: TypeAlias = Literal["clip", "wrap"]
_ProminencesResult: TypeAlias = tuple[npt.NDArray[np.float64], npt.NDArray[np.intp], npt.NDArray[np.intp]]
_WidthsResult: TypeAlias = tuple[
    npt.NDArray[np.float64], npt.NDArray[np.float64], npt.NDArray[np.float64], npt.NDArray[np.float64]
]
```
I've seen quite a lot of `npt.NDArray[np.int64]` and `npt.NDArray[np.intp]` being used in this module, so perhaps it could help to introduce type aliases for these two as well.
Specifically for `_peak_finding.pyi`
For `_peak_finding.pyi`.
Thanks for the feedback! The PR wasn't really complete because I'm not too well versed in all of the typing available, so working on this repo is really helping me learn about the possibilities. I'll try to fix some of the issues although I think there are still a few hangups:
In `_peak_finding.pyi`
In `_peak_finding.pyi`.
In `_peak_finding.pyi`.
In `_peak_finding.pyi` use them instead of `float` and `int` respectively. This is to allow usage of numpy scalars like `float16` and `int32`.
In `_peak_finding.pyi`. The `wavelet` parameter does not need to return `NDArray[np.float64]` it only needs to return something that can be coerced into a numpy array.
That's completely understandable; it took me quite a while to figure out how to properly type abstract things like an "array-like" and a "dtype-like".
After experimenting a bit with it, it looks like it is allowed to return both
So tldr; I guess the signature is best described as an overloaded one, that looks a bit like
You could try it out for a bit in a (i)python console or something, or you could play it safe and use
As these can be scalar, ndarray or a sequence, it's basically an "array-like".
In `_peak_finding.pyi`'s `find_peaks`, instead of having separate type annotations for scalars and NDArrays with a concrete dtype, use numpy's `_ArrayLike*_co` type aliases to cover both cases and allow for various dtypes.
scipy-stubs/signal/_peak_finding.pyi

```python
    widths: npt.ArrayLike,
    wavelet: Callable[Concatenate[int, float, ...], npt.ArrayLike] | None = None,
    max_distances: Sequence[int] | None = None,
    gap_thresh: AnyInt | None = None,
```
This could also be a float, according to the docs
scipy-stubs/signal/_peak_finding.pyi

```python
) -> Untyped: ...
    vector: npt.ArrayLike,
    widths: npt.ArrayLike,
    wavelet: Callable[Concatenate[int, float, ...], npt.ArrayLike] | None = None,
```
See my other comment on this signature
scipy-stubs/signal/_peak_finding.pyi

```python
    data: npt.NDArray[np.generic],
    comparator: Callable[[npt.NDArray[np.generic], npt.NDArray[np.generic]], npt.NDArray[np.bool_]],
```
totally optional; but you could use a TypeVar(..., bound=np.generic) instead of the current np.generics
Wouldn't that require the comparator function to have two arguments of the same type? I think that in the ideal case the function signature makes sense but what if someone were to pass a function that takes in a float and an int? They would still be able to do the comparison despite the types being different.
Assuming that using `TypeVar` enforces that the two argument types are the same, that is.
Wouldn't that require the comparator function to have two arguments of the same type?
Yes exactly.
I think that in the ideal case the function signature makes sense but what if someone were to pass a function that takes in a float and an int?
If someone would pass a function that accepts arrays whose scalar types are incompatible, then it would indeed not be allowed.
And that's a good thing, because it could prevent some potentially type-unsafe situations.
Assuming that using `TypeVar` enforces that the two argument types to be the same that is.
Well, it kinda depends on what you mean with "the same".
Because in this case, we also need to take the relationship of NDArray and its scalar type parameter into account: It's covariant.
So that means that you're allowed to pass something like `(NDArray[np.int32], NDArray[np.integer[Any]]) -> NDArray[np.bool_]` as function if you also pass `data: NDArray[np.int32]`.
Ah got it. Just misunderstood the semantics. I can add this then.
scipy-stubs/signal/_peak_finding.pyi

```python
def argrelmin(
    data: npt.NDArray[np.generic],
    axis: op.CanIndex = 0,
    order: int = 1,
```
I think that besides `int`, you could also pass some `np.integer[Any]` and `np.bool_` to `order`. And coincidentally, `scipy._typing.AnyInt` is an alias for exactly that :)
There are also some other `order` parameters like this below.
In `_peak_finding.pyi`
In `_peak_finding.pyi`.
I finally got around to understanding this signature and I agree that the overloaded ones are good. I'm not sure how to make a type alias like the one you suggested though. I think a simple way to annotate without having to use too many generics is
This doesn't enforce that the wavelet second argument is of the same type as the scalar type of the array-like, but it should be broader, right?
Sorry, I was a bit busy the last couple of days so couldn't finish off the PR.
Actually, on the topic of `wavelet_data = np.conj(wavelet(N, width, **kwargs)[::-1])`, I think it expects that
For the function `find_peaks_cwt` in `_peak_finding.pyi`.
I also don't think we need to write overloads for this, although it may be more accurate. I think the spirit of the wavelet function is that it should be able to handle float values as well, because it represents a continuous mathematical function. Following this interpretation, the first argument, which corresponds to the number of points, should be something like

```python
_WaveletFunction: TypeAlias = Callable[Concatenate[AnyInt, AnyReal, ...], npt.NDArray[np.generic]]
```

with the only caveats being that the source actually passes in
For the `argrelextrema` function in `_peak_finding.pyi`.
Yes that seems about right. Just keep in mind that
Yes; the parameters of a
Good catch! I then just got lucky when I tried it with a
Found in `_peak_finding.pyi`. Changes are:
1. Use `_ArrayLikeFloat_co` for `widths` parameter
This is because the documents specify `float or sequence` so the
dtype is implied to be float.
2. Change type annotation for `max_distances` from `_ArrayLikeInt_co` to
   `_ArrayLikeFloat_co`. In case of `None` the array is given by the
   `widths` parameter divided by 4.0, so it has to be compatible with
   float dtypes.
3. Change the output for `_WaveletFunction` to be more accurate. The
output has to be sliceable with the type of the sliced access being
an array like.
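Item 2 above can be checked directly: the division by 4.0 promotes the default to a float array, so an integer-only annotation would be too narrow (a sketch of the claimed behavior, not scipy's actual code):

```python
import numpy as np

widths = np.asarray([2.0, 5.0, 10.0])
# When max_distances is None, it is derived as widths / 4.0,
# which always produces a float array:
max_distances = widths / 4.0
print(max_distances.dtype)  # float64
print(max_distances)
```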
Found in `_peak_finding.pyi`. Change the type annotation for the `vector` parameter from `ArrayLike` to `NDArray[generic]`. The docs specify the input as `ndarray`, not an array-like, and as there is no conversion in the source code, we can't use `ArrayLike`.
I added the following changes:
Great! Can I merge this?
Sorry, just found a mistake. I think
The docs aren't always accurate, I have found, so be sure to check that first. But in case they are, then I guess that
Yeah, I actually was a bit confused with the docs because it says
In `_peak_finding.pyi` and the function `find_peaks_cwt`. According to the docs this is an `NDArray`, not an `ArrayLike`.
Should be good to merge now after CI!
Thanks @pavyamsiri !
I have added type stubs to `scipy/signal/_peak_finding.pyi`, although I think the type annotations could be improved a bit. In particular:

- Some arguments use `npt.NDArray[np.generic]` to denote that the argument must be `np.ndarray` but the inner type is not relevant. Is there a better type for that?
- Some arguments are converted with `np.asarray`, but some also require that two array-like arguments have the same shape. Is it possible to write this requirement as a type?
- `find_peaks` returns a `TypedDict`, but because the presence of the keys depends on whether the input arguments to the function are `None` or not, all values must be `NotRequired`. I feel like this makes the typed dict a bit useless. Making overloads is possible I guess, but that would be combinatorially bad.

Open to any improvements and suggestions.