This repository was archived by the owner on Jun 10, 2020. It is now read-only.

Balancing correctness vs. type sanity #12

Closed
@shoyer

#11 surfaced an important issue to decide: should our type annotations prioritize correctness or type sanity/friendliness?

I wrote:

Yes, this sort of behavior is very unfortunate for us, and unfortunately is also quite prevalent in NumPy. My inclination is that our type-stubs should prioritize correctness more than catching all possible errors. Zero-dimensional arrays are somewhat unusual to see in NumPy, but they do come up. Without typing support for array dimensionality, this would mean that many of these return values should be unsatisfying Any types.

@alanhdu responded:

I'm willing to defer to your judgement here, but I'd personally lean the other way -- IME w/ MyPy is still limited enough that you almost never have "seamless" integration with any reasonably large codebase, and given that you have to adapt the coding style (to write "type-friendly" code) and do integration work anyways, I think it's reasonable to prioritize "correctness" (as long as the burden's not too much, like having to add some int and float casts for numpy scalars).

Here's an illustrative example. NumPy ufuncs like np.sin() usually return a numpy.ndarray when passed a numpy.ndarray as input. However, there are two notable exceptions:

  • This isn't true for 0-dimensional arrays: ufuncs (and most other numpy functions) return numpy scalar types instead.
  • It also isn't true if any arguments are ndarray subclasses: these can overload __array_ufunc__ (or other old override mechanisms) to return whatever they like.
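Both exceptions can be reproduced at runtime. The following is a minimal sketch; the `Weird` subclass and its return value are hypothetical and exist only to show that `__array_ufunc__` may return anything.

```python
import numpy as np

# Dominant case: ndarray in, ndarray out.
arr = np.array([0.0, 1.0])
print(type(np.sin(arr)))      # <class 'numpy.ndarray'>

# Exception 1: a 0-dimensional array yields a NumPy scalar, not an ndarray.
zero_d = np.array(0.0)
print(type(np.sin(zero_d)))   # <class 'numpy.float64'>


# Exception 2: a (hypothetical) subclass that overrides __array_ufunc__
# and returns something that is not an array at all.
class Weird(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return "not an array"


weird = np.zeros(3).view(Weird)
print(type(np.sin(weird)))    # <class 'str'>
```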

In practice, both 0-dimensional arrays and ndarray subclasses (especially those that would violate this design) are rare compared to the dominant use case, but these cases definitely do come up in large codebases.

Neither of these exceptions is currently expressible in our typing system, so we can choose between:

  1. def sin(x: np.ndarray) -> Any (correct, but useless for type checking)
  2. def sin(x: np.ndarray) -> Union[np.generic, np.ndarray] (not worrying about ndarray subclasses -- you shouldn't write a subclass that violates NumPy's typing rules)
  3. def sin(x: np.ndarray) -> np.ndarray (not worrying about subclasses or 0-dimensional arrays)
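To make the trade-off concrete, here is a rough sketch of what each option means for calling code. The wrapper names `sin_any`, `sin_union`, and `sin_ndarray` are hypothetical stand-ins for the three candidate signatures; they are not part of NumPy or of these stubs.

```python
from typing import Any, Union

import numpy as np


def sin_any(x: np.ndarray) -> Any:
    # Option 1: always correct, but a type checker learns nothing.
    return np.sin(x)


def sin_union(x: np.ndarray) -> Union[np.generic, np.ndarray]:
    # Option 2: correct unless a subclass misbehaves; callers must narrow.
    return np.sin(x)


def sin_ndarray(x: np.ndarray) -> np.ndarray:
    # Option 3: friendliest, but wrong for 0-dimensional input.
    return np.sin(x)


# With option 2, array-only attributes require an isinstance check first.
result = sin_union(np.array([0.0, np.pi / 2]))
if isinstance(result, np.ndarray):
    n_elements = len(result)

# With option 3, the annotation is violated at runtime for 0-d arrays.
lies = not isinstance(sin_ndarray(np.array(0.0)), np.ndarray)
```

Under option 3 a type checker would accept `len(sin_ndarray(np.array(0.0)))`, which raises a `TypeError` at runtime because the result is a scalar.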

(Note that this would be one of several overloads for ufuncs, which also should be defined for non-ndarrays.)
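As an illustration of how such a set of signatures might be written, here is a small sketch using `typing.overload`. The choice of option 3 for the array case and of `np.floating` for Python scalars are assumptions made for the example, not a decision recorded in this issue.

```python
from typing import overload

import numpy as np


@overload
def sin(x: float) -> np.floating: ...
@overload
def sin(x: np.ndarray) -> np.ndarray: ...
def sin(x):
    # Runtime implementation simply defers to NumPy; the overloads above
    # only affect what a static type checker infers.
    return np.sin(x)


scalar_result = sin(0.5)                    # inferred as np.floating
array_result = sin(np.array([0.0, 1.0]))    # inferred as np.ndarray
```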
