Python's documentation promises that: "The only required property is that objects which compare equal have the same hash value…" However, NumPy dtypes do not follow this requirement. As discussed in numpy/numpy#7242, dtype objects, their types, and their names all compare equal despite hashing unequal. Could the Array API promise that this will no longer be the case?
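To see why the invariant matters in practice, here is a minimal pure-Python sketch. `LooseDType` is a hypothetical stand-in, not NumPy's actual implementation: it only mimics the behaviour described above, where a dtype compares equal to its name but hashes differently.

```python
class LooseDType:
    """Hypothetical dtype-like object; NOT NumPy's implementation."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Mimics the loose comparison: a string naming the dtype compares equal.
        if isinstance(other, str):
            return self.name == other
        if isinstance(other, LooseDType):
            return self.name == other.name
        return NotImplemented

    def __hash__(self):
        # Hashes differently from the equal string, breaking the invariant.
        return hash((LooseDType, self.name))


f32 = LooseDType("float32")
table = {f32: "4 bytes"}

equal_to_string = f32 == "float32"      # equality says "same"
found_by_string = "float32" in table    # hash-based lookup says "different"
found_by_dtype = LooseDType("float32") in table
```

The object and the string compare equal, yet the string cannot be used to find the object in a dict or set, which is the kind of surprise the invariant is meant to rule out.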
rgommers commented on Dec 23, 2022
That seems fine to me to explicitly specify. float32 == 'float32' should clearly return False. In NumPy it's a bit messy:
Only the first example is relevant for the array API standard, so I think this will be fine to specify since NumPy already complies.
For this one, however, there is a problem in NumPy:
That can be considered a clear bug though, should be fixed in NumPy.
NeilGirdhar commented on Dec 23, 2022
So you're saying that np.dtype(np.float32) == 'float32' will be true or false?
Agreed.
What about np.float32 == np.dtype(np.float32)? This also violates Python's hashing invariant.
rgommers commented on Dec 23, 2022
I agree that it's a bug technically. Not 100% sure that the NumPy team will want that changed, but I hope so (and a proposal for a major release is in the works, so that could go into it). For the array API standard it's not an issue, because there is no dtype constructor/function in the main namespace.
That's more for the NumPy issue tracker, but if it were up to me then yes.
For this issue tracker, I'm +1 on adopting language in the standard like: "All objects in this standard must adhere to the following requirement (as required by Python itself): objects which compare equal have the same hash value".
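One way an implementation could satisfy the proposed wording is sketched below. `StrictDType` is a hypothetical class, not part of the standard or of NumPy: it compares equal only to other instances with the same name, and derives its hash from the same data it compares.

```python
class StrictDType:
    """Hypothetical dtype object that keeps __eq__ and __hash__ consistent."""

    def __init__(self, name):
        self._name = name

    def __eq__(self, other):
        # Only other StrictDType instances can compare equal; strings and
        # scalar types fall through to Python's default (unequal) result.
        if isinstance(other, StrictDType):
            return self._name == other._name
        return NotImplemented

    def __hash__(self):
        # The hash is derived from exactly the data used by __eq__.
        return hash((StrictDType, self._name))

    def __repr__(self):
        return f"StrictDType({self._name!r})"


float32 = StrictDType("float32")
same = StrictDType("float32")
```

With this design, `float32 == "float32"` evaluates to False, and equal dtype objects collapse to a single entry when used as set members or dict keys.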
NeilGirdhar commented on Dec 23, 2022
That would be amazing. That's exactly what I was hoping for.
Okay, thanks for explaining. If the above language were adopted, NumPy could implement that by making xp.float32 not simply equal to np.dtype(np.float32), but rather a special dtype object that doesn't have the pernicious behavior.
rgommers commented on Dec 23, 2022
Let's give it a bit of time to see if anyone sees a reason not to add such a requirement. I can open a PR after the holidays.
NeilGirdhar commented on Dec 25, 2022
Just noticed this comment. It is currently an issue in NumPy's implementation of the Array API:
This is because xp.float32 points to an object np.dtype(np.float32). For this to be fixed, NumPy would just need a new dtype class for use in its Array API xp.
With the language you suggested above, NumPy would be forced to do this to become compliant 😄.
Same thing here, I think. NumPy will probably reject this for their own namespace (np), but if you adopt that language, they would have to fix it in the array API (xp).
Incidentally, I assume you want numpy.array_api.float32 to compare equal to jax.array_api.float32? Since there is no root project to provide a base implementation of dtypes, you may need to standardize how dtype.__hash__ and comparison work.
rgommers commented on Dec 26, 2022
There is no float32.type in the standard. That it shows up with numpy.array_api.float32 is because the dtype objects there are aliases to the regular numpy ones, rather than new objects. That was a shortcut I think, because adding new dtypes is a lot of work. So that's one place where currently numpy.array_api doesn't 100% meet its goal of being completely minimal.
No, definitely not. No objects from two different libraries should ever compare equal, unless they're indeed the same object.
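If dtype objects from different libraries never compare equal, portable code has to translate between them some other way. The following is a minimal sketch, assuming each namespace exposes its dtypes as attributes named after the dtype and that a dtype carries a `name` attribute (the latter is an assumption, not something the standard guarantees). `lib_a`, `lib_b`, `DType`, and `translate_dtype` are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


class DType:
    """Hypothetical dtype with default identity-based equality and hashing."""

    def __init__(self, name):
        self.name = name


# Two stand-in "libraries", each owning its own dtype objects.
lib_a = SimpleNamespace(float32=DType("float32"), int64=DType("int64"))
lib_b = SimpleNamespace(float32=DType("float32"), int64=DType("int64"))


def translate_dtype(dtype, target_namespace):
    """Find the dtype with the same name in another namespace."""
    return getattr(target_namespace, dtype.name)


converted = translate_dtype(lib_a.float32, lib_b)
```

Here `lib_a.float32` and `lib_b.float32` are unequal because they are distinct objects, while the name-based lookup still recovers the corresponding dtype in the target namespace.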
NeilGirdhar commented on Dec 26, 2022
Ok! Thanks for explaining.
So to do things like checking that two arrays have the same dtype, or creating a NumPy array that has the same type as a Jax array, we'll need mappings like:
And code like
is impossible, yes? You need:
rgommers commented on Dec 26, 2022
Having to use library-specific constructs should not be needed - if so, we're missing an API I'd say. More importantly: mixing arrays from different libraries like this is a bit of an anti-pattern. You can't do much with that, neither library has kernels for functions that use both array types, so you're probably relying on implicit conversion of one to the other.
So in this case, let me assume that x is a numpy array, y a JAX array and you're wanting to use functions from x (numpy):
yes indeed
I'm actually a little surprised JAX accepts numpy arrays. It seems to go against its philosophy; TensorFlow, PyTorch and CuPy will all raise. When you call jnp.xxx(a_numpy_array), JAX will also make a copy always I believe, since it doesn't want to share memory. An explicit copy made by the user is clearer and more portable.
JAX is also annotating its array inputs as array_like, but it doesn't mean the same as for NumPy:
All this stuff is bug-prone:
NeilGirdhar commented on Dec 26, 2022
Okay, makes sense. I haven't been very conscious about this because (as you pointed out) Jax implicitly converts. I will be more careful.
I think this is where I'm confused. Somehow numpy has to know what its equivalent dtypes are for Jax's dtypes even though they don't compare equal? Or will it produce a numpy array with a Jax dtype? As this seems to work:
Very interesting. I wonder what the Jax team would say.
rgommers commented on Dec 26, 2022
NumPy knows the dtype, as does JAX. This conversion uses the Python buffer protocol or DLPack, both of which are protocols explicitly meant for exchanging data in a reliable way (that includes dtype, shape, endianness, etc.). So the asarray call will produce a numpy array with a numpy dtype, and to do so numpy does not need to know anything specifically about JAX.
Let's try to find out :) This section of the JAX docs only explains why JAX doesn't accept list/tuple/etc., but I cannot find an explanation of why it does accept numpy arrays and scalars. @shoyer or @jakevdp, would you be able to comment on why JAX implements a limited form of "array-like"?
Also, in addition to the bug with masked arrays above, here is another bug:
NeilGirdhar commented on Dec 26, 2022
In that case, there should be a way to convert dtypes using either the buffer protocol or DLPack? Something more efficient than:
Should dtypes have a __array_namespace__ attribute? Currently, they don't. So, the above function can't be written unless you know xp.
rgommers commented on Dec 26, 2022
No, those protocols are specifically for exchanging data (strided arrays/buffers). A dtype without data isn't very meaningful. You could exchange a size-1 array if needed, or a 'float32' string representation, or whatever works.
NeilGirdhar commented on Dec 26, 2022
I understand, but in order to exchange data, they have to be able to convert dtypes. So, that dtype conversion is happening somehow, and I was just wondering if that conversion can be accessed by the user.
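For the buffer protocol at least, part of that description is visible from Python: a memoryview exposes the element format, item size, and shape that the producer advertises. The sketch below uses the standard library's `array` module as the producer, purely as a stand-in for an array library. How NumPy or JAX map the format code onto their own dtype objects is not shown here.

```python
import array

# A standard-library buffer producer standing in for an array library.
producer = array.array("f", [1.0, 2.0, 3.0])  # "f" is the C float type code

# Any consumer can read the element description through the buffer protocol,
# without knowing which library produced the data.
view = memoryview(producer)
element_format = view.format
element_size = view.itemsize
shape = view.shape
```

The consumer receives a format code and layout rather than the producer's dtype object, which is why neither side needs to know the other's dtype classes.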