Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hashability of unit_quantity objects #143

Open
twmr opened this issue Mar 5, 2020 · 6 comments
Open

hashability of unit_quantity objects #143

twmr opened this issue Mar 5, 2020 · 6 comments

Comments

@twmr
Copy link
Contributor

twmr commented Mar 5, 2020

  • unyt version: latest
  • Python version: 3.7
  • Operating System: linux

Description

It is possible to compute the hash of unyt.unit_object.Unit, but not of unyt.array.unyt_quantity. Is there any reason for this?

In [1]: import unyt                                                                                                                  

In [2]: unyt.uC                                                                                                                      
Out[2]: µC

In [3]: a = 3*unyt.uC                                                                                                                

In [4]: hash(a)                                                                                                                      
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-57b555d30865> in <module>
----> 1 hash(a)

TypeError: unhashable type: 'unyt_quantity'

In [5]: hash(unyt.uC)                                                                                                                
Out[5]: 2174419604693315976

In astropy this is possible (but only of scalar unit objects)

In [1]: import astropy.units as u                                                         

In [2]: u.mm                                                                              
Out[2]: Unit("mm")

In [3]: [3, 4] *u.mm                                                                      
Out[3]: <Quantity [3., 4.] mm>

In [4]: hash([3, 4] *u.mm)                                                                
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-c0ef3160b5ce> in <module>
----> 1 hash([3, 4] *u.mm)

~/miniconda/envs/py37/lib/python3.7/site-packages/astropy/units/quantity.py in __hash__(self)
   1013     # other overrides of special functions
   1014     def __hash__(self):
-> 1015         return hash(self.value) ^ hash(self.unit)
   1016 
   1017     def __iter__(self):

TypeError: unhashable type: 'numpy.ndarray'

In [5]: hash(3 *u.mm)                                                                     
Out[5]: -8105555551701293509

What I don't like about astropy's hash implementation is that the hashes of numerically equivalent quantities are different

In [6]: hash(3 *u.mm)                                                                     
Out[6]: -8105555551701293509

In [7]: hash(3000 *u.um)                                                                  
Out[7]: -2837172712740361204
@ngoldbaum
Copy link
Member

This is happening because under the hood unyt_quantity instances are array scalars and array scalars can't be hashed either:

In [1]: import numpy as np

In [2]: data = np.array(1)

In [3]: data.shape
Out[3]: ()

In [4]: hash(data)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-43a48d057afe> in <module>
----> 1 hash(data)

TypeError: unhashable type: 'numpy.ndarray'

Fundamentally the problem is that unyt_quantity instances are mutable:

In [11]: from unyt import cm

In [12]: data = 3*cm

In [13]: id(data)
Out[13]: 140262494078568

In [14]: data
Out[14]: unyt_quantity(3, 'cm')

In [15]: data[...] = 2

In [16]: data
Out[16]: unyt_quantity(2, 'cm')

In [17]: id(data)
Out[17]: 140262494078568

I'm not sure we can change this without introducing a breaking change, although I don't know offhand if it would actually break any real-world code.

It's also a little weird to me that this code works, I don't think it's correct that astropy lets you hash a mutable object, or that the hash of an object can change by doing something to it.

In [37]: from astropy.units import cm

In [38]: data = 3*cm

In [39]: data
Out[39]: <Quantity 3. cm>

In [40]: id(data)
Out[40]: 140262435141784

In [41]: hash(data)
Out[41]: 4329836866475337641

In [42]: data[...] = 2 * cm

In [43]: id(data)
Out[43]: 140262435141784

In [44]: hash(data)
Out[44]: 4329836866475337640

In [45]: data
Out[45]: <Quantity 2. cm>

@mattwthompson
Copy link

We've come across this same use case by chance, and settled on converting values to tuple (although we are bending the rules by even intending to hash mutable objects)

https://github.com/mosdef-hub/gmso/blob/0515c665478150bb9f58a0e83f81af28868bd0c0/gmso/utils/misc.py#L4

@ngoldbaum
Copy link
Member

Some additional context about how hashing in general is difficult and what bad things happen when you allow hashing of mutable objects where equality for the object depends on the object's state:

https://www.asmeurer.com/blog/posts/what-happens-when-you-mess-with-hashing-in-python/

And also some related context about why hashing numeric types in particular is also difficult:

sympy/sympy#11707

However, there is probably a way to make hashing of objects that have different units but are nonetheless equal have the same hashes by following Python's approach for numeric types: https://docs.python.org/3/library/stdtypes.html#hashing-of-numeric-types

One way forward would be to force unyt_quantity instances to be read-only. You could check the impact of that by, for example, running the yt unit tests on the yt-4.0 branch with this change. If all the unit tests pass with that change then there's a good chance it's a low-impact change that we can safely make.

If not then another way forward would be to define a new class that represents an immutable scalar quantity, and a way to go from the mutable version to the immutable version when you want to generate hash keys.

@ngoldbaum
Copy link
Member

Or maybe just a helper function that returns a tuple of the quantity's value and unit, which are both individually immutable and can be used as dict keys when paired in a tuple.

@haoenz
Copy link

haoenz commented Dec 24, 2024

Or maybe just a helper function that returns a tuple of the quantity's value and unit, which are both individually immutable and can be used as dict keys when paired in a tuple.

While I don’t consider having two types with identical semantics but different implementations in the same package to be an elegant choice, it’s an acceptable compromise. So, is this approach currently implemented?

Also, the fact that unyt_quantity is a subclass of unyt_array feels a bit counterintuitive to me —— like being told that numpy.float64 is somehow a subclass of numpy.ndarray.

@neutrinoceros
Copy link
Member

So, is this approach currently implemented?

it is not.

Also, the fact that unyt_quantity is a subclass of unyt_array feels a bit counterintuitive to me —— like being told that numpy.float64 is somehow a subclass of numpy.ndarray.

numpy itself effectively treats 0d arrays as a special case of ndarray. I cannot say for sure that this was the reason behind this design choice in unyt but it seems a likely explanation. In any case, this is not something we could change now without a very good reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants