Closed
Description
>>> import pandas as pd
>>> ser = pd.Series([1, 2, 3, 4], index=[1.1, 2.1, 3.0, 4.1])
>>> ser[5]  # <-- we are treating 5 as a label
KeyError: 5
>>> ser[5] = 5  # <-- we are treating 5 as a position
IndexError: index 5 is out of bounds for axis 0 with size 4
>>> ser[3] = 5  # <-- we are treating 3 as a label
>>> ser
1.1    1
2.1    2
3.0    5
4.1    4
dtype: int64
The ser[5] = 5 case is an outlier: instead of having its label-vs-positional behavior determined by ser.index._should_fallback_to_positional, it is determined by the condition is_integer(key) and not self.index.inferred_type == "integer" in Series.__setitem__.
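A simplified model of that setitem dispatch may make the outlier easier to see. This is a hypothetical sketch, not pandas' actual implementation: the helper name setitem_status_quo is made up, it uses an explicit membership test where Series.__setitem__ relies on catching KeyError, and it assumes the example Series from the description.

```python
import pandas as pd
from pandas.api.types import is_integer


def setitem_status_quo(ser: pd.Series, key, value) -> None:
    """Hypothetical, simplified model of the setitem path discussed here."""
    if key in ser.index:
        # Label-based first: the integer 3 matches the float label 3.0.
        ser.loc[key] = value
    elif is_integer(key) and ser.index.inferred_type != "integer":
        # The fallback is decided by this check, not by
        # _should_fallback_to_positional: the integer is used as a position.
        # .iloc raises IndexError if the position is out of bounds.
        ser.iloc[key] = value
    else:
        # Any other missing key is added as a new label (enlargement).
        ser.loc[key] = value


ser = pd.Series([1, 2, 3, 4], index=[1.1, 2.1, 3.0, 4.1])
setitem_status_quo(ser, 3, 5)   # label 3.0 is set to 5
setitem_status_quo(ser, 1, 10)  # no label 1.0, so position 1 (label 2.1) is set
try:
    setitem_status_quo(ser, 5, 5)  # no label 5.0 and no position 5
except IndexError as err:
    print("IndexError:", err)
print(ser)
```

The point of the model is that the same integer key is tried as a label and then, only on a miss, reinterpreted as a position, which is exactly the mixed behaviour the issue describes.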
Activity
jorisvandenbossche commented on Sep 7, 2020
This API is certainly a mess .. ;)
The proposal is to change the setitem case where it falls back to positional so that it is also strictly label-based?
So the integer would then be interpreted as a label, and thus enlarge the Series in case it is not yet present?
Using your example series:
I was first thinking this would be a relatively safe change, since setitem currently already tries label-based first (so if the integer key is present as a label, it already does label-based setting now). So the potential difference in behaviour is only when the label is not present: right now that can update an existing value, but in the future it would add another value, like the above. That silently results in different data. We could also first deprecate integer keys for setitem, or raise an error for them?
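A minimal sketch of that difference, assuming the example Series from the description. It uses .iloc and .loc, which are strictly positional and strictly label-based respectively, so the result does not depend on how a particular pandas version resolves a plain integer key: the same key 1 either overwrites an existing value or silently adds a new one.

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4], index=[1.1, 2.1, 3.0, 4.1])

# Positional interpretation of key 1: overwrite the value at position 1 (label 2.1).
by_position = ser.copy()
by_position.iloc[1] = 10

# Label interpretation of key 1: 1.0 is not a label yet, so it is appended
# as a new label (setting with enlargement) and nothing is overwritten.
by_label = ser.copy()
by_label.loc[1] = 10

print(by_position)
print(by_label)
```

Both results are valid Series with no error or warning, which is why switching a plain integer key from one interpretation to the other would change data silently.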
jbrockmendel commented on Sep 8, 2020
Do you mean "deprecate ever treating integer as a label for Float64Index"? Or "deprecate allowing ser[integer] with Float64Index"? The first seems reasonable.
I'd be OK with "integers are always considered labels" or "integers are always considered positional", so there is no question of fall-back behavior.
jbrockmendel commented on Jun 9, 2021
Options that come to mind for fixing this:
1. Deprecate casting ints to floats in Float64Index.get_loc, so that ser[4] is always positional when we have a Float64Index.
2. Change Float64Index._should_fallback_to_positional to return not (self == self.astype(int)).any() (ATM it always returns False).
Series.__setitem__ vs Series.__getitem__
3. Change Float64Index._should_fallback_to_positional to always return True.
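To make the value dependence of option 2 concrete, its proposed body can be evaluated on its own. A minimal sketch: fallback_under_option_2 is a hypothetical standalone function that applies the expression from option 2, as written, to a pandas float Index.

```python
import pandas as pd


def fallback_under_option_2(index: pd.Index) -> bool:
    # Hypothetical: option 2's proposed body for
    # Float64Index._should_fallback_to_positional, applied to a float Index.
    # Fall back to positional only if no label is integer-valued.
    return not (index == index.astype(int)).any()


print(fallback_under_option_2(pd.Index([1.1, 2.1])))  # True: no integral labels
print(fallback_under_option_2(pd.Index([1.1, 2.0])))  # False: 2.0 is integral
```

Changing a single label flips whether integer keys are positional, which is the value-dependent behaviour objected to below. The sketch assumes an index without NaN labels, since astype(int) is expected to raise on them.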
jorisvandenbossche commented on Jun 10, 2021
Currently I think ser[4] is always strictly label-based for getitem? Since that part is actually consistent / unambiguous (if you know the semantics ..) at this moment, I would maybe leave it as is. Or if we deprecate it, have it raise an error in the future instead of being positional.
For that last reason (value-dependent behavior), I wouldn't go for this. ser[2] meaning something different for pd.Series([..], index=[1.1, 2.0]) vs pd.Series([..], index=[1.0, 2.0]) seems quite confusing or surprising.
What's the behavioral consequence of this? So that if the key (cast to float) is not present, always use positional?
Is a 4th option to change Float64Index._should_fallback_to_positional to always return False? (like it is for Int64Index?)
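A quick check of the getitem behaviour mentioned at the start of this comment, assuming the example Series from the description: an integer key that matches a float label (3 and 3.0) is looked up as a label, and a missing integer key raises KeyError instead of falling back to a position.

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4], index=[1.1, 2.1, 3.0, 4.1])

# 3 == 3.0, so the integer key is matched against the float label 3.0.
print(ser[3], ser.loc[3])

# 1 is not a label; getitem does not fall back to position 1 (label 2.1).
try:
    ser[1]
except KeyError as err:
    print("KeyError:", err)
```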
jbrockmendel commented on Jun 10, 2021
Note: in all the options described above, we are changing the line in Series.__setitem__.
If we do that, and add in __getitem__ a clause so that we try label-based before falling back, then we break 5 tests, e.g.
(without the extra clause in __getitem__ we break 9 tests, including a few that aren't with pytest.raises(...))
That is the status quo for _should_fallback_to_positional.
If we change the line in Series.__setitem__ and leave everything else unchanged, we break 1 test (huh, thought it was more:
jreback commented on Jun 24, 2021
IIUC, I think integers should always be label-based for Float64Index, so this means deprecating (and then changing) the setitem behavior.
getitem would be unchanged, I think.