Closed
Labels
Bug; Regression (functionality that used to work in a prior pandas version); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode); Timezones (timezone data dtype)
Description
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample
import pandas as pd

# Frame indexed by hourly tz-aware (UTC) timestamps.
df = pd.DataFrame({'value': [0, 1, 2, 3]},
                  index=[pd.Timestamp('2020-01-01 05:00:00+0000', tz='UTC'),
                         pd.Timestamp('2020-01-01 06:00:00+0000', tz='UTC'),
                         pd.Timestamp('2020-01-01 07:00:00+0000', tz='UTC'),
                         pd.Timestamp('2020-01-01 08:00:00+0000', tz='UTC')
                         ]
                  )
# Target index, also tz-aware, offset by 30 minutes.
new_index = pd.Series([pd.Timestamp('2020-01-01 5:30:00+0000', tz='UTC'),
                       pd.Timestamp('2020-01-01 6:30:00+0000', tz='UTC'),
                       pd.Timestamp('2020-01-01 7:30:00+0000', tz='UTC'),
                       pd.Timestamp('2020-01-01 8:30:00+0000', tz='UTC'),
                       pd.Timestamp('2020-01-01 9:30:00+0000', tz='UTC')]
                      )
# Raises TypeError on the affected version.
new_df = df.reindex(new_index, method="ffill", tolerance=pd.Timedelta("1 hour"))
Problem description
The following exception is raised when method is "ffill" or "bfill" (but not "nearest", see #32740) and tolerance is specified:
TypeError: DatetimeArray subtraction must have the same timezones or no timezones
I found that the timezone has already been dropped by the time execution reaches this function, at lines 3024 and 3036:
pandas/pandas/core/indexes/base.py
Lines 3024 to 3036 in b5958ee
target_values = target._get_engine_target()
if self.is_monotonic_increasing and target.is_monotonic_increasing:
    engine_method = (
        self._engine.get_pad_indexer
        if method == "pad"
        else self._engine.get_backfill_indexer
    )
    indexer = engine_method(target_values, limit)
else:
    indexer = self._get_fill_indexer_searchsorted(target, method, limit)
if tolerance is not None:
    indexer = self._filter_indexer_tolerance(target_values, indexer, tolerance)
where target is the tz-aware target index. However, once it is converted to target_values, the tz info disappears from the resulting NumPy array.
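The loss of timezone information can be illustrated with public API only. This is a minimal sketch, not the internal code path: it uses `DatetimeIndex.values` as a stand-in for the private `_get_engine_target()` call, on the assumption that both hand back a plain NumPy datetime array for a tz-aware index.

```python
import pandas as pd

# A tz-aware index, similar to the reindex target in the report.
target = pd.DatetimeIndex(["2020-01-01 05:30:00", "2020-01-01 06:30:00"], tz="UTC")

# Stand-in for the internal conversion (assumption): the raw NumPy values
# have a plain datetime64 dtype, which cannot carry a timezone.
target_values = target.values
print(target.dtype)         # tz-aware pandas dtype
print(target_values.dtype)  # plain NumPy datetime64 dtype

# Re-wrapping the raw values gives a tz-naive index. Subtracting it from
# the tz-aware index is rejected with a TypeError.
naive = pd.DatetimeIndex(target_values)
try:
    target - naive
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")
```

The exact wording of the error message differs between pandas versions, so the sketch prints it rather than matching on it.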
I found a working solution, but I am unsure whether this change affects any other functionality:
- indexer = self._filter_indexer_tolerance(target_values, indexer, tolerance)
+ indexer = self._filter_indexer_tolerance(target, indexer, tolerance)
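To show why passing the tz-aware `target` matters, here is a rough sketch of what a tolerance filter does. The function `filter_indexer_tolerance` below is a hypothetical stand-in written with public API, not the pandas internal method: it measures the distance between each matched label and its target and discards matches that exceed the tolerance. Because both operands stay tz-aware, the subtraction is valid.

```python
import numpy as np
import pandas as pd


def filter_indexer_tolerance(index, target, indexer, tolerance):
    """Hypothetical stand-in: replace matches farther than `tolerance` with -1."""
    # Labels that each target was matched to. Positions equal to -1 pick the
    # last label here, but they are masked out again below.
    matched = index.take(indexer)
    # Both sides are tz-aware, so this yields a TimedeltaIndex.
    distance = abs(matched - target)
    keep = (indexer != -1) & (distance <= tolerance)
    return np.where(keep, indexer, -1)


period = pd.Timedelta("1 hour")
index = pd.date_range("2020-01-01 05:00", periods=4, freq=period, tz="UTC")
target = pd.date_range("2020-01-01 05:30", periods=5, freq=period, tz="UTC")

# Forward-fill matches without any tolerance applied.
raw = index.get_indexer(target, method="pad")
filtered = filter_indexer_tolerance(index, target, raw, pd.Timedelta("1 hour"))
print(raw)
print(filtered)
```

The last target (09:30) is 1.5 hours after the nearest earlier label (08:00), so its match is dropped once the one-hour tolerance is applied.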
Expected Output
                           value
2020-01-01 05:30:00+00:00    0.0
2020-01-01 06:30:00+00:00    1.0
2020-01-01 07:30:00+00:00    2.0
2020-01-01 08:30:00+00:00    3.0
2020-01-01 09:30:00+00:00    NaN
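The expected output can be turned into a small regression check. This is a sketch under assumptions: it builds the indexes with `pd.date_range` instead of the explicit `Timestamp` lists used in the report, and on an affected version (such as the 1.1.5 reported here) the `reindex` call raises the `TypeError` before the comparison runs.

```python
import numpy as np
import pandas as pd

period = pd.Timedelta("1 hour")
index = pd.date_range("2020-01-01 05:00", periods=4, freq=period, tz="UTC")
new_index = pd.date_range("2020-01-01 05:30", periods=5, freq=period, tz="UTC")

df = pd.DataFrame({"value": [0, 1, 2, 3]}, index=index)

# Forward-fill from the previous label, but only within one hour.
result = df.reindex(new_index, method="ffill", tolerance=pd.Timedelta("1 hour"))

# 09:30 is 1.5 hours after the last label (08:00), so it stays missing.
expected = [0.0, 1.0, 2.0, 3.0, np.nan]
np.testing.assert_array_equal(result["value"].to_numpy(), expected)
print(result)
```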
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : b5958ee1999e9aead1938c0bba2b674378807b3d
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.0-11-cloud-amd64
Version : #1 SMP Debian 4.19.146-1 (2020-09-17)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.5
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 19.2.3
setuptools : 41.2.0
Cython : None
pytest : 6.1.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 0.8.4
fastparquet : 0.4.1
gcsfs : 0.7.1
matplotlib : 3.3.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 1.0.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.3
sqlalchemy : 1.3.20
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.51.2
Activity
ketozhang commented on Dec 18, 2020
A simpler example
simonjayhawkins commented on Dec 18, 2020
Thanks @ketozhang for the report.
pandas 0.25.3 gave the expected output, so I will label this as a regression pending further investigation.
ketozhang commented on Jan 11, 2021
Great work, thanks @phofl