Hello! Hijacking this issue as I've also verified this behaviour (it took a while to track down: I only noticed it after upgrading to 0.19.0 and seeing some odd dropping of timezones, see #14524, which is a duplicate of #13905). The behaviour was previously masked in my program because pandas 0.18.1 dropped the timezones from all relevant columns before I tried to perform this step. After upgrading to 0.19.0, half the operations I was performing stopped dropping timezones, leading to a mismatch between tz-aware and tz-naive timestamps that I've been chasing down the rabbit hole for a couple of days now.
I've verified that this is present in pandas 0.18.1 and 0.19.0.
From some stepping through of the code, this looks like a potential problem with the numpy implementations of .max(axis=1), but I haven't yet found the culprit!
This issue has forced me to roll back to 0.18.1 and rely on the timezone-dropping bug in order to make df.max(axis=1) work, which is frustrating! I have also tried df.T.max() as a workaround, but this infuriatingly returns an empty series (see below).
A small, complete example of the issue
import pandas as pd

df = pd.DataFrame(pd.date_range(start=pd.Timestamp('2016-01-01 00:00:00+00'), end=pd.Timestamp('2016-01-01 23:59:59+00'), freq='H'))
df.columns = ['a']

# if using pandas 0.19.0 to test, ensure that this is a series of timedeltas instead of a single - we want b and c to be tz-naive.
df['b'] = df.a.subtract(pd.Timedelta(seconds=60*60))

df[['a', 'b']].max()  # This is fine, produces two numbers
df[['a', 'b']].max(axis=1)  # This is not fine, produces a correctly sized series of NaN

# if using pandas 0.19.0 to test, ensure that this is a series of timedeltas instead of a single - we want b and c to be tz-naive.
df['c'] = df.a.subtract(pd.Timedelta(seconds=60))

df[['b', 'c']].max(axis=1)  # This is fine, produces correctly sized series of valid timestamps without timezone
df[['a', 'b']].T.max()  # produces an empty series.
Expected Output
Calling df.max(axis=1) on a dataframe with timezone-aware timestamps should return valid timestamps, not NaN.
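A possible interim workaround, sketched under assumptions rather than taken from this thread: convert every tz-aware column to naive UTC first, take the row-wise max on the naive frame (the case reported as working above), and re-attach UTC afterwards. The helper name `rowwise_max_utc` is hypothetical, and the sketch assumes that any tz-naive column already holds UTC wall-clock times. It uses `pd.DatetimeTZDtype`, which is only exposed at the top level in pandas releases newer than the ones discussed here.

```python
import pandas as pd


def rowwise_max_utc(frame):
    """Row-wise max over datetime columns, returned as tz-aware UTC.

    Hypothetical helper: tz-aware columns are converted to naive UTC,
    and tz-naive columns are ASSUMED to already be in UTC.
    """
    def to_naive_utc(col):
        if isinstance(col.dtype, pd.DatetimeTZDtype):
            # Normalise to UTC, then drop the timezone information.
            return col.dt.tz_convert("UTC").dt.tz_localize(None)
        return col

    naive = frame.apply(to_naive_utc)
    # The reduction now only ever sees tz-naive datetime64 columns.
    return naive.max(axis=1).dt.tz_localize("UTC")


# Mirror the report: 'a' is tz-aware, 'b' is tz-naive and one hour earlier.
df = pd.DataFrame(
    {"a": pd.date_range("2016-01-01", periods=3, freq=pd.Timedelta(hours=1), tz="UTC")}
)
df["b"] = (df["a"] - pd.Timedelta(hours=1)).dt.tz_localize(None)

result = rowwise_max_utc(df[["a", "b"]])
print(result)
```

Since `b` is always one hour behind `a`, the result should match column `a` row for row. If the naive columns are not actually in UTC, they would need to be localized to their real timezone before the comparison.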
Output of pd.show_versions()
(I have tested in two virtualenvs, the only difference between the two being the pandas version)
changed the title from "Unexpected behavior on df_datetime64.max(axis=1) with missing column" to "BUG: DataFrame with tz-aware data and max(axis=1) returns NaN" on Oct 21, 2018
Activity
jreback commented on Jun 19, 2015
pls show pd.show_versions() and df_datetime64.info()
TimTimMadden commented on Oct 28, 2016
INSTALLED VERSIONS
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: None.None
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 28.6.0
Cython: None
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None