BUG: DataFrame with tz-aware data and max(axis=1) returns NaN #10390

Closed
@ewan1983

Description

I have a DataFrame that looks like this, and its column 2 is missing:
image

When I try to select the max date in each row, I get all NaN in return:
img2

However, if the DataFrame's dtype is float64, the selection works as expected.

Activity

jreback commented on Jun 19, 2015

Contributor

Please show the output of pd.show_versions() and df_datetime64.info()

TimTimMadden commented on Oct 28, 2016

Hello! Hijacking this issue because I have also verified this behaviour. (It took a while to discover: after upgrading to 0.19.0 I ran into some odd dropping of timezones; see #14524, which is a duplicate of #13905.) Pandas 0.18.1 had previously masked this behaviour in my program, because it dropped the timezones from all relevant columns before I reached this step. After upgrading to 0.19.0, half the operations I was performing stopped dropping timezones, leading to a mismatch between tz-aware and tz-naive timestamps that I have been chasing down the rabbit hole for a couple of days now.

I've verified that this is present in pandas 0.18.1 and 0.19.0.

From stepping through the code, this looks like a potential problem with the numpy implementation of .max(axis=1), but I haven't found the culprit yet!
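While hunting for the culprit, it can help to confirm which columns are tz-aware and which are tz-naive, since the example further down only fails when the two are mixed. A minimal sketch, assuming a hypothetical helper name `tz_summary` (not part of pandas) and made-up sample data:

```python
import pandas as pd


def tz_summary(df):
    """Map each column name to its timezone, or None for tz-naive/non-datetime columns."""
    # Only tz-aware datetime dtypes expose a `tz` attribute.
    return {name: getattr(df[name].dtype, "tz", None) for name in df.columns}


# Illustrative data only: 'a' is tz-aware, 'b' is the same instants made tz-naive.
idx = pd.date_range("2016-01-01", periods=3, freq="D", tz="UTC")
frame = pd.DataFrame({"a": idx, "b": idx.tz_localize(None)})

summary = tz_summary(frame)
print(summary)
```

If the summary shows a mix of timezones and None, the frame is in the same situation as df[['a', 'b']] in the example.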

This issue has forced me to roll back to 0.18.1 and rely on the timezone-dropping bug to make df.max(axis=1) work, which is frustrating! I have also tried df.T.max() as a workaround, but this infuriatingly returns an empty series (see below).

A small, complete example of the issue

import pandas as pd
df = pd.DataFrame(pd.date_range(start=pd.Timestamp('2016-01-01 00:00:00+00'), end=pd.Timestamp('2016-01-01 23:59:59+00'), freq='H'))
df.columns = ['a']

df['b'] = df.a.subtract(pd.Timedelta(seconds=60*60)) # If testing on pandas 0.19.0, subtract a Series of timedeltas rather than a single Timedelta - we want b and c to be tz-naive.

df[['a', 'b']].max() # This is fine, produces two numbers

df[['a', 'b']].max(axis=1) # This is not fine, produces a correctly sized series of NaN

df['c'] = df.a.subtract(pd.Timedelta(seconds=60)) # If testing on pandas 0.19.0, subtract a Series of timedeltas rather than a single Timedelta - we want b and c to be tz-naive.

df[['b', 'c']].max(axis=1) # This is fine, produces correctly sized series of valid timestamps without timezone

df[['a', 'b']].T.max() # produces an empty series.

Expected Output

Calling df.max(axis=1) on a dataframe with timezone-aware timestamps should return valid timestamps, not NaN.
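Until this is fixed, one possible workaround is to bring every column to the same tz-aware dtype and then take the maximum column by column instead of calling max(axis=1). This is only a sketch: `rowwise_max_utc` is a hypothetical helper name, and it assumes that any tz-naive column holds UTC times, which may not be true for your data.

```python
import pandas as pd


def rowwise_max_utc(df):
    """Row-wise max over datetime columns, returned as tz-aware UTC timestamps.

    Assumption: tz-naive columns are interpreted as UTC.
    """
    if len(df.columns) == 0:
        raise ValueError("need at least one column")
    result = None
    for name in df.columns:
        col = df[name]
        # Give every column the same tz-aware dtype before comparing.
        if col.dt.tz is None:
            col = col.dt.tz_localize("UTC")
        else:
            col = col.dt.tz_convert("UTC")
        if result is None:
            result = col
        else:
            # Keep the running value where it is >= the new column, or where
            # the new column is missing; otherwise take the new column.
            keep = (result >= col) | col.isna()
            result = result.where(keep, col)
    out = result.copy()
    out.name = None  # the result no longer belongs to a single column
    return out


# Illustrative data: 'a' is tz-aware, 'b' is one hour earlier and tz-naive.
idx = pd.date_range("2016-01-01", periods=3, freq="D", tz="UTC")
frame = pd.DataFrame({"a": idx})
frame["b"] = (frame["a"] - pd.Timedelta(hours=1)).dt.tz_localize(None)

row_max = rowwise_max_utc(frame[["a", "b"]])
print(row_max)
```

Missing values are skipped the way max() skips them: a row is NaT only if every column is NaT in that row.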

Output of pd.show_versions()

(I have tested in two virtualenvs, the only difference between the two being the pandas version)

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: None.None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 28.6.0
Cython: None
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

changed the title from "Unexpected behavior on df_datetime64.max(axis=1) with missing column" to "BUG: DataFrame with tz-aware data and max(axis=1) returns NaN" on Oct 21, 2018
added this to the Contributions Welcome milestone on Oct 23, 2018
modified the milestones: Contributions Welcome, 0.24.0 on Jan 16, 2019
Participants: @jreback, @TimTimMadden, @ewan1983, @mroeschke
