DataFrame/Series.tz_convert with copy=False modifies original data #6326

@hendrics

Hi. I'm not sure whether this is a bug or something that needs to be clarified.

Consider the code:

s = pd.Series(np.arange(0,5), index=pd.date_range('20131027', periods=5, freq='1H', tz='Europe/Berlin'))
s.tz_convert("UTC", copy=False)

The index of s is still the same as before. If I do the same for a DataFrame:

d = pd.DataFrame(s)
d.tz_convert("UTC", copy=False)

This time the index of d has changed. From the code it is not clear whether DataFrame is doing the right thing either.

So is it a bug, is it just inconsistent, or is it intended?

Update:

In [1]: s = pd.Series(np.arange(0,5), index=pd.date_range('20131027', periods=5, freq='1H', tz='Europe/Berlin'))
      s.tz_convert("UTC", copy=False)
Out[1]: 
2013-10-26 22:00:00+00:00    0
2013-10-26 23:00:00+00:00    1
2013-10-27 00:00:00+00:00    2
2013-10-27 01:00:00+00:00    3
2013-10-27 02:00:00+00:00    4
Freq: H, dtype: int32

In [2]: s
Out[2]: 
2013-10-27 00:00:00+02:00    0
2013-10-27 01:00:00+02:00    1
2013-10-27 02:00:00+02:00    2
2013-10-27 02:00:00+01:00    3
2013-10-27 03:00:00+01:00    4
Freq: H, dtype: int32

In [3]: d = pd.DataFrame(s)
      d.tz_convert("UTC", copy=False)
Out[3]: 
2013-10-26 22:00:00+00:00 0
2013-10-26 23:00:00+00:00 1
2013-10-27 00:00:00+00:00 2
2013-10-27 01:00:00+00:00 3
2013-10-27 02:00:00+00:00 4

In [214]: d
Out[214]: 
2013-10-26 22:00:00+00:00 0
2013-10-26 23:00:00+00:00 1
2013-10-27 00:00:00+00:00 2
2013-10-27 01:00:00+00:00 3
2013-10-27 02:00:00+00:00 4

Activity

jreback commented on Feb 12, 2014

Contributor

Looks OK in 0.13.1.

In [1]: pd.__version__
Out[1]: '0.13.1'

In [2]: s = pd.Series(np.arange(0,5), index=pd.date_range('20131027', periods=5, freq='1H', tz='Europe/Berlin'))

In [3]: s.tz_convert("UTC", copy=False)
Out[3]: 
2013-10-26 22:00:00+00:00    0
2013-10-26 23:00:00+00:00    1
2013-10-27 00:00:00+00:00    2
2013-10-27 01:00:00+00:00    3
2013-10-27 02:00:00+00:00    4
Freq: H, dtype: int64

In [4]: d = pd.DataFrame(s)

In [5]: d.tz_convert("UTC", copy=False)
Out[5]: 
                           0
2013-10-26 22:00:00+00:00  0
2013-10-26 23:00:00+00:00  1
2013-10-27 00:00:00+00:00  2
2013-10-27 01:00:00+00:00  3
2013-10-27 02:00:00+00:00  4

[5 rows x 1 columns]

alexchamberlain commented on Feb 12, 2014

If you inspect s, it hasn't changed under 0.13.1.

>>> s = pd.Series(np.arange(0,5), index=pd.date_range('20131027', periods=5, freq='1H', tz='Europe/Berlin'))
>>> s.tz_convert(None, copy=False)
2013-10-26 22:00:00    0
2013-10-26 23:00:00    1
2013-10-27 00:00:00    2
2013-10-27 01:00:00    3
2013-10-27 02:00:00    4
Freq: H, dtype: int32
>>> s
2013-10-27 00:00:00+02:00    0
2013-10-27 01:00:00+02:00    1
2013-10-27 02:00:00+02:00    2
2013-10-27 02:00:00+01:00    3
2013-10-27 03:00:00+01:00    4
Freq: H, dtype: int32

jreback commented on Feb 12, 2014

Contributor

Why would you expect s to change? Most pandas methods return a new object.

The copy flag just tries to avoid copying the index when it doesn't need to; in this case it does need to, so it's irrelevant.
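
To illustrate the point about return values, here is a minimal sketch using the data from the report. It assumes a reasonably recent pandas; the Timedelta step is a substitution for freq='1H' so the snippet does not depend on a particular frequency alias, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the report. A Timedelta step stands in for freq='1H'.
idx = pd.date_range(
    "2013-10-27", periods=5, freq=pd.Timedelta(hours=1), tz="Europe/Berlin"
)
s = pd.Series(np.arange(5), index=idx)

# tz_convert returns a new object, so bind the result instead of
# expecting s to change.
s_utc = s.tz_convert("UTC")

# The converted index carries the new time zone ...
print(s_utc.index.tz)
# ... and with the default copy behaviour the original keeps its own.
print(s.index.tz)

# Both indexes describe the same instants; only the time zone differs.
same_instants = bool((s_utc.index == s.index).all())
print(same_instants)
```

Binding the result this way behaves the same for Series and DataFrame, so it sidesteps the question of what copy=False does to the caller's object.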

hendrics commented on Feb 12, 2014

Author

DataFrame does change, though. If you inspect d, it will have a new index. It might be that the index is part of the data in the DataFrame, but either way the behaviour is inconsistent with Series.

Updated the comment above with the output.

jreback commented on Feb 12, 2014

Contributor

Ah, OK. I'll mark this as a bug. Thanks for the report.

added this to the 0.14.0 milestone on Feb 12, 2014

jreback commented on Feb 18, 2014

Contributor

@hendrics

Please run this again on master. I am pretty sure this is fixed (and if you want to add an explicit test for this, that would be great).

see here: b1687b8
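
An explicit test along those lines might look like the sketch below. It is not the test from the linked commit; the helper name original_tz_survives is hypothetical, the Timedelta step replaces freq='1H' to stay independent of frequency aliases, and warnings are silenced only in case a pandas version warns about the copy keyword.

```python
import warnings

import numpy as np
import pandas as pd


def original_tz_survives(obj):
    """Hypothetical helper: True if obj keeps its time zone after
    obj.tz_convert('UTC', copy=False)."""
    tz_before = str(obj.index.tz)
    with warnings.catch_warnings():
        # Assumption: some pandas versions may warn about `copy`.
        warnings.simplefilter("ignore")
        converted = obj.tz_convert("UTC", copy=False)
    # The returned object must always be in UTC ...
    assert str(converted.index.tz) == "UTC"
    # ... while the caller's object should be left alone.
    return str(obj.index.tz) == tz_before


idx = pd.date_range(
    "2013-10-27", periods=5, freq=pd.Timedelta(hours=1), tz="Europe/Berlin"
)
s = pd.Series(np.arange(5), index=idx)
d = pd.DataFrame(s)

series_ok = original_tz_survives(s)
frame_ok = original_tz_survives(d)
print("Series unchanged:", series_ok)
print("DataFrame unchanged:", frame_ok)
```

On an affected version the corresponding flag would come back False; inside a test suite each flag would simply be asserted.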

modified the milestones: 0.15.0, 0.14.0 on Apr 9, 2014
modified the milestones: 0.16.0, Next Major Release on Mar 3, 2015
modified the milestone: Contributions Welcome on Jul 8, 2018
changed the title from "tz_convert with copy=False behaves differently and unexpectedly for Series and DataFrame" to "DataFrame.tz_convert with copy=False modifies original data" on Jul 26, 2018

mroeschke commented on Jan 4, 2019

Member

The Series case is actually wrong now as well.

In [13]: s = pd.Series(np.arange(0,5), index=pd.date_range('20131027', periods=5, freq='1H', tz='Europe/Berlin'))

In [14]: s
Out[14]:
2013-10-27 00:00:00+02:00    0
2013-10-27 01:00:00+02:00    1
2013-10-27 02:00:00+02:00    2
2013-10-27 02:00:00+01:00    3
2013-10-27 03:00:00+01:00    4
Freq: H, dtype: int64

In [15]: s.tz_convert('UTC', copy=False)
Out[15]:
2013-10-26 22:00:00+00:00    0
2013-10-26 23:00:00+00:00    1
2013-10-27 00:00:00+00:00    2
2013-10-27 01:00:00+00:00    3
2013-10-27 02:00:00+00:00    4
Freq: H, dtype: int64

In [16]: s
Out[16]:
2013-10-26 22:00:00+00:00    0
2013-10-26 23:00:00+00:00    1
2013-10-27 00:00:00+00:00    2
2013-10-27 01:00:00+00:00    3
2013-10-27 02:00:00+00:00    4
Freq: H, dtype: int64

In [17]: pd.__version__
Out[17]: '0.24.0.dev0+1505.gcb31b2b09.dirty'

changed the title from "DataFrame.tz_convert with copy=False modifies original data" to "DataFrame/Series.tz_convert with copy=False modifies original data" on Jan 5, 2019

Participants: @jreback, @hendrics, @alexchamberlain, @datapythonista, @mroeschke
