Closed
Description
I'm trying to use pandas to speed up timezone conversions on many datetimes and I can't get around NonExistentTimeErrors. Pandas tz_localize() seems to ignore the ambiguous argument for the non-existent time case.
Example:
import datetime
import pytz
import pandas
tz = pytz.timezone('Europe/Warsaw')
non_existent = datetime.datetime(2015, 3, 29, 2, 30)
tz.normalize(tz.localize(non_existent))
#2015-03-29 03:30:00+02:00
tz.normalize(tz.localize(non_existent, is_dst=False))
#2015-03-29 03:30:00+02:00
pandas.Timestamp(non_existent).tz_localize(tz)
# NonExistentTimeError: 2015-03-29 02:30:00
pandas.Timestamp(non_existent).tz_localize(tz, ambiguous=0)
# NonExistentTimeError: 2015-03-29 02:30:00
It would be nice if the ambiguous argument worked the same as is_dst in pytz. As it is now, it's impossible afaik to reliably localize a series of datetimes using pandas, since any of them might cause a NonExistentTimeError and there's no way of telling pandas what to do with such datetimes.
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-39-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.15.1
nose: 1.3.4
Cython: None
numpy: 1.9.1
scipy: None
statsmodels: None
IPython: 2.3.1
sphinx: 1.2.2
patsy: None
dateutil: 2.2
pytz: 2013.9
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: 0.9
apiclient: 1.3.1
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: 2.5.4 (dt dec pq3 ext)
Activity
jorisvandenbossche commented on Nov 28, 2014
cc @rockg @ischwabacher
rockg commented on Nov 28, 2014
ambiguous is only structured to differentiate duplicate times in the fall transition. It would be very easy to extend to the spring transition.
rockg commented on Nov 30, 2014
@mskrajnowski I have been thinking about this more and wonder if your data is really in a standard time zone and you should be localizing to that and then converting. For example, what does the data look like for that Spring DST change day? Are there 24 hours or just 23? Maybe the localizing issue is masking something else.
ischwabacher commented on Dec 1, 2014
It seems to me that the only overlap between the sorts of options that make sense for handling ambiguous times and the options that make sense for nonexistent times are NaT and raise. Here are the other possible behaviors I can come up with, though some of them are pretty wacky:
Timestamp('2014-03-09 02:30:00').tz_localize('America/Chicago') is Timestamp('2014-03-09 03:00:00-0500', tz='America/Chicago').
pytz sense). So Timestamp('2014-03-09 02:30:00').tz_localize('America/Chicago') becomes Timestamp('2014-03-09 03:30:00-0500', tz='America/Chicago').
date_range to be emulated by repeated subtraction of an offset from a Timestamp. (So that for instance Timestamp('2014-03-10 02:30:00-0500', tz='America/Chicago') - Day() - Day() could equal Timestamp('2014-03-08 02:30:00-0600', tz='America/Chicago').) But this is a lot of crazy complexity for an invariant that I'm starting to think is a bad idea anyway.
How do these match up against the options for ambiguous time handling? Does the knob for nonexistent time handling need to be separate from the one for ambiguous times?
mskrajnowski commented on Dec 3, 2014
@rockg I'm implementing a scheduling application. The user defines working hours in his/her own timezone, then I'm combining the time provided by the user with dates, to get actual work start and end utc datetimes. I can't really forbid the user from setting 02:30 as his work start/end time (maybe he/she likes to wake up in the middle of the night and code ;) ), so I need a way to reliably localize any datetime. Even if technically a given time doesn't exist, I still need to output some logical utc datetime.
@ischwabacher I'd go with the way pytz handles non-existent/ambiguous time.
ischwabacher commented on Dec 3, 2014
Unfortunately, pytz uses is_dst=True to mean option 5, is_dst=False to mean option 4, and is_dst=None to mean raise. This is partly due to pytz's workaround for limitations in the datetime API, which are tentatively scheduled to be fixed in python3.5 (woohoo!), so we will probably see pytz switch to behaviors 3 and 2, respectively.
One issue I have here is that is_dst=True returns a time that (once normalized) is not DST but would be the given time if it were DST, while is_dst=False returns one that is DST but would be the given time if it were not DST. I am not sure whether this is more or less confusing than swapping them.
Also, if your users set a time of 2:30 as a work start/end time, do you think your users will be least surprised by an alarm at 1:30, 3:00, or 3:30? Does it depend on whether it's a start or end time?
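The is_dst behaviors described above can be checked directly against pytz. This is only a small sketch reusing the Europe/Warsaw timestamp from the issue description; the variable names (as_dst, as_std, raised) are made up for illustration.

```python
import datetime
import pytz

tz = pytz.timezone('Europe/Warsaw')
# 02:30 on 2015-03-29 does not exist in Warsaw: clocks jump from 02:00 to 03:00.
non_existent = datetime.datetime(2015, 3, 29, 2, 30)

# is_dst=True: read the wall time as if DST were already in effect, then normalize.
as_dst = tz.normalize(tz.localize(non_existent, is_dst=True))
# is_dst=False: read the wall time as if standard time still applied, then normalize.
as_std = tz.normalize(tz.localize(non_existent, is_dst=False))

print(as_dst.isoformat())  # lands before the gap, in standard time
print(as_std.isoformat())  # lands after the gap, in DST

# is_dst=None: refuse to guess and raise instead.
try:
    tz.localize(non_existent, is_dst=None)
    raised = False
except pytz.NonExistentTimeError:
    raised = True
print(raised)
```

After normalization the is_dst=True result is the one that is not DST, and the is_dst=False result is the one that is DST, which is the inversion discussed above.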
mskrajnowski commented on Dec 3, 2014
Since 2:00 becomes 3:00 in that transition, imho it's logical that 2:30 would become 3:30. That's what I get with pytz using localize and normalize (without passing is_dst). Of course, now the question is what to do with a time interval of 2:30 - 3:00, which would become 3:30 - 3:00.
However, I'd still prefer that problem, because now I have better data to work with.
mskrajnowski commented on Dec 3, 2014
Another thing that would help is an API that lets us fix ambiguous and non-existent times ourselves. Maybe tz_localize could return NaT along with information about why a time wasn't localized? If I had such information, I could, for example, add an hour to non-existent times and retry. At the moment tz_localize raises on the first error, which isn't very helpful when working with series.
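The "add an hour and retry" idea can be sketched outside pandas with plain pytz. The helper below is hypothetical (localize_or_shift and its return convention are not part of pandas or pytz); it localizes strictly and reports why a value needed special handling, assuming a one-hour DST gap.

```python
import datetime
import pytz


def localize_or_shift(naive, tz, shift=datetime.timedelta(hours=1)):
    """Hypothetical helper: return (localized_datetime_or_None, reason_or_None)."""
    try:
        # is_dst=None makes pytz raise instead of guessing.
        return tz.localize(naive, is_dst=None), None
    except pytz.NonExistentTimeError:
        # The wall time falls in the spring gap: move it forward and retry.
        return tz.localize(naive + shift, is_dst=None), 'nonexistent'
    except pytz.AmbiguousTimeError:
        # The wall time occurs twice in the fall: leave the decision to the caller.
        return None, 'ambiguous'


tz = pytz.timezone('Europe/Warsaw')
gap, gap_reason = localize_or_shift(datetime.datetime(2015, 3, 29, 2, 30), tz)
fold, fold_reason = localize_or_shift(datetime.datetime(2015, 10, 25, 2, 30), tz)
plain, plain_reason = localize_or_shift(datetime.datetime(2015, 3, 30, 2, 30), tz)
print(gap, gap_reason)
print(fold, fold_reason)
print(plain, plain_reason)
```

Mapping such a function over a series loses the speed of a vectorized tz_localize, so this only illustrates the control flow, not a replacement for it.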
ischwabacher commented on Dec 3, 2014
I definitely think there should be an option to return NaT instead of raising, but it doesn't seem feasible to attach any other information to that unless you just want a warning, which we should not emit unless it's explicitly requested (since nonexistent times can arise without necessarily coming from a programmer error).
As far as defaults go, I think we should keep raise as the default (or possibly raise for single operations and NaT for vectorized operations if we can do that?) because "errors should not pass silently // unless explicitly silenced".
esvhd commented on Jun 23, 2017
Hi guys - Was this released in 0.20.2 yet? I'm seeing the same problem here.
thanks.
jorisvandenbossche commented on Jun 26, 2017
This issue is still open, so no, this has not been fixed in 0.20.2. Contributions are welcome.
randomgambit commented on Jul 17, 2017
@jorisvandenbossche @esvhd same problem here. I've got some timestamps in Central time, and trying to localize them to EST gives me this error. Is there a quick and dirty fix for that? Thanks!
rockg commented on Jul 17, 2017
There should be nothing wrong converting to EST as that does not have DST offsets. Can you post your example?
randomgambit commented on Jul 17, 2017
@rockg No I mean I convert from Central time to EST.
df.timestamp.map(lambda x: x.tz_localize('US/Central', ambiguous = 'NaT').tz_convert('US/Eastern').tz_localize(None))
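For comparison, the per-element map above can probably be written with the vectorized .dt accessor. This is a sketch under two assumptions: the sample timestamps are invented for illustration, and the installed pandas is a later release (0.24 or newer) whose tz_localize accepts a nonexistent keyword, which did not exist when this thread was written.

```python
import pandas as pd

# Invented sample data: naive timestamps meant to be read as US/Central wall times.
df = pd.DataFrame({'timestamp': pd.to_datetime([
    '2017-03-12 01:30:00',  # ordinary time just before the spring transition
    '2017-03-12 02:30:00',  # nonexistent in US/Central (clocks skip 02:00-03:00)
    '2017-11-05 01:30:00',  # ambiguous in US/Central (occurs twice in the fall)
])})

converted = (
    df['timestamp']
    # Assumption: nonexistent= requires pandas 0.24 or newer.
    .dt.tz_localize('US/Central', ambiguous='NaT', nonexistent='NaT')
    .dt.tz_convert('US/Eastern')
    .dt.tz_localize(None)  # drop the timezone again, keeping Eastern wall time
)
print(converted)
```

With both keywords set to 'NaT', the nonexistent and ambiguous rows come back as NaT instead of raising, so they can be inspected or repaired afterwards.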
randomgambit commented on Jul 17, 2017
@jorisvandenbossche @mskrajnowski @rockg would this be the pytz-equivalent (and correct) way to convert from US/Central to US/Eastern in a Pandas dataframe?
Assume df.timestamp is a naive pd.to_datetime() timestamp
randomgambit commented on Jul 18, 2017
@jorisvandenbossche @mskrajnowski @rockg any ideas? sorry for the spam but this is an important question in my humble opinion :D