Jef*_*zyk 3 python timezone pandas
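I am using pandas-0.8rc2 to read an input CSV that has two columns of localized datetime strings lacking UTC offset information, and I need to convert those dataframe series to UTC correctly.
I have been trying to work around the fact that the timestamp columns are data rather than the index. tz_localize and tz_convert apparently apply only to the index of a Series/DataFrame, not to its columns. I would very much like to learn a better approach than the following code: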
# test.py
import pandas
# input.csv:
# starting,ending,measure
# 2012-06-21 00:00,2012-06-23 07:00,77
# 2012-06-23 07:00,2012-06-23 16:30,65
# 2012-06-23 16:30,2012-06-25 08:00,77
# 2012-06-25 08:00,2012-06-26 12:00,0
# 2012-06-26 12:00,2012-06-27 08:00,77
df = pandas.read_csv('input.csv', parse_dates=[0,1])
print df
ser_starting = df.starting
ser_starting.index = ser_starting.values
ser_starting = ser_starting.tz_localize('US/Eastern')
ser_starting = ser_starting.tz_convert('UTC')
ser_ending = df.ending
ser_ending.index = ser_ending.values
ser_ending = ser_ending.tz_localize('US/Eastern')
ser_ending = ser_ending.tz_convert('UTC')
df.starting = ser_starting.index
print df
df.ending = ser_ending.index
print df
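Second, the code runs into some strange behavior. It mangles the timestamp data on the second assignment back into the dataframe, regardless of whether df.starting or df.ending is assigned second: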
$ python test.py
starting ending measure
0 2012-06-21 00:00:00 2012-06-23 07:00:00 77
1 2012-06-23 07:00:00 2012-06-23 16:30:00 65
2 2012-06-23 16:30:00 2012-06-25 08:00:00 77
3 2012-06-25 08:00:00 2012-06-26 12:00:00 0
4 2012-06-26 12:00:00 2012-06-27 08:00:00 77
starting ending measure
0 2012-06-21 04:00:00 2012-06-23 07:00:00 77
1 2012-06-23 11:00:00 2012-06-23 16:30:00 65
2 2012-06-23 20:30:00 2012-06-25 08:00:00 77
3 2012-06-25 12:00:00 2012-06-26 12:00:00 0
4 2012-06-26 16:00:00 2012-06-27 08:00:00 77
Traceback (most recent call last):
File "test.py", line 28, in <module>
print df
File "/path/to/lib/python2.7/site-packages/pandas/core/frame.py", line 572, in __repr__
if self._need_info_repr_():
File "/path/to/lib/python2.7/site-packages/pandas/core/frame.py", line 560, in _need_info_repr_
self.to_string(buf=buf)
File "/path/to/lib/python2.7/site-packages/pandas/core/frame.py", line 1207, in to_string
formatter.to_string(force_unicode=force_unicode)
File "/path/to/lib/python2.7/site-packages/pandas/core/format.py", line 200, in to_string
fmt_values = self._format_col(i)
File "/path/to/lib/python2.7/site-packages/pandas/core/format.py", line 242, in _format_col
space=self.col_space)
File "/path/to/lib/python2.7/site-packages/pandas/core/format.py", line 462, in format_array
return fmt_obj.get_result()
File "/path/to/lib/python2.7/site-packages/pandas/core/format.py", line 589, in get_result
fmt_values = [formatter(x) for x in self.values]
File "/path/to/lib/python2.7/site-packages/pandas/core/format.py", line 597, in _format_datetime64
base = stamp.strftime('%Y-%m-%d %H:%M:%S')
ValueError: year=1768 is before 1900; the datetime strftime() methods require year >= 1900
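The print statements are there only to demonstrate the problem. If I avoid repr and other methods that call strftime, the incorrect values pass through without raising any exception.
Oddly, if I keep repeating the df.{starting,ending} assignments at the REPL, I usually end up with a correct dataframe, with these timestamps: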
In [151]: df
Out[151]:
starting ending measure
0 2012-06-21 04:00:00 2012-06-23 11:00:00 77
1 2012-06-23 11:00:00 2012-06-23 20:30:00 65
2 2012-06-23 20:30:00 2012-06-25 12:00:00 77
3 2012-06-25 12:00:00 2012-06-26 16:00:00 0
4 2012-06-26 16:00:00 2012-06-27 12:00:00 77
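This is not reproducible, AFAICT, and I cannot describe the exact sequence of calls that gets past the ValueError above, but it does happen.
I would appreciate knowing whether I have hit a bug or whether this is unsupported use of the API.
As noted above, I would much rather learn a better way to use the pandas API and avoid doing this altogether.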
It looks like a bug is lurking here, so I have created an issue; I will look into it soon and let you know:
https://github.com/pydata/pandas/issues/1518
Edit: the bug you hit has been fixed. I am now also going to address the pre-1900 display issue.
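On the "better way" part of the question: a minimal sketch, assuming a considerably newer pandas than 0.8rc2, one in which datetime Series expose the `.dt` accessor. With it, the columns can be localized and converted directly, without moving them into the index first. The helper name `local_columns_to_utc` is hypothetical, and the CSV is inlined only to keep the example self-contained.

```python
import io

import pandas as pd

# Same rows as input.csv in the question, inlined so the sketch runs on its own.
CSV = """starting,ending,measure
2012-06-21 00:00,2012-06-23 07:00,77
2012-06-23 07:00,2012-06-23 16:30,65
2012-06-23 16:30,2012-06-25 08:00,77
2012-06-25 08:00,2012-06-26 12:00,0
2012-06-26 12:00,2012-06-27 08:00,77
"""


def local_columns_to_utc(df, columns, tz="US/Eastern"):
    """Hypothetical helper: return a copy of df in which the given naive
    datetime columns are interpreted as local time in tz and converted to UTC."""
    out = df.copy()
    for col in columns:
        # .dt exposes the datetime methods on column values, so the column
        # never has to be swapped into the index.
        out[col] = out[col].dt.tz_localize(tz).dt.tz_convert("UTC")
    return out


df = pd.read_csv(io.StringIO(CSV), parse_dates=["starting", "ending"])
df_utc = local_columns_to_utc(df, ["starting", "ending"])
print(df_utc)
```

Both columns end up shifted by four hours (US/Eastern is on daylight time in June), which matches the corrected frame shown in the question. Note that this sketch does not handle ambiguous or nonexistent local times around DST transitions; the sample data contains none.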