Problem changing the index from integer to date in pandas

Mic*_*Cox 2 python pandas qstk

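I am having trouble changing a pandas DataFrame index from integers to datetimes. I want to do this so that I can call reindex and fill in the dates between those listed in the table. Note that I have to use pandas 0.7.3 for now, because I am also using qstk, and qstk depends on pandas 0.7.3.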

First, here is my layout:

(Pdb) df
    AAPL  GOOG   IBM   XOM                 date
1      0     0  4000     0  2011-01-13 16:00:00
2      0  1000  4000     0  2011-01-26 16:00:00
3      0  1000  4000     0  2011-02-02 16:00:00
4      0  1000  4000  4000  2011-02-10 16:00:00
6      0     0  1800  4000  2011-03-03 16:00:00
7      0     0  3300  4000  2011-06-03 16:00:00
8      0     0     0  4000  2011-05-03 16:00:00
9   1200     0     0  4000  2011-06-10 16:00:00
11  1200     0     0  4000  2011-08-01 16:00:00
12     0     0     0  4000  2011-12-20 16:00:00

(Pdb) type(df['date'])
<class 'pandas.core.series.Series'>

(Pdb) df2 = DataFrame(index=df['date'])
(Pdb) df2
Empty DataFrame
Columns: array([], dtype=object)
Index: array([2011-01-13 16:00:00, 2011-01-26 16:00:00, 2011-02-02 16:00:00,
       2011-02-10 16:00:00, 2011-03-03 16:00:00, 2011-06-03 16:00:00,
       2011-05-03 16:00:00, 2011-06-10 16:00:00, 2011-08-01 16:00:00,
       2011-12-20 16:00:00], dtype=object)

(Pdb) df2.merge(df,left_index=True,right_on='date')
    AAPL  GOOG   IBM   XOM                 date
1      0     0  4000     0  2011-01-13 16:00:00
2      0  1000  4000     0  2011-01-26 16:00:00
3      0  1000  4000     0  2011-02-02 16:00:00
4      0  1000  4000  4000  2011-02-10 16:00:00
6      0     0  1800  4000  2011-03-03 16:00:00
8      0     0     0  4000  2011-05-03 16:00:00
7      0     0  3300  4000  2011-06-03 16:00:00
9   1200     0     0  4000  2011-06-10 16:00:00
11  1200     0     0  4000  2011-08-01 16:00:00
12     0     0     0  4000  2011-12-20 16:00:00

I have tried several ways to get a datetime index:

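1.) Using the reindex() method with a list of datetime values. This creates a datetime index, but then fills the data in the DataFrame with NaN. I am guessing this is because the original values are tied to the integer index, and reindexing to datetime tries to fill the new index with default values (NaN if no fill method is indicated). Like so: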

(Pdb) df.reindex(index=df['date'])
                     AAPL  GOOG  IBM  XOM date
date                                          
2011-01-13 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-01-26 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-02-02 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-02-10 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-03-03 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-06-03 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-05-03 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-06-10 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-08-01 16:00:00   NaN   NaN  NaN  NaN  NaN
2011-12-20 16:00:00   NaN   NaN  NaN  NaN  NaN
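
The guess in approach 1 can be checked with a small sketch. It is written against a recent pandas release, not 0.7.3, so `pd.to_datetime` and `pd.DatetimeIndex` are assumptions that may not be available in the older version, and the three-row frame is made up for illustration. It shows that reindex looks up the requested labels in the existing index rather than relabelling the rows:

```python
import pandas as pd

# Hypothetical three-row stand-in for the df in the question.
df = pd.DataFrame(
    {
        "IBM": [4000, 4000, 1800],
        "date": pd.to_datetime(
            ["2011-01-13 16:00", "2011-01-26 16:00", "2011-03-03 16:00"]
        ),
    },
    index=[1, 2, 6],
)

# reindex searches the existing integer index (1, 2, 6) for the requested
# labels. None of the datetime labels exist there, so every row is missing.
looked_up = df.reindex(index=df["date"])
all_missing = bool(looked_up["IBM"].isna().all())

# Replacing the index relabels the rows and leaves the data untouched.
relabelled = df.copy()
relabelled.index = pd.DatetimeIndex(relabelled["date"])
ibm_values = relabelled["IBM"].tolist()

print(all_missing)
print(ibm_values)
```

In other words, reindex selects rows by label; it does not rename labels, which is why the result is all NaN.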

2.) Using DataFrame.merge with my original df and a second DataFrame, df2, which is basically just a datetime index with nothing else. So I end up doing something like:

(Pdb) df2.merge(df,left_index=True,right_on='date')
    AAPL  GOOG   IBM   XOM                 date
1      0     0  4000     0  2011-01-13 16:00:00
2      0  1000  4000     0  2011-01-26 16:00:00
3      0  1000  4000     0  2011-02-02 16:00:00
4      0  1000  4000  4000  2011-02-10 16:00:00
6      0     0  1800  4000  2011-03-03 16:00:00
8      0     0     0  4000  2011-05-03 16:00:00
7      0     0  3300  4000  2011-06-03 16:00:00
9   1200     0     0  4000  2011-06-10 16:00:00
11  1200     0     0  4000  2011-08-01 16:00:00

(and vice versa). But I always end up with this kind of thing, with an integer index.

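3.) Starting with an empty DataFrame that has a datetime index (created from the 'date' field of df) and a bunch of empty columns. Then I try to assign each column by setting the column with the same name equal to the column from df: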

(Pdb) df2['GOOG']=0
(Pdb) df2
                     GOOG
date                     
2011-01-13 16:00:00     0
2011-01-26 16:00:00     0
2011-02-02 16:00:00     0
2011-02-10 16:00:00     0
2011-03-03 16:00:00     0
2011-06-03 16:00:00     0
2011-05-03 16:00:00     0
2011-06-10 16:00:00     0
2011-08-01 16:00:00     0
2011-12-20 16:00:00     0
(Pdb) df2['GOOG'] = df['GOOG']
(Pdb) df2
                     GOOG
date                     
2011-01-13 16:00:00   NaN
2011-01-26 16:00:00   NaN
2011-02-02 16:00:00   NaN
2011-02-10 16:00:00   NaN
2011-03-03 16:00:00   NaN
2011-06-03 16:00:00   NaN
2011-05-03 16:00:00   NaN
2011-06-10 16:00:00   NaN
2011-08-01 16:00:00   NaN
2011-12-20 16:00:00   NaN
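
The NaN column in approach 3 appears to come from label alignment: assigning a Series to a column matches it against the target's index first. The sketch below illustrates this on a recent pandas release (behaviour on 0.7.3 is not verified here). The frame and the column names `aligned` and `positional` are made up for the example:

```python
import pandas as pd

# Hypothetical three-row stand-in for the df in the question.
df = pd.DataFrame(
    {
        "GOOG": [0, 1000, 1000],
        "date": pd.to_datetime(
            ["2011-01-13 16:00", "2011-01-26 16:00", "2011-02-02 16:00"]
        ),
    },
    index=[1, 2, 3],
)

# An empty frame that carries only the datetime index.
df2 = pd.DataFrame(index=pd.DatetimeIndex(df["date"]))

# Assigning a Series aligns on labels: integer labels 1, 2, 3 match none
# of the datetime labels in df2, so the new column is entirely NaN.
df2["aligned"] = df["GOOG"]

# Assigning the underlying array skips alignment and copies by position.
df2["positional"] = df["GOOG"].values

print(df2)
```

Copying by position works only when both frames have the same number of rows in the same order, which holds here because df2's index was built from df['date'].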

So, how can I recreate df with a datetime index instead of an integer index in pandas 0.7.3? What am I missing?

And*_*den 6

I think you are looking for set_index:

In [11]: df.set_index('date')
Out[11]: 
                     AAPL  GOOG   IBM   XOM
date                                  
2011-01-13 16:00:00     0     0  4000     0
2011-01-26 16:00:00     0  1000  4000     0
2011-02-02 16:00:00     0  1000  4000     0
2011-02-10 16:00:00     0  1000  4000  4000
2011-03-03 16:00:00     0     0  1800  4000
2011-06-03 16:00:00     0     0  3300  4000
2011-05-03 16:00:00     0     0     0  4000
2011-06-10 16:00:00  1200     0     0  4000
2011-08-01 16:00:00  1200     0     0  4000
2011-12-20 16:00:00     0     0     0  4000
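
As a follow-up to the stated goal of filling in the dates between the listed ones, here is a minimal sketch of how set_index might be combined with reindex. It assumes a recent pandas release: `pd.date_range` and `method="ffill"` are not confirmed for 0.7.3, the daily frequency and the forward fill are guesses at what is wanted, and the sample frame is made up:

```python
import pandas as pd

# Hypothetical stand-in for df; the dates are deliberately out of order,
# like the 2011-06-03 / 2011-05-03 rows in the question.
df = pd.DataFrame(
    {
        "IBM": [4000, 1800, 3300],
        "date": pd.to_datetime(
            ["2011-01-13 16:00", "2011-01-17 16:00", "2011-01-15 16:00"]
        ),
    },
    index=[1, 6, 7],
)

# set_index returns a new frame, so keep the result. Sorting is needed
# because filling while reindexing requires a monotonic index.
indexed = df.set_index("date").sort_index()

# One label per day, from the first listed date to the last.
all_days = pd.date_range(indexed.index[0], indexed.index[-1], freq="D")

# Days with no row take the most recent earlier value (forward fill).
filled = indexed.reindex(all_days, method="ffill")

print(filled)
```

Because the index now holds datetime labels, reindex finds the existing rows and only the gaps between them are filled.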