LPG 4 python numpy dataframe pandas
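In pandas 0.14.1, diff() does not produce a value at the start of a time series.
This makes diff() behave differently from cumsum(), which treats NaN as 0. Is there a way to make diff() treat the missing earlier data as 0, given that it is only missing because it falls before the start of the time series?
For example: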
>print df
2014-05-01 A Apple 1
B Banana 2
2014-06-01 A Apple 3
B Banana 4
The result is:
>print df.groupby(level=[1,2]).diff()
2014-05-01 A Apple NaN
B Banana NaN
2014-06-01 A Apple 2
B Banana 2
whereas the desired output is:
2014-05-01 A Apple 1
B Banana 2
2014-06-01 A Apple 2
B Banana 2
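To make the asymmetry concrete, here is a minimal sketch using small hypothetical Series (`s` and `t` are illustrative names, not the question's data). It assumes a reasonably recent pandas, where cumsum() skips NaN by default, while diff() leaves the first position as NaN because there is no earlier value to subtract.

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen only to show the behaviour.
s = pd.Series([np.nan, 1.0, 2.0])
t = pd.Series([1.0, 3.0])

# cumsum() skips the NaN (skipna=True by default): the NaN stays in place,
# but the running total continues as if it had contributed 0.
cs = s.cumsum()

# diff() has nothing to subtract from the first element, so it yields NaN there.
d = t.diff()

print(cs)
print(d)
```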
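As far as I can tell, groupby(...).diff() simply calls np.diff, which always returns an array that is 1 (or n) elements shorter than the array passed to it.
But filling in the missing data should be easy. Something like this?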
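The shortening behaviour of np.diff itself can be checked directly. This sketch uses a hypothetical array `arr` and only illustrates np.diff; it makes no claim about how pandas implements diff() internally.

```python
import numpy as np

# Hypothetical input array.
arr = np.array([1, 3, 6])

first = np.diff(arr)        # first-order differences: one element shorter
second = np.diff(arr, n=2)  # second-order differences: two elements shorter

print(first, second)
```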
In [175]: df
Out[175]:
d
a b c
2014-05-01 A Apple 1
B Banana 2
2014-06-01 A Apple 3
B Banana 4
In [176]: df['diff'] = df.groupby(level=[1,2])['d'].diff()
In [177]: df['diff'] = df['diff'].fillna(df['d'])
In [178]: df
Out[178]:
d diff
a b c
2014-05-01 A Apple 1 1
B Banana 2 2
2014-06-01 A Apple 3 2
B Banana 4 2
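For anyone who wants to reproduce this, the session above can be written as a self-contained script. This is a sketch under a few assumptions: the index level names `a`, `b`, `c` and the column name `d` are taken from the printed output, the frame is rebuilt by hand with `pd.MultiIndex.from_tuples`, and the groupby uses level names instead of the positions `[1,2]`. It was written against a recent pandas, not 0.14.1.

```python
import pandas as pd

# Rebuild the example frame: a 3-level index (date, code, name) and one column "d".
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2014-05-01"), "A", "Apple"),
        (pd.Timestamp("2014-05-01"), "B", "Banana"),
        (pd.Timestamp("2014-06-01"), "A", "Apple"),
        (pd.Timestamp("2014-06-01"), "B", "Banana"),
    ],
    names=["a", "b", "c"],
)
df = pd.DataFrame({"d": [1, 2, 3, 4]}, index=index)

# Per-group difference; the first row of each group is NaN.
df["diff"] = df.groupby(level=["b", "c"])["d"].diff()

# Treat the value before the start of each series as 0, so the first
# difference is the original value itself.
df["diff"] = df["diff"].fillna(df["d"])

print(df)
```

Because diff() introduces NaN, the `diff` column comes out as floats; cast it back with `astype(int)` if integer output matters.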