So far, EdChum has provided the following code:
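In [1]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [None] * 6, 'b': [2, 3, 10, 3, 5, 8]})
df['c'] = np.nan
# Seed the two known values of c
df.loc[0, 'c'] = 1
df.loc[2, 'c'] = 3

def func(x):
    # Keep c if it is already set; otherwise use the previous row's c times this row's b
    if pd.notnull(x['c']):
        return x['c']
    else:
        return df.iloc[x.name - 1]['c'] * x['b']

# Each pass only fills rows whose previous row is already known, hence the repeats
df['c'] = df.apply(func, axis=1)
df['c'] = df.apply(func, axis=1)
df['c'] = df.apply(func, axis=1)
df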
Out[1]:
      a   b    c
0  None   2    1
1  None   3    3
2  None  10    3
3  None   3    9
4  None   5   45
5  None   8  360
This works well, but as soon as I change the index of the DataFrame df as follows:
rng = pd.date_range('1/1/2011', periods=6, freq='D')
df = pd.DataFrame({'a': [None] * 6, 'b': [2, 3, 10, 3, 5, 8]},index=rng)
I get the following error: TypeError: ("cannot do label indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [2011-01-01 00:00:00] of <class 'pandas.tslib.Timestamp'>", u'occurred at index 2011-01-02 00:00:00')
What is the problem here? How can I adjust the code so that it works with a DatetimeIndex?
The following works. The difference is that I use get_loc to get the integer position of the datetime value in the index:
In [48]:
rng = pd.date_range('1/1/2011', periods=6, freq='D')
df = pd.DataFrame({'a': [None] * 6, 'b': [2, 3, 10, 3, 5, 8]}, index=rng)
df['c'] = np.nan
# Seed the two known values of c, addressing the rows by their index labels
df.loc[rng[0], 'c'] = 1
df.loc[rng[2], 'c'] = 3

def func(x):
    if pd.notnull(x['c']):
        return x['c']
    else:
        # x.name is a Timestamp label; get_loc converts it to an integer position for iloc
        return df.iloc[df.index.get_loc(x.name) - 1]['c'] * x['b']

df['c'] = df.apply(func, axis=1)
df['c'] = df.apply(func, axis=1)
df['c'] = df.apply(func, axis=1)
df
Out[48]:
               a   b    c
2011-01-01  None   2    1
2011-01-02  None   3    3
2011-01-03  None  10    3
2011-01-04  None   3    9
2011-01-05  None   5   45
2011-01-06  None   8  360
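To see the get_loc step in isolation, here is a minimal sketch, assuming a reduced frame with only the b column (the variable names names, label, pos and prev_b are illustrative and not from the original answer). It shows that inside apply with axis=1 the row's name is its index label, a Timestamp here, and that get_loc turns that label into the integer position iloc expects:

```python
import pandas as pd

rng = pd.date_range('1/1/2011', periods=6, freq='D')
df = pd.DataFrame({'b': [2, 3, 10, 3, 5, 8]}, index=rng)

# Inside apply(..., axis=1), x.name is the row's index label, not its position.
names = df.apply(lambda x: x.name, axis=1)

label = df.index[1]               # the Timestamp label of the second row
pos = df.index.get_loc(label)     # integer position of that label in the index
prev_b = df.iloc[pos - 1]['b']    # positional lookup of the preceding row
```

With the default integer index, label and position happen to coincide, which is presumably why x.name - 1 worked in the first version.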