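Is it possible to restrict a data frame to specific rows and then change some of the values in one of its columns?
Say I compute GROWTH as (SIZE_t+1 - SIZE_t)/SIZE_t, and I can now see that GROWTH has some odd values (e.g. 1000), caused by corrupted values of the corresponding SIZE variable. I now want to find and replace those SIZE values.
If I type: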
data <- mutate(filter(data, lead(GROWTH)==1000), SIZE = 2600)
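then only the corrupted rows are stored in data, and the rest of my data frame is lost.
What I would like to do is filter "data" on the left-hand side down to the rows with the corrupted values, and then mutate the incorrect variable (on the right-hand side):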
filter(data, lead(GROWTH)==1000) <- mutate(filter(data, lead(GROWTH)==1000), SIZE = 2600)
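But this does not seem to work. Is there a way to handle this with dplyr? Thanks in advance.
I am trying to compute the values of a column in a pandas DataFrame "recursively".
Suppose there is data for two different dates, each with 10 observations, and you want to compute some variable r. Only the first value of r is given (for each day), and you want to compute the remaining 2*9 entries, where each subsequent value depends on the previous r and on an additional 'contemporaneous' variable 'x'.
The first problem is that I want to run the calculation for each day separately, i.e. I want to use the pandas.groupby() function in all my calculations... but when I try to subset the data and use the shift(1) function, I only get "NaN" entries: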
data.groupby(data.index)['r'] = ( (1+data.groupby(data.index)['x']*0.25) * (1+data.groupby(data.index)['r'].shift(1)))
For my second approach, I used a for loop to iterate over the index (dates):
for i in range(2,21):
    data[data['rank'] == i]['r'] = ( (1+data[data['rank'] == i]['x']*0.25) * (1+data[data['rank'] == i]['r'].shift(1)) )
However, this does not work for me. Is there a way to perform such calculations on DataFrames? Maybe something like a rolling apply?
Data:
df = pd.DataFrame({
'rank' : [1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10],
'x' : [0.00275,0.00285,0.0031,0.0036,0.0043,0.0052,0.0063,0.00755,0.00895,0.0105,0.0027,0.00285,0.0031,0.00355,0.00425,0.0051,0.00615,0.00735,0.00875,0.0103],
'r' : [0.00158,'NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN',0.001485,'NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN','NaN']
},index=['2014-01-02', '2014-01-02', '2014-01-02', '2014-01-02',
'2014-01-02', '2014-01-02', '2014-01-02', '2014-01-02',
'2014-01-02', '2014-01-02', '2014-01-03', '2014-01-03',
'2014-01-03', '2014-01-03', '2014-01-03', '2014-01-03',
'2014-01-03', '2014-01-03', '2014-01-03', '2014-01-03'])
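For what it's worth, here is a minimal sketch of one way the recursion could be computed, taking the formula literally from the attempts above: r_t = (1 + 0.25*x_t) * (1 + r_{t-1}), restarted for each date. A vectorized shift(1) only sees the original r column, where everything after the first value per day is NaN, so each value has to be filled in turn by an explicit loop. The helper name fill_recursive and its weight parameter are made up for illustration, the sample uses only the first three rows per day from the data above, and the missing r values are assumed to be real np.nan rather than the string 'NaN'. Rows are also assumed to be already ordered by rank within each date.

```python
import numpy as np
import pandas as pd

def fill_recursive(x, r0, weight=0.25):
    """Hypothetical helper: r[0] = r0, r[t] = (1 + weight*x[t]) * (1 + r[t-1])."""
    r = np.empty(len(x), dtype=float)
    r[0] = r0
    for t in range(1, len(x)):
        r[t] = (1 + weight * x[t]) * (1 + r[t - 1])
    return r

# Reduced sample: first three observations per day from the question's data.
df = pd.DataFrame(
    {
        "rank": [1, 2, 3, 1, 2, 3],
        "x": [0.00275, 0.00285, 0.0031, 0.0027, 0.00285, 0.0031],
        "r": [0.00158, np.nan, np.nan, 0.001485, np.nan, np.nan],
    },
    index=["2014-01-02"] * 3 + ["2014-01-03"] * 3,
)

out = df.copy()
dates = out.index.to_numpy()
x = out["x"].to_numpy(dtype=float)
r = out["r"].to_numpy(dtype=float).copy()

# Handle each date separately, working on integer positions because
# the index labels (dates) are duplicated.
for day in pd.unique(dates):
    pos = np.flatnonzero(dates == day)
    r[pos] = fill_recursive(x[pos], r[pos[0]])

# Assign the whole column at once instead of chained indexing such as
# data[mask]['r'] = ..., which writes to a temporary copy.
out["r"] = r
print(out)
```

The loop runs in plain Python over each day's rows, which should be fine for small groups; for large data a compiled or closed-form approach may be worth considering.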