I have a column like this:
valueCount
0.0
nan
2.0
1.0
1.0
1.0
nan
nan
nan
4.0
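I want to fill the NaN values based on the next available value (adding) or the previous value (subtracting). So the result should be: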
valueCount
0.0
**1.0**
2.0
1.0
1.0
1.0
**1.0**
**2.0**
**3.0**
4.0
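I know this is very conditional: if my previous value is 0, I can add 1 for the NaN row; otherwise I should start counting up from 0, 1, 2, and so on.
I can implement this algorithm on a plain Python list, but is there a simple way to do it in pandas?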
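For reference, the rule implied by the expected output can be sketched on a plain list. This is an assumption inferred from the example (each NaN becomes the next valid value minus its distance to that value), and `fill_from_next` is a hypothetical helper name, not code from the question.

```python
import math


def fill_from_next(values):
    """Fill each NaN with (next valid value - distance to it). Hypothetical helper."""
    result = list(values)
    next_valid = math.nan
    # Walk backwards so the next valid value is always known.
    for i in range(len(result) - 1, -1, -1):
        if math.isnan(result[i]):
            # One step further from the next valid value than the row below.
            next_valid -= 1
            result[i] = next_valid
        else:
            next_valid = result[i]
    return result


nan = math.nan
print(fill_from_next([0.0, nan, 2.0, 1.0, 1.0, 1.0, nan, nan, nan, 4.0]))
```

Trailing NaNs have no next valid value, so this sketch leaves them as NaN.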
You can use:
a = df['valueCount'].isnull()
b = a.cumsum()
c = df['valueCount'].bfill()
d = c + (b-b.mask(a).bfill().fillna(0).astype(int)).sub(1)
df['valueCount'] = df['valueCount'].fillna(d)
print(df)
   valueCount
0         0.0
1         1.0
2         2.0
3         1.0
4         1.0
5         1.0
6         1.0
7         2.0
8         3.0
9         4.0
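For a version that can be run end to end, the sketch below rebuilds the sample column and applies the same chain of calls. The DataFrame construction and the descriptive variable names are assumptions added for illustration; only the sequence of pandas operations comes from the snippet above.

```python
import numpy as np
import pandas as pd

# Sample data from the question (construction assumed for illustration).
df = pd.DataFrame(
    {"valueCount": [0.0, np.nan, 2.0, 1.0, 1.0, 1.0, np.nan, np.nan, np.nan, 4.0]}
)

s = df["valueCount"]
is_nan = s.isnull()          # True where a value is missing
nan_count = is_nan.cumsum()  # running count of NaNs seen so far
next_valid = s.bfill()       # next non-missing value for every row
# Running NaN count as of the next non-missing row (0 if there is none).
count_at_next = nan_count.mask(is_nan).bfill().fillna(0).astype(int)
# For NaN rows this equals the next valid value minus the distance to it.
candidate = next_valid + nan_count - count_at_next - 1

df["valueCount"] = s.fillna(candidate)
print(df["valueCount"].tolist())
```

Only the NaN rows take their value from `candidate`; rows that already hold a value are left unchanged by `fillna`.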
Details and explanation:
import pandas as pd

# back-fill NaN values with the next valid value
x = df['valueCount'].bfill()
# boolean mask: True where the original value is NaN
a = df['valueCount'].isnull()
# cumulative sum of the mask (running count of NaNs)
b = a.cumsum()
# replace the counts at NaN positions with NaN
c = b.mask(a)
# back-fill those NaNs with the count at the next valid row
d = b.mask(a).bfill()
# replace any remaining (trailing) NaNs with 0 and cast to integers
e = b.mask(a).bfill().fillna(0).astype(int)
# back-filled value plus running count, minus the count at the next valid row, minus 1
f = x + b - e - 1
# fill the NaNs in the original column with the values from f
g = df['valueCount'].fillna(f)
df = pd.concat([df['valueCount'], x, a, b, c, d, e, f, g], axis=1,
               keys=('orig','x','a','b','c','d','e', 'f', 'g'))
print(df)
   orig    x      a  b    c    d  e    f    g
0   0.0  0.0  False  0  0.0  0.0  0 -1.0  0.0
1   NaN  2.0   True  1  NaN  1.0  1  1.0  1.0
2   2.0  2.0  False  1  1.0  1.0  1  1.0  2.0
3   1.0  1.0  False  1  1.0  1.0  1  0.0  1.0
4   1.0  1.0  False  1  1.0  1.0  1  0.0  1.0
5   1.0  1.0  False  1  1.0  1.0  1  0.0  1.0
6   NaN  4.0   True  2  NaN  4.0  4  1.0  1.0
7   NaN  4.0   True  3  NaN  4.0  4  2.0  2.0
8   NaN  4.0   True  4  NaN  4.0  4  3.0  3.0
9   4.0  4.0  False  4  4.0  4.0  4  3.0  4.0
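The same result can also be reached with a grouping step instead of the cumulative-sum arithmetic. The sketch below is an alternative technique, not the method from the answer: it groups each run of NaNs with the valid value that follows it and uses `GroupBy.cumcount(ascending=False)` to measure the distance to that value. The names `group_id`, `steps_to_next`, and `filled` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [0.0, np.nan, 2.0, 1.0, 1.0, 1.0, np.nan, np.nan, np.nan, 4.0],
    name="valueCount",
)

# Label each run of NaNs together with the valid value that ends it:
# counting valid values from the bottom up gives one id per such group.
group_id = s.notna()[::-1].cumsum()[::-1]
# Within a group, number rows from the end: 0 for the valid value itself,
# 1 for the NaN just above it, and so on.
steps_to_next = s.groupby(group_id).cumcount(ascending=False)
# Next valid value minus the distance to it; valid rows keep their value.
filled = s.bfill() - steps_to_next
print(filled.tolist())
```

Trailing NaNs have nothing to back-fill from, so they stay NaN in this sketch as well.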