the*_*ist 6 python numpy scipy pandas
我想切片数据帧以返回元素x = 0连续出现至少 n = 3次的行,然后在每个迷你序列中丢弃第一个i = 2个实例
有没有一种有效的方法来实现大熊猫,如果没有,使用numpy或scipy?
import pandas as pd
import numpy as np
Run Code Online (Sandbox Code Playgroud)
例1
df=pd.DataFrame({'A':[0,1,0,0,1,1,0,0,0,0,1,1,0,0,0,1,1],'B':np.random.randn(17)})
A B
0 0 0.748958
1 1 0.254730
2 0 0.629609
3 0 0.272738
4 1 -1.885906
5 1 1.206371
6 0 -0.332471
7 0 0.217553
8 0 0.768986
9 0 -1.607236
10 1 1.613650
11 1 -1.096892
12 0 -0.435762
13 0 0.131284
14 0 -0.177188
15 1 1.393890
16 1 0.174803
Run Code Online (Sandbox Code Playgroud)
期望的输出:
A B
8 0 0.768986
9 0 -1.607236
14 0 -0.177188
Run Code Online (Sandbox Code Playgroud)
例2
x = 0(感兴趣的元素)
n = 5(序列的最小长度)
i = 2(每个序列中前两个)
df2=pd.DataFrame({'A':[0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0],'B':np.random.randn(20)})
A B
0 0 0.703803
1 0 -0.144088
2 0 0.635577
3 0 -0.834611
4 0 1.472271
5 0 -0.554860
6 0 -0.167016
7 1 0.578847
8 1 -1.873663
9 0 0.197062
10 0 1.458845
11 0 -1.921660
12 0 -1.301481
13 0 0.240197
14 0 -1.425058
15 1 -2.801151
16 0 0.766757
17 0 1.249806
18 0 0.595366
19 0 -1.447632
Run Code Online (Sandbox Code Playgroud)
期望的输出:
A B
2 0 0.635577
3 0 -0.834611
4 0 1.472271
5 0 -0.554860
6 0 -0.167016
11 0 -1.921660
12 0 -1.301481
13 0 0.240197
14 0 -1.425058
Run Code Online (Sandbox Code Playgroud)
这是使用一些 NumPy 操作的方法 -
def slice_consc(df,n):
Acol = np.array(df['A'])
Acol_ext = np.concatenate(([0],(Acol==0)+0,[0]))
starts = np.where(np.diff(Acol_ext)==1)[0]
stops = np.where(np.diff(Acol_ext)==-1)[0]
id = np.zeros(Acol.size+2,dtype=int)
valid_mask = stops - starts >= n
id[stops[valid_mask]] = -1
id[starts[valid_mask]+2] = 1
return df[(id.cumsum()==1)[:-2]]
Run Code Online (Sandbox Code Playgroud)
样本运行 -
情况1:
>>> df
A B
0 0 0.977325
1 1 -0.408457
2 0 -0.377973
3 0 0.567537
4 1 -0.222019
5 1 -1.167422
6 0 -0.142546
7 0 0.675458
8 0 -0.184456
9 0 -0.826050
10 1 -0.772413
11 1 -1.556440
12 0 -0.687249
13 0 -0.481676
14 0 0.420400
15 1 0.031999
16 1 -1.092540
>>> slice_consc(df,3)
A B
8 0 -0.184456
9 0 -0.826050
14 0 0.420400
Run Code Online (Sandbox Code Playgroud)
案例#2:
>>> df2
A B
0 0 0.757102
1 0 2.114935
2 0 -0.352309
3 0 -0.214931
4 0 -1.626064
5 0 -0.989776
6 0 0.639635
7 1 0.049358
8 1 -2.600326
9 0 0.057792
10 0 1.263418
11 0 0.618495
12 0 -1.637054
13 0 1.220862
14 0 1.245484
15 1 1.388218
16 0 -0.499900
17 0 0.761310
18 0 -1.308176
19 0 -2.005983
>>> slice_consc(df2,5)
A B
2 0 -0.352309
3 0 -0.214931
4 0 -1.626064
5 0 -0.989776
6 0 0.639635
11 0 0.618495
12 0 -1.637054
13 0 1.220862
14 0 1.245484
Run Code Online (Sandbox Code Playgroud)