eha*_*nom 2 python combinations pandas
I have a DataFrame:
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [12, 23, 34, 45]
})
which looks like this:
----------------------------
index A B
0 1 12
1 2 23
2 3 34
3 4 45
-----------------------------
I also have an array of times, [0, 1, 2]. I want to replicate each row of df (columns A and B) once for every time:
------------------------------------
index A B time
1 12 0
1 12 1
1 12 2
2 23 0
2 23 1
2 23 2
3 34 0
3 34 1
3 34 2
4 45 0
4 45 1
4 45 2
-------------------------------------
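I don't want to use a MultiIndex or stack (I want the result to be as flat as possible). combine did not help. I am not joining on a key, because what I want is every combination of rows and times, so merge/concat do not seem to help either.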
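For reference, the result described above is the Cartesian product (cross join) of the rows of df with the list of times. The following is only a minimal sketch of that idea, assuming pandas 1.2 or later, where merge accepts how='cross'; the names times, times_df and result are chosen here for illustration and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [12, 23, 34, 45],
})
times = [0, 1, 2]  # assumed name for the list of times

# Put the times in their own one-column frame so they can be joined.
times_df = pd.DataFrame({'time': times})

# Cross join: every row of df is paired with every row of times_df.
# how='cross' needs no key columns (assumes pandas >= 1.2).
result = df.merge(times_df, how='cross')
print(result)
```

The result has a plain 0..11 index and three flat columns, so no MultiIndex or stack is involved.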
Using pd.concat may be slightly faster than reindex:
pd.concat([df]*len([0,1,2])).sort_index().assign(time=[0,1,2]*len(df))
Out[275]:
A B time
0 1 12 0
0 1 12 1
0 1 12 2
1 2 23 0
1 2 23 1
1 2 23 2
2 3 34 0
2 3 34 1
2 3 34 2
3 4 45 0
3 4 45 1
3 4 45 2
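Note that the output above keeps the repeated labels 0, 0, 0, 1, 1, 1, ... as its index. If a flat 0..11 index is wanted, one option is to append reset_index(drop=True). A minimal sketch; the name times is an assumption standing in for the literal [0,1,2] used above:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [12, 23, 34, 45],
})
times = [0, 1, 2]  # assumed name for the literal list used above

result = (
    pd.concat([df] * len(times))    # stack len(times) copies of df
    .sort_index()                   # bring copies of the same row together
    .assign(time=times * len(df))   # 0,1,2,0,1,2,... assigned by position
    .reset_index(drop=True)         # replace 0,0,0,1,1,1,... with 0..11
)
print(result)
```

The copies of a row that share an index label are identical, so their relative order after sort_index does not affect the values.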
IIUC, use reindex + repeat:
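times = [0, 1, 2]  # must be a list, so that times * o repeats it
o = df.shape[0]
# repeat each index label len(times) times, then flatten the index
df = df.reindex(df.index.repeat(len(times))).reset_index(drop=True)
df['time'] = times * o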
A B time
0 1 12 0
1 1 12 1
2 1 12 2
3 2 23 0
4 2 23 1
5 2 23 2
6 3 34 0
7 3 34 1
8 3 34 2
9 4 45 0
10 4 45 1
11 4 45 2
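One caveat: times * o only repeats the values when times is a Python list. If times is a NumPy array, the same expression multiplies element-wise instead. A minimal sketch of a variant that should work in that case, using np.repeat and np.tile (the names row_positions and result are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [12, 23, 34, 45],
})
times = np.array([0, 1, 2])  # an array here, not a list

# np.repeat repeats each row position len(times) times: 0,0,0,1,1,1,...
row_positions = np.repeat(np.arange(len(df)), len(times))
result = df.iloc[row_positions].reset_index(drop=True)

# np.tile cycles through times once per original row: 0,1,2,0,1,2,...
result['time'] = np.tile(times, len(df))
print(result)
```

Selecting by position with iloc also avoids relying on the index labels of df being unique.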
Timings:
%timeit df.reindex(df.index.repeat(len(times))).reset_index(drop=True).assign(time=times*df.shape[0])
675 µs ± 12.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit pd.concat([df]*len([0,1,2])).sort_index().assign(time=[0,1,2]*len(df))
812 µs ± 6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
for small DataFrames, and
%timeit df.reindex(df.index.repeat(len(times))).reset_index(drop=True).assign(time=times*df.shape[0])
237 ms ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit pd.concat([df]*len([0,1,2])).sort_index().assign(time=[0,1,2]*len(df))
5.78 ms ± 27 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
for large DataFrames.
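The timings above do not say how large the "small" and "large" frames were, so they are hard to reproduce as given. The following is a rough sketch for re-running the comparison on your own machine; the row counts, the number of repetitions and the function names with_reindex and with_concat are arbitrary assumptions, not the setup used above.

```python
import timeit

import pandas as pd

times = [0, 1, 2]


def with_reindex(df):
    """The reindex + repeat approach."""
    return (df.reindex(df.index.repeat(len(times)))
              .reset_index(drop=True)
              .assign(time=times * df.shape[0]))


def with_concat(df):
    """The pd.concat + sort_index approach."""
    return (pd.concat([df] * len(times))
              .sort_index()
              .assign(time=times * len(df)))


# Row counts are arbitrary choices for illustration.
for n_rows in (4, 10_000):
    df = pd.DataFrame({'A': range(n_rows), 'B': range(n_rows)})

    # Both approaches give the same values; only the index differs.
    pd.testing.assert_frame_equal(
        with_reindex(df), with_concat(df).reset_index(drop=True)
    )

    for fn in (with_reindex, with_concat):
        seconds = timeit.timeit(lambda: fn(df), number=10)
        print(f"{fn.__name__:>12} n_rows={n_rows}: "
              f"{seconds / 10 * 1e3:.3f} ms per call")
```

Results will vary with the pandas version and hardware, so treat any single run as indicative only.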