I have two DataFrames that I want to merge/group. They look like this:
df_1
words start stop
0 Oh, 6.72 7.21
1 okay, 7.26 8.01
2 go 12.82 12.90
3 ahead. 12.91 12.94
4 NaN 15.29 15.62
5 NaN 15.63 15.99
6 NaN 16.09 16.36
7 NaN 16.37 16.96
8 NaN 17.88 18.36
9 NaN 18.37 19.36
df_2
data start stop
10 1.0 3.5
14 4.0 8.5
11 9.0 13.5
12 14.0 20.5
I want to merge df_1.words into df_2, grouping together all values of df_1.words whose df_1.start lies between df_2.start and df_2.stop. The result should look like this:
df_2
data start stop words
10 1.0 3.5 NaN
14 4.0 8.5 Oh, okay,
11 9.0 13.5 go ahead.
12 14.0 20.5 NaN, NaN, NaN, NaN, NaN, NaN
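To make the example reproducible, the two frames can be rebuilt roughly as follows. This is only a sketch: the values are copied from the printouts above, and the assumption that `data` is an ordinary column (not the index) is inferred from how the frames are displayed.

```python
import numpy as np
import pandas as pd

# Values copied from the printed frames; missing words are real NaN values.
df_1 = pd.DataFrame({
    "words": ["Oh,", "okay,", "go", "ahead."] + [np.nan] * 6,
    "start": [6.72, 7.26, 12.82, 12.91, 15.29, 15.63, 16.09, 16.37, 17.88, 18.37],
    "stop": [7.21, 8.01, 12.90, 12.94, 15.62, 15.99, 16.36, 16.96, 18.36, 19.36],
})

# Assumption: `data` is a regular column and the index is the default RangeIndex.
df_2 = pd.DataFrame({
    "data": [10, 14, 11, 12],
    "start": [1.0, 4.0, 9.0, 14.0],
    "stop": [3.5, 8.5, 13.5, 20.5],
})
```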
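If the bin edges do not overlap, as in your example, use pd.cut with an IntervalIndex to group the first DataFrame. This lets you close the intervals on both edges. Then select from the result using the 'stop' column of df_2 to get the aggregated values.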
import pandas as pd
# One interval per row of df_2, closed on both edges
idx = pd.Index([pd.Interval(*x, closed='both') for x in zip(df_2.start, df_2.stop)])
# observed=False keeps empty bins, so every interval gets a (possibly empty) list
s = df_1.groupby(pd.cut(df_1.start, idx), observed=False).words.agg(list)
# Closed on both, can use `'stop'` to align
df_2['words'] = s[df_2.stop].to_list()
print(df_2)
data start stop words
0 10 1.0 3.5 []
1 14 4.0 8.5 [Oh,, okay,]
2 11 9.0 13.5 [go, ahead.]
3 12 14.0 20.5 [nan, nan, nan, nan, nan, nan]
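As a variation on the same idea, the bin of each row can also be looked up by position with `IntervalIndex.get_indexer`, which avoids indexing the grouped result by interval. This is a minimal sketch, not the method above: it assumes the intervals in df_2 do not overlap, rebuilds the sample frames so it runs on its own, and the names `bins`, `pos`, and `groups` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the printouts in the question.
df_1 = pd.DataFrame({
    "words": ["Oh,", "okay,", "go", "ahead."] + [np.nan] * 6,
    "start": [6.72, 7.26, 12.82, 12.91, 15.29, 15.63, 16.09, 16.37, 17.88, 18.37],
    "stop": [7.21, 8.01, 12.90, 12.94, 15.62, 15.99, 16.36, 16.96, 18.36, 19.36],
})
df_2 = pd.DataFrame({
    "data": [10, 14, 11, 12],
    "start": [1.0, 4.0, 9.0, 14.0],
    "stop": [3.5, 8.5, 13.5, 20.5],
})

# One interval per df_2 row, closed on both edges (assumes no overlaps).
bins = pd.IntervalIndex.from_arrays(df_2["start"], df_2["stop"], closed="both")

# Position of the df_2 row whose interval contains each df_1.start; -1 means no match.
pos = bins.get_indexer(df_1["start"])

# Collect the words per matched position; NaN words are kept in the lists.
groups = df_1["words"].groupby(pos).agg(list)

# Align back to df_2 by row position; rows with no matching words get an empty list.
df_2["words"] = [groups.get(i, []) for i in range(len(df_2))]
print(df_2)
```

Any row of df_1 that falls outside every interval ends up under position -1 and is simply never picked up when the lists are aligned back to df_2.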