Python Pandas: match the closest index from another DataFrame

Question

df.index = 10,100,1000

df2.index = 1,2,11,50,101,500,1001
(sample values only)


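For each index in df, I need to find the closest index in df2, subject to these conditions:

The df2 index must be greater than (>) the df index
Only the single closest value is taken

Example output: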

df     |   df2
10     |   11
100    |   101
1000   |   1001
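
The rule stated above (for each df label, take the smallest df2 label that is strictly greater) can be sketched with the standard library's bisect module. This is only an illustration on the sample values; the helper name next_greater is hypothetical, and it assumes the df2 labels are sorted in ascending order.

```python
from bisect import bisect_right


def next_greater(sorted_labels, key):
    """Return the smallest label strictly greater than key, or None if there is none.

    Hypothetical helper; assumes sorted_labels is sorted ascending.
    """
    # bisect_right gives the first position whose label is > key
    pos = bisect_right(sorted_labels, key)
    return sorted_labels[pos] if pos < len(sorted_labels) else None


df_labels = [10, 100, 1000]
df2_labels = [1, 2, 11, 50, 101, 500, 1001]

matches = {key: next_greater(df2_labels, key) for key in df_labels}
print(matches)  # {10: 11, 100: 101, 1000: 1001}
```

Each lookup is a binary search, so the total cost is roughly O(n log m) for n labels in df and m labels in df2.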


Right now I can do this with a for loop, but it is very slow.

I use new_df2, rather than df2, to hold the matched index:

new_df2 = pd.DataFrame(columns = ["value"])
for col in df.index:
    for col2 in df2.index:
        if(col2 > col):
            new_df2.loc[col2] = df2.loc[col2]
            break
        else:
            df2 = df2[1:] #delete first row for index speed


How can I avoid the for loop in this case? Thanks.

Answer 1

Mar*_*ius 5

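Not sure how robust this is, but you can sort df2 so that its index is decreasing, and use asof to find the most recent index label matching each key in df's index: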

df2.sort_index(ascending=False, inplace=True)
df['closest_df2'] = df.index.map(lambda x: df2.index.asof(x))

df
Out[19]: 
      a  closest_df2
10    1           11
100   2          101
1000  3         1001
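
As an alternative that does not rely on a descending sort, the same lookup can be sketched with numpy.searchsorted on an ascending copy of df2's index. This is a swapped-in technique, not the method from the answer above. The frames below are made-up sample data (the column names a and value follow the question and the output shown), and the sketch assumes df2 is non-empty with a numeric index.

```python
import numpy as np
import pandas as pd

# Made-up sample frames mirroring the question's index values
df = pd.DataFrame({"a": [1, 2, 3]}, index=[10, 100, 1000])
df2 = pd.DataFrame({"value": range(7)}, index=[1, 2, 11, 50, 101, 500, 1001])

# searchsorted needs ascending input; side="right" skips labels equal to the key,
# so the label it points at is strictly greater than the key
labels = df2.index.sort_values().to_numpy()
pos = np.searchsorted(labels, df.index.to_numpy(), side="right")

# pos == len(labels) means no df2 label is greater than that df label
found = pos < len(labels)
safe_pos = np.minimum(pos, len(labels) - 1)  # assumes df2 is non-empty

# Unmatched rows become NaN instead of raising an IndexError
df["closest_df2"] = pd.Series(labels[safe_pos], index=df.index).where(found)

# Rows of df2 that were matched, comparable to new_df2 in the question
new_df2 = df2.loc[labels[pos[found]]]
```

With the sample data, df["closest_df2"] holds 11, 101 and 1001, and new_df2 contains the df2 rows labelled 11, 101 and 1001. The strict comparison comes from side="right"; with side="left", a df2 label equal to the df label would be returned instead.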

