Look up col2 values for certain col1 values, falling back to the nearest value when one is missing, using pandas

Question

I have a DataFrame like this:

df
col1      col2      
 1         10
 2         15
 4         12
 5         23
 6         11
 8         32
 9         12
 11        32
 2         23
 3         21
 4         12
 6         15
 9         12
 10        32

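I want to select the col2 value for each col1 value of 1, 5 and 10. If there is no row where col1 is exactly 1, 5 or 10, keep the col2 value from the row whose col1 is closest to 1, 5 or 10.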

For example, the final df would look like this:

df
col1      col2      
 1         10
 5         23
 11        32
 2         23
 6         15
 10        32

How can I do this with pandas, without using any loops?

Answer 1

piR*_*red 1

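df.col1.diff().lt(0).cumsum() defines groups of ascending values
set_index with those groups and col1, but keep col1 in the DataFrame itself with drop=False
groupby, then pd.concat the results of reindex with method='nearest'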
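
To make the first step concrete, here is a minimal sketch of how the grouping expression labels the rows. The DataFrame is rebuilt from the data in the question; the name `grp` matches the answer's `rename('grp')`, and everything else is only for illustration.

```python
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    "col1": [1, 2, 4, 5, 6, 8, 9, 11, 2, 3, 4, 6, 9, 10],
    "col2": [10, 15, 12, 23, 11, 32, 12, 32, 23, 21, 12, 15, 12, 32],
})

# diff() is negative exactly where col1 drops, i.e. where a new ascending run starts.
# The first diff is NaN, and NaN < 0 is False, so the first row stays in group 0.
grp = df.col1.diff().lt(0).cumsum().rename("grp")

# Show each row next to the run it was assigned to
print(df.assign(grp=grp))
```

With this data the first eight rows land in group 0 and the remaining six in group 1, which is why the expected result has two blocks of three rows.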

I left the old col1 in the index so you can see what got mapped to what.

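# Index by (grp, col1); drop=False keeps col1 available as a regular column too
c = df.set_index([df.col1.diff().lt(0).cumsum().rename('grp'), 'col1'], drop=False)
# For each ascending run g, drop the grp level and take the rows nearest to 1, 5 and 10
pd.concat([g.xs(k).reindex([1, 5, 10], method='nearest') for k, g in c.groupby(level=0)])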

      col1  col2
col1            
1        1    10
5        5    23
10      11    32
1        2    23
5        6    15
10      10    32
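
The `10 -> 11` row in that output is a tie: 10 is equally far from 9 and 11, and the result shows the larger label winning. A minimal sketch of the lookup on the first run alone, assuming the data from the question (the names `run`, `targets` and `picked` are only for illustration):

```python
import pandas as pd

# First ascending run from the question, indexed by col1 but keeping col1 as a column
run = pd.DataFrame(
    {"col1": [1, 2, 4, 5, 6, 8, 9, 11], "col2": [10, 15, 12, 23, 11, 32, 12, 32]}
).set_index("col1", drop=False)

targets = [1, 5, 10]

# reindex with method='nearest' needs a monotonic index, which each run satisfies
picked = run.reindex(targets, method="nearest")

# The index holds the requested targets; the col1 column shows which row was matched
print(picked)
```

If ties should instead go to the smaller value, this approach would need adjusting; the sketch only reproduces what the answer's output already shows.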

If you don't like the extra col1 in the index, you can rename the index and then drop it:

c = df.set_index([df.col1.diff().lt(0).cumsum().rename('grp'), 'col1'], drop=False)
# Same lookup as before; then clear the index name and replace the index with 0..n-1
pd.concat([g.xs(k).reindex([1, 5, 10], method='nearest') for k, g in c.groupby(level=0)]) \
    .rename_axis(None).reset_index(drop=True)

   col1  col2
0     1    10
1     5    23
2    11    32
3     2    23
4     6    15
5    10    32
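
For reuse, the same idea can be wrapped in a small function. This is only a sketch: `nearest_per_run` and its parameters are hypothetical names, not part of pandas. It assumes the key column never repeats a value inside a run, since reindex cannot work on duplicate labels. Like the answer, it still iterates over the groups in a comprehension.

```python
import pandas as pd


def nearest_per_run(df, targets, key="col1"):
    """Hypothetical helper: for each ascending run of `key`, keep the rows
    whose `key` is nearest to each value in `targets`."""
    # A new run starts wherever the key decreases
    run_id = df[key].diff().lt(0).cumsum()
    pieces = [
        # Index each run by the key (kept as a column too) and look up the targets
        run.set_index(key, drop=False).reindex(targets, method="nearest")
        for _, run in df.groupby(run_id)
    ]
    # Stack the per-run results and discard the lookup index
    return pd.concat(pieces).reset_index(drop=True)


# Data copied from the question
df = pd.DataFrame({
    "col1": [1, 2, 4, 5, 6, 8, 9, 11, 2, 3, 4, 6, 9, 10],
    "col2": [10, 15, 12, 23, 11, 32, 12, 32, 23, 21, 12, 15, 12, 32],
})

result = nearest_per_run(df, [1, 5, 10])
print(result)
```

On the question's data this should give the same six rows as the output above.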
