Evaluate the value of a pandas column based on conditions on other columns

Shr*_*rmn 4 python dataframe pandas

I have a DataFrame:

import pandas as pd

df_test = pd.DataFrame({'col': ['paris', 'paris', 'nantes', 'berlin', 'berlin', 'berlin', 'tokyo'],
                        'id_res': [12, 12, 14, 28, 8, 4, 89]})

      col  id_res
0   paris      12
1   paris      12
2  nantes      14
3  berlin      28
4  berlin       8
5  berlin       4
6   tokyo      89

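I want to create a "check" column whose values are as follows:

  • If a value in "col" has duplicates and those duplicates all share the same id_res, "check" is False for the duplicates.
  • If a value in "col" has duplicates and their "id_res" values differ, "check" is True for the row with the highest "id_res" and False for the rows with lower "id_res" values.
  • If a value in "col" has no duplicates, "check" is False.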

So my desired output is:

      col  id_res  check
0   paris      12  False
1   paris      12  False
2  nantes      14  False
3  berlin      28   True
4  berlin       8  False
5  berlin       4  False
6   tokyo      89  False

I tried using groupby but did not get a satisfactory result.
Can anyone help me?

Cor*_*ien 7

Create 2 boolean masks, combine them, and then find the row with the highest id_res for each col:

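df = df_test.copy()  # work on a copy of the question's DataFrame

# rows whose col value appears more than once
m1 = df['col'].duplicated(keep=False)
# rows whose id_res value appears only once in the whole frame
m2 = ~df['id_res'].duplicated(keep=False)
# among rows passing both masks, flag the row holding the max id_res per col
df['check'] = df.index.isin(df[m1 & m2].groupby('col')['id_res'].idxmax())
print(df)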

# Output
      col  id_res  check
0   paris      12  False
1   paris      12  False
2  nantes      14  False
3  berlin      28   True
4  berlin       8  False
5  berlin       4  False
6   tokyo      89  False

Details:

>>> pd.concat([df, m1.rename('m1'), m2.rename('m2')], axis=1)
      col  id_res  check     m1     m2
0   paris      12  False   True  False
1   paris      12  False   True  False
2  nantes      14  False  False   True
3  berlin      28   True   True   True  # <-  group to check
4  berlin       8  False   True   True  # <-     because 
5  berlin       4  False   True   True  # <- m1 and m2 are True
6   tokyo      89  False  False   True
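One caveat, offered as a hedged note rather than a correction: m2 tests whether an id_res value is unique across the whole frame, not within its col group. That is fine for the question's data, but if the same id_res appeared under two different cities, the result could drift from the stated rules. A minimal sketch, assuming a hypothetical extra city 'rome' (not in the original question) and a pair-based variant of m2 that is my own illustration:

```python
import pandas as pd

# Hypothetical data: 'rome' is NOT part of the original question.
# Its id_res values differ within the group, but 28 also appears under 'berlin'.
df = pd.DataFrame({'col': ['berlin', 'berlin', 'rome', 'rome'],
                   'id_res': [28, 8, 28, 30]})

# rows whose col value appears more than once
m1 = df['col'].duplicated(keep=False)

# global test (as in the answer): id_res appears only once in the whole frame
m2_global = ~df['id_res'].duplicated(keep=False)

# pair-based test (assumed variant): the (col, id_res) pair is not repeated
m2_pair = ~df.duplicated(subset=['col', 'id_res'], keep=False)

check_global = df.index.isin(df[m1 & m2_global].groupby('col')['id_res'].idxmax())
check_pair = df.index.isin(df[m1 & m2_pair].groupby('col')['id_res'].idxmax())

print(check_global.tolist())  # berlin's 8 is flagged instead of 28
print(check_pair.tolist())    # berlin's 28 and rome's 30 are flagged
```

The pair-based mask is only one possible adjustment. Groups that mix repeated and distinct id_res values (for example 12, 12, 5) would still need a decision about which rows count as the maximum.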


moz*_*way 5

You basically have 3 conditions, so use masks and take their logical intersection (AND / &):

g = df_test.groupby('col')['id_res']

# is col duplicated?
m1 = df_test['col'].duplicated(keep=False)
# [ True  True False  True  True  True False]

# is id_res max of its group?
m2 = df_test['id_res'].eq(g.transform('max'))
# [ True  True  True  True False False  True]

# is group diverse? (more than 1 id_res)
m3 = g.transform('nunique').gt(1)
# [False False False  True  True  True False]

# check if all conditions True
df_test['check'] = m1&m2&m3

Output:

      col  id_res  check
0   paris      12  False
1   paris      12  False
2  nantes      14  False
3  berlin      28   True
4  berlin       8  False
5  berlin       4  False
6   tokyo      89  False
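If the same check is needed on other column pairs, the three masks can be wrapped in a small function. This is a sketch only: the name flag_group_max and its parameters are hypothetical, not something from pandas or from the answer above.

```python
import pandas as pd

def flag_group_max(df, key, value):
    """Hypothetical helper: True where `value` is the max of its `key` group,
    the group has more than one row, and the group holds more than one
    distinct `value`."""
    g = df.groupby(key)[value]
    is_dup = df[key].duplicated(keep=False)      # key appears more than once
    is_max = df[value].eq(g.transform('max'))    # value equals its group max
    is_diverse = g.transform('nunique').gt(1)    # group has >1 distinct value
    return is_dup & is_max & is_diverse

df_test = pd.DataFrame({'col': ['paris', 'paris', 'nantes', 'berlin', 'berlin', 'berlin', 'tokyo'],
                        'id_res': [12, 12, 14, 28, 8, 4, 89]})
df_test['check'] = flag_group_max(df_test, 'col', 'id_res')
print(df_test)
```

Note that a group with more than one distinct id_res necessarily has more than one row, so is_dup is implied by is_diverse; it is kept here to mirror the three conditions as stated.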