Mah*_*a M 5 python python-3.x pandas
How to filter rows in a dataframe based on another column's value?
I have a dataframe, which is
ip_df:
  class name          marks  min_marks  min_subjects
0     I  tom  [89,85,80,74]         80             2
1    II  sam  [65,72,43,40]         85             1
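Each row should be checked against the values in its "min_subjects" and "min_marks" columns.
For index 0, "min_subjects" is 2, so at least 2 elements in the "marks" column must be greater than 80 (the "min_marks" value). The condition is met, so a new column named "flag" should be added with the value 1.
For index 1, "min_subjects" is 1, so at least 1 element in the "marks" column must be greater than 85 (the "min_marks" value). The condition is not met here, so the new "flag" column should be 0.
The final output should be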
op_df:
  class name          marks  min_marks  min_subjects  flag
0     I  tom  [89,85,80,74]         80             2     1
1    II  sam  [65,72,43,40]         85             1     0
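Can anyone help me achieve this in a dataframe?
Use a list comprehension with zip over the 3 columns: compare each value in a generator, sum the results to count the matches, then compare that count against min_subjects and map the result to 1 or 0: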
df['flag'] = [1 if sum(x > c for x in a) >= b else 0
              for a, b, c in zip(df['marks'], df['min_subjects'], df['min_marks'])]
Alternatively, convert the boolean to 0 or 1 with int:
df['flag'] = [int(sum(x > c for x in a) >= b)
              for a, b, c in zip(df['marks'], df['min_subjects'], df['min_marks'])]
Or a solution with numpy:
import numpy as np

df['flag'] = [int(np.sum(np.array(a) > c) >= b)
              for a, b, c in zip(df['marks'], df['min_subjects'], df['min_marks'])]
print(df)
  class name             marks  min_marks  min_subjects  flag
0     I  tom  [89, 85, 80, 74]         80             2     1
1    II  sam  [65, 72, 43, 40]         85             1     0
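To try this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same zip-based comparison. The helper name `passes` is made up for illustration and is not part of pandas. It assumes "marks" holds plain Python lists, as in the question.

```python
import pandas as pd

# Rebuild the example input from the question.
df = pd.DataFrame({
    "class": ["I", "II"],
    "name": ["tom", "sam"],
    "marks": [[89, 85, 80, 74], [65, 72, 43, 40]],
    "min_marks": [80, 85],
    "min_subjects": [2, 1],
})


def passes(marks, min_marks, min_subjects):
    """Hypothetical helper: 1 if at least `min_subjects` marks are
    strictly greater than `min_marks`, otherwise 0."""
    count = sum(m > min_marks for m in marks)
    return int(count >= min_subjects)


# Walk the three columns in parallel and compute one flag per row.
df["flag"] = [
    passes(marks, min_marks, min_subjects)
    for marks, min_marks, min_subjects in zip(
        df["marks"], df["min_marks"], df["min_subjects"]
    )
]

print(df)
```

Note that the comparison is strict, so a mark equal to min_marks (the 80 in the first row) does not count. If the threshold should be inclusive, `>` would need to become `>=`.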