How do I combine multiple queries?

auk*_*123 7 python dataframe pandas

I want to apply several queries at once. This is my DataFrame:

import pandas as pd

data = {'Name':['Penny','Ben','Benny','Mark','Ben1','Ben2','Ben3'],
        'Eng':[5,1,4,3,1,2,3], 
        'Math':[1,5,3,2,2,2,3],
        'Physics':[2,5,3,1,1,2,3],
        'Sports':[4,5,2,3,1,2,3],
        'Total':[12,16,12,9,5,8,12],
        'Group':['A','A','A','A','A','B','B']}

df1=pd.DataFrame(data, columns=['Name','Eng','Math','Physics','Sports','Total','Group']) 
df1

I have 3 queries:

  1. Group is A or B
  2. Math > Eng
  3. Name starts with "B"

I tried running them one at a time:

df1[df1.Name.str.startswith('B')]
df1.query('Math > Eng')
df1[df1.Group == 'A'] #I cannot run the code with df1[df1.Group == 'A' or 'B']

Then I tried to combine these queries:

df1.query("'Math > Eng' & 'df1[df1.Name.str.startswith('B')]' & 'df1[df1.Group == 'A']")
TokenError: ('EOF in multi-line statement', (2, 0))

I also tried passing str.startswith() into df.query():

df1.query("df1.Name.str.startswith('B')")
UndefinedVariableError: name 'df1' is not defined

I have tried many approaches, but none of them worked. How can I put these queries together?

Yaa*_*ler 6

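The long way to solve this, which is also the most transparent and therefore the best for beginners, is to create a boolean column for each filter, then combine those columns into one final filter: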

df1['filter_1'] = df1['Group'].isin(['A','B'])
df1['filter_2'] = df1['Math'] > df1['Eng']
df1['filter_3'] = df1['Name'].str.startswith('B')

# If all are true
df1['filter_final'] = df1[['filter_1', 'filter_2', 'filter_3']].all(axis=1)

You can of course combine these steps into one:

mask = ((df1['Group'].isin(['A','B'])) &
        (df1['Math'] > df1['Eng']) &
        (df1['Name'].str.startswith('B'))
       )

df1['filter_final'] = mask
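A side note on the operators, since this is where the `df1.Group == 'A' or 'B'` attempt in the question goes wrong. The mask above uses `&` with each comparison in parentheses. Python's `or` and `and` ask for the truth value of a whole Series, which pandas refuses to guess. Here is a minimal sketch on a small made-up frame (the name `df` and its values are for illustration only, not the question's data):

```python
import pandas as pd

# Hypothetical three-row frame, only to show the operator behaviour.
df = pd.DataFrame({'Group': ['A', 'B', 'C'], 'Math': [5, 1, 4], 'Eng': [1, 2, 3]})

# Element-wise OR needs | and parentheses around each comparison.
either = (df['Group'] == 'A') | (df['Group'] == 'B')

# isin expresses the same membership test more compactly.
same_as_isin = either.equals(df['Group'].isin(['A', 'B']))

# Python's `or` tries to reduce the whole Series to one bool, which is ambiguous.
try:
    df['Group'] == 'A' or 'B'
    raised = False
except ValueError:
    raised = True

print(either.tolist(), same_as_isin, raised)
```

So for "group A or B", use either `isin` or `|`, never a bare `or`.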

Finally, select the rows that satisfy the filter like this:

df_filtered = df1[df1['filter_final']]

This selects the rows of df1 where filter_final == True.
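If you would rather stay with `query()` as in the question, the three conditions can probably be written as one expression string. This is a sketch, not the only way to do it. It assumes a reasonably recent pandas, and it passes `engine='python'` because `.str` methods inside a query string are not reliably supported by the default numexpr engine. Inside the string, columns are referred to by their bare names (`Name`, not `df1.Name`), which is why the question's attempt raised `UndefinedVariableError`.

```python
import pandas as pd

data = {'Name': ['Penny', 'Ben', 'Benny', 'Mark', 'Ben1', 'Ben2', 'Ben3'],
        'Eng': [5, 1, 4, 3, 1, 2, 3],
        'Math': [1, 5, 3, 2, 2, 2, 3],
        'Physics': [2, 5, 3, 1, 1, 2, 3],
        'Sports': [4, 5, 2, 3, 1, 2, 3],
        'Total': [12, 16, 12, 9, 5, 8, 12],
        'Group': ['A', 'A', 'A', 'A', 'A', 'B', 'B']}
df1 = pd.DataFrame(data)

# One expression string: column names are used directly, conditions joined with `and`.
result = df1.query(
    "Group in ['A', 'B'] and Math > Eng and Name.str.startswith('B')",
    engine='python',  # assumption: needed so the .str accessor is evaluated in Python
)
print(result['Name'].tolist())
```

This leaves df1 itself unchanged, with no helper columns, at the cost of a condition string that is harder to debug than the separate boolean columns.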