Tho*_*ius 8 python dataframe pandas categorical-data
Say I have been given a DataFrame in which most of the columns are categorical data.
> data.head()
age risk sex smoking
0 28 no male no
1 58 no female no
2 27 no male yes
3 26 no male no
4 29 yes female yes
I would like to subset this data using a dictionary of key-value pairs for these categorical variables.
tmp = {'risk':'no', 'smoking':'yes', 'sex':'female'}
That is, I would like to get the following subset.
data[ (data.risk == 'no') & (data.smoking == 'yes') & (data.sex == 'female')]
What I would like to write is:
data[tmp]
What is the most Pythonic / pandas-idiomatic way to do this?
Minimal example:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# Draw random integer codes, store them as categoricals, then relabel the categories.
# rename_categories replaces direct assignment to .cat.categories, which newer pandas rejects.
x = Series(np.random.randint(0, 2, 50), dtype='category').cat.rename_categories(['no', 'yes'])
y = Series(np.random.randint(0, 2, 50), dtype='category').cat.rename_categories(['no', 'yes'])
z = Series(np.random.randint(0, 2, 50), dtype='category').cat.rename_categories(['male', 'female'])
a = Series(np.random.randint(20, 60, 50), dtype='category')
data = DataFrame({'risk': x, 'smoking': y, 'sex': z, 'age': a})
tmp = {'risk': 'no', 'smoking': 'yes', 'sex': 'female'}
I would use the .query() method for this task:
qry = ' and '.join(["{} == '{}'".format(k,v) for k,v in tmp.items()])
data.query(qry)
Output:
age risk sex smoking
7 24 no female yes
22 43 no female yes
23 42 no female yes
25 24 no female yes
32 29 no female yes
40 34 no female yes
43 35 no female yes
The query string:
print(qry)
"sex == 'female' and risk == 'no' and smoking == 'yes'"
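Building the query as a string works for simple column names and values, but it can break when a value contains a quote or a column name is not a valid identifier. As an alternative, here is a minimal sketch that builds a boolean mask directly from the dictionary. The helper name `subset_by_dict` is hypothetical, and the sample frame reuses the five rows from the question plus one made-up row so that the filter matches something.

```python
import pandas as pd


def subset_by_dict(df, criteria):
    """Return the rows of df where df[col] == value for every (col, value) pair."""
    # Start with an all-True mask so an empty dict returns the whole frame.
    mask = pd.Series(True, index=df.index)
    for col, val in criteria.items():
        mask = mask & (df[col] == val)
    return df[mask]


# First five rows are from the question; the last row is invented for illustration.
data = pd.DataFrame({
    'age': [28, 58, 27, 26, 29, 41],
    'risk': pd.Categorical(['no', 'no', 'no', 'no', 'yes', 'no']),
    'sex': pd.Categorical(['male', 'female', 'male', 'male', 'female', 'female']),
    'smoking': pd.Categorical(['no', 'no', 'yes', 'no', 'yes', 'yes']),
})
tmp = {'risk': 'no', 'smoking': 'yes', 'sex': 'female'}

result = subset_by_dict(data, tmp)
print(result)
```

Because each comparison is done by pandas rather than parsed from text, no quoting or escaping of the values is needed.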