oro*_*ome 3 python indexing subset pandas
How do I get the maximum value within a subset of a Pandas DataFrame?
For example, when I do something like
statedata[statedata['state.region'] == 'Northeast'].ix[statedata['Murder'].idxmax()]
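I get a KeyError, which indicates that idxmax is returning the key of the global maximum, Alabama, rather than the key of the maximum within the queried subset (where that key is of course missing).
Is there a concise way to do this in Pandas?
For reference, the data used here comes from R, built with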
data(state)
statedata = cbind(data.frame(state.x77), state.abb, state.area, state.center, state.division, state.name, state.region)
and then exported from R and imported into Pandas.
You can use df.loc to select a sub-DataFrame, then call idxmax on that selection:
import pandas as pd
import pandas.rpy.common as com
import rpy2.robjects as ro
r = ro.r
statedata = r('''cbind(data.frame(state.x77), state.abb, state.area, state.center,
state.division, state.name, state.region)''')
df = com.convert_robj(statedata)
df.columns = df.columns.to_series().str.replace('state.', '')
subdf = df.loc[df['region']=='Northeast', 'Murder']
print(subdf)
# Connecticut       3.1
# Maine             2.7
# Massachusetts     3.3
# New Hampshire     3.3
# New Jersey        5.2
# New York         10.9
# Pennsylvania      6.1
# Rhode Island      2.4
# Vermont           5.5
# Name: Murder, dtype: float64
print(subdf.idxmax())
prints
New York
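The same idea can be tried without R. The following is a minimal sketch, not the original setup: the Northeast values are the ones printed above, while the Alabama row is an assumption added so that the global maximum falls outside the subset. It also uses .loc for the row lookup, since .ix has been removed from recent pandas releases.

```python
import pandas as pd

# Northeast rows reuse the values printed above; the Alabama row is an
# assumption added so the global maximum lies outside the subset.
df = pd.DataFrame(
    {
        "region": ["Northeast"] * 9 + ["South"],
        "Murder": [3.1, 2.7, 3.3, 3.3, 5.2, 10.9, 6.1, 2.4, 5.5, 15.1],
    },
    index=[
        "Connecticut", "Maine", "Massachusetts", "New Hampshire", "New Jersey",
        "New York", "Pennsylvania", "Rhode Island", "Vermont", "Alabama",
    ],
)

northeast = df[df["region"] == "Northeast"]

# idxmax over the full column yields a label that the subset does not contain,
# which is what triggers the KeyError in the question.
global_label = df["Murder"].idxmax()

# idxmax over the subset's own column yields a label the subset does contain.
subset_label = northeast["Murder"].idxmax()

# Look up the full row for the subset's maximum.
top_row = northeast.loc[subset_label]

print(global_label, subset_label)
print(top_row)
```

The key point is that the boolean mask and idxmax must be applied to the same selection; computing idxmax on the unfiltered column and then indexing the filtered frame mixes two different sets of labels.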
To select the state with the highest murder rate in each region (as of 1976):
In [24]: df.groupby('region')['Murder'].idxmax()
Out[24]:
region
North Central    Michigan
Northeast        New York
South             Alabama
West               Nevada
Name: Murder, dtype: object
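If the full rows are wanted rather than just the state names, the labels returned by the grouped idxmax can be passed back to df.loc. This is a small sketch on toy data: the frame below is hypothetical, with only two regions, and the South values are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy frame: two regions only; the South values are
# illustrative assumptions, not taken from the post above.
df = pd.DataFrame(
    {
        "region": ["Northeast", "Northeast", "South", "South"],
        "Murder": [10.9, 6.1, 15.1, 13.9],
    },
    index=["New York", "Pennsylvania", "Alabama", "Georgia"],
)

# One index label per region: the row holding that region's maximum.
labels = df.groupby("region")["Murder"].idxmax()

# Use those labels to pull the complete rows from the original frame.
top_rows = df.loc[labels]

print(top_rows)
```

Note that idxmax returns only the first label when several rows tie for a group's maximum.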