use*_*097 6 python pivot-table pandas
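I want to use a pivot table to summarize a dataset, and then be able to access the information in the pivot table as if it were a DataFrame.
Consider a hierarchical dataset in which patients are treated in hospitals, and the hospitals are located in regions: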
import pandas as pd
example_data = {'patient' : ['p1','p2','p3','p4','p5','p6','p7','p8','p9','p10','p11','p12','p13','p14','p15','p16','p17','p18','p19','p20','p21','p22','p23','p24','p25','p26','p27','p28','p29','p30','p31','p32','p33','p34','p35','p36','p37','p38','p39','p40','p41','p42','p43','p44','p45','p46','p47','p48','p49','p50','p51','p52','p53','p54','p55','p56','p57','p58','p59','p60','p61','p62','p63'],
'hospital' : ['h1','h1','h1','h2','h2','h2','h2','h3','h3','h3','h3','h3','h4','h4','h4','h4','h4','h4','h5','h5','h5','h5','h5','h5','h5','h6','h6','h6','h6','h6','h6','h6','h6','h7','h7','h7','h7','h7','h7','h7','h7','h7','h8','h8','h8','h8','h8','h8','h8','h8','h8','h8','h9','h9','h9','h9','h9','h9','h9','h9','h9','h9','h9'],
'region' : ['r1','r1','r1','r1','r1','r1','r1','r1','r1','r1','r1','r1','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r2','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3','r3'] }
example_dataframe = pd.DataFrame(example_data)
print(example_dataframe)
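The dataset above follows a regular pattern: hospital h1 has 3 patients, each following hospital has one more (up to 11 for h9), and hospitals h1-h3, h4-h6 and h7-h9 fall in regions r1, r2 and r3. Assuming that pattern is all that matters, the sketch below rebuilds the same 63-row frame without typing the lists by hand. The loop and its variable names are illustrative only, not part of the original question.

```python
import pandas as pd

# Illustrative rebuild of example_data: hospital h<i> has i + 2 patients,
# and every three consecutive hospitals share one region.
hospitals, regions = [], []
for i in range(1, 10):
    n_patients = i + 2                       # h1 -> 3, ..., h9 -> 11
    region = 'r%d' % ((i - 1) // 3 + 1)      # h1-h3 -> r1, h4-h6 -> r2, h7-h9 -> r3
    hospitals += ['h%d' % i] * n_patients
    regions += [region] * n_patients

example_dataframe = pd.DataFrame({
    'patient': ['p%d' % k for k in range(1, len(hospitals) + 1)],
    'hospital': hospitals,
    'region': regions,
})
print(example_dataframe.head())
```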
This produces simple output like the following:
hospital patient region
0 h1 p1 r1
1 h1 p2 r1
2 h1 p3 r1
3 h2 p4 r1
4 h2 p5 r1
5 h2 p6 r1
6 h2 p7 r1
7 h3 p8 r1
8 h3 p9 r1
9 h3 p10 r1
10 h3 p11 r1
11 h3 p12 r1
12 h4 p13 r2
13 h4 p14 r2
14 h4 p15 r2
15 h4 p16 r2
16 h4 p17 r2
etc.
Now I want to summarize the data with a pivot table, simply counting the number of patients in each hospital:
# index= replaces the older rows= keyword, which newer pandas versions no longer accept
example_pivot_table = pd.pivot_table(example_dataframe, values='patient', index=['hospital','region'], aggfunc='count')
print(example_pivot_table)
This produces the following output:
hospital region
h1 r1 3
h2 r1 4
h3 r1 5
h4 r2 6
h5 r2 7
h6 r2 8
h7 r3 9
h8 r3 10
h9 r3 11
Name: patient, dtype: int64
As I understand it, this is actually a Series with a MultiIndex.
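One way to confirm that is to inspect the object's type and index. The sketch below uses a hypothetical five-row subset of the data and groupby(...).size(), which counts rows per (hospital, region) pair and returns a Series with a two-level MultiIndex, the same shape as the output above. As an assumption worth checking against your pandas version: recent versions of pd.pivot_table return a one-column DataFrame here rather than a Series, but its index is the same (hospital, region) MultiIndex either way.

```python
import pandas as pd

# Hypothetical subset of example_dataframe, just enough to show the structure.
df = pd.DataFrame({
    'patient': ['p1', 'p2', 'p3', 'p34', 'p35'],
    'hospital': ['h1', 'h1', 'h1', 'h7', 'h7'],
    'region': ['r1', 'r1', 'r1', 'r3', 'r3'],
})

# Count rows per (hospital, region) pair; the result is a Series
# whose index is a two-level MultiIndex.
counts = df.groupby(['hospital', 'region']).size()

print(type(counts).__name__)          # Series
print(type(counts.index).__name__)    # MultiIndex
print(list(counts.index.names))       # ['hospital', 'region']
```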
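How can I use this data to find out which region hospital h7 is in? If hospital, region and the patient count were separate columns in a DataFrame, this would be easy. But I think hospital and region are levels of the index. I've tried many things but can't get it to work.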
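If the goal is to have hospital, region and the count as ordinary columns, reset_index() does that conversion: it moves the index levels back into columns, after which a plain boolean filter works. A minimal sketch on a hypothetical subset of the rows; n_patients is an illustrative column name, not something from the question. For a pivot table that comes back as a DataFrame, example_pivot_table.reset_index() should work without the name= argument.

```python
import pandas as pd

# Hypothetical subset of example_dataframe.
df = pd.DataFrame({
    'patient': ['p1', 'p2', 'p3', 'p34', 'p35'],
    'hospital': ['h1', 'h1', 'h1', 'h7', 'h7'],
    'region': ['r1', 'r1', 'r1', 'r3', 'r3'],
})
counts = df.groupby(['hospital', 'region']).size()

# Turn the MultiIndex levels back into columns; name= labels the
# column holding the counts (illustrative name).
flat = counts.reset_index(name='n_patients')
print(flat)

# Now this is an ordinary DataFrame filter on the 'hospital' column.
region_of_h7 = flat.loc[flat['hospital'] == 'h7', 'region'].iloc[0]
print(region_of_h7)  # r3
```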
You can use get_level_values to get the hospital level of the index. You can pass either the level number or the level name, i.e. 0 or 'hospital'.
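As a small illustration of that, the sketch below builds a hypothetical three-row Series with a (hospital, region) MultiIndex and shows that the level name and the level position return the same values. Comparing those values to a label yields a boolean array that can be used to filter the Series.

```python
import pandas as pd

# Hypothetical miniature of the pivot table result.
index = pd.MultiIndex.from_tuples(
    [('h1', 'r1'), ('h4', 'r2'), ('h7', 'r3')], names=['hospital', 'region'])
counts = pd.Series([3, 6, 9], index=index, name='patient')

# The same level, requested by name and by position.
by_name = counts.index.get_level_values('hospital')
by_position = counts.index.get_level_values(0)
print(list(by_name))                # ['h1', 'h4', 'h7']
print(by_name.equals(by_position))  # True

# Comparing the level to a label gives a boolean array usable as a filter.
mask = by_name == 'h7'
print(counts[mask])
```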
Then you can get what you want with:
In [38]: example_pivot_table[ example_pivot_table.index.get_level_values('hospital') == 'h7' ]
Out[38]:
hospital region
h7 r3 9
Name: patient, dtype: int64
To get the region itself, take the 'region' level of the filtered result's index:
example_pivot_table[ example_pivot_table.index.get_level_values('hospital') == 'h7' ].index.get_level_values('region')[0]
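A possible alternative, not part of the original answer, is the xs (cross-section) method: it selects by a label on one named level and drops that level from the result, leaving a Series indexed by the remaining 'region' level. A sketch on a hypothetical three-row miniature of the pivot table:

```python
import pandas as pd

# Hypothetical miniature of the pivot table result.
index = pd.MultiIndex.from_tuples(
    [('h1', 'r1'), ('h4', 'r2'), ('h7', 'r3')], names=['hospital', 'region'])
counts = pd.Series([3, 6, 9], index=index, name='patient')

# Select hospital h7; the 'hospital' level is dropped from the result,
# so what remains is indexed by region only.
h7 = counts.xs('h7', level='hospital')
print(h7)

region = h7.index[0]
print(region)  # r3
```

The same call should work on a DataFrame returned by pivot_table, since DataFrame.xs accepts the same key and level arguments.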