mar*_*ana 9 python grouping aggregate pandas seaborn
Say I have this DataFrame:
from pandas import DataFrame

d = {'Path':    ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
     'Detail':  ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
     'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
     'Value':   [30, 20, 10, 40, 40, 50],
     'Field':   [50, 70, 10, 20, 30, 30]}
df = DataFrame(d)
df.set_index(['Path', 'Detail'], inplace=True)
df
             Field Program  Value
Path Detail
abc  foo        50   prog1     30
     bar        70   prog1     20
ghi  bar        10   prog1     10
     foo        20   prog2     40
jkl  foo        30   prog3     40
     foo        30   prog3     50
I can aggregate it without any problem (by the way, if there is a better way to do this, I would like to know!):
df_count = df.groupby('Program').count().sort_values('Value', ascending=False)[['Value']]
df_count
Program  Value
prog1        3
prog3        2
prog2        1
df_mean = df.groupby('Program').mean().sort_values('Value', ascending=False)[['Value']]
df_mean
Program  Value
prog3       45
prog2       40
prog1       20
I can plot it from pandas without any problem:
df_mean.plot(kind='bar')
But when I try it in seaborn, why do I get this error?
sns.factorplot('Program',data=df_mean)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-26-23c2921627ec> in <module>()
----> 1 sns.factorplot('Program',data=df_mean)
C:\Anaconda3\lib\site-packages\seaborn\categorical.py in factorplot(x, y, hue, data, row, col, col_wrap, estimator, ci, n_boot, units, order, hue_order, row_order, col_order, kind, size, aspect, orient, color, palette, legend, legend_out, sharex, sharey, margin_titles, facet_kws, **kwargs)
2673 # facets to ensure representation of all data in the final plot
2674 p = _CategoricalPlotter()
-> 2675 p.establish_variables(x_, y_, hue, data, orient, order, hue_order)
2676 order = p.group_names
2677 hue_order = p.hue_names
C:\Anaconda3\lib\site-packages\seaborn\categorical.py in establish_variables(self, x, y, hue, data, orient, order, hue_order, units)
143 if isinstance(input, string_types):
144 err = "Could not interperet input '{}'".format(input)
--> 145 raise ValueError(err)
146
147 # Figure out the plotting orientation
ValueError: Could not interperet input 'Program'
lrn*_*cig 13
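The reason you get the exception is that `Program` becomes the index of the DataFrames `df_mean` and `df_count` after your `groupby` operation, so seaborn cannot find a column with that name.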
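As a quick check of the explanation above, here is a minimal sketch that rebuilds only the `Program` and `Value` columns from the question and inspects where `Program` ends up after `groupby`. The variable names `in_columns` and `index_name` are illustrative and not part of the original post.

```python
import pandas as pd

# Reduced copy of the question's data: only the columns that matter here.
df = pd.DataFrame({
    'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
    'Value': [30, 20, 10, 40, 40, 50],
})

# With the default as_index=True, the grouping key becomes the index.
df_mean = df.groupby('Program')[['Value']].mean()

in_columns = 'Program' in df_mean.columns  # is there a 'Program' column?
index_name = df_mean.index.name            # name of the resulting index

print(in_columns)   # False
print(index_name)   # Program
```

Because `Program` is only an index name at this point, a lookup by column name fails, which is consistent with the `ValueError` in the question.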
If you want to get a `factorplot` from `df_mean`, a simple solution is to add the index as a column:
In [7]:
df_mean['Program'] = df_mean.index
In [8]:
%matplotlib inline
import seaborn as sns
sns.factorplot(x='Program', y='Value', data=df_mean)
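A possible alternative to copying the index into a column is `reset_index()`, which moves the index level into an ordinary column instead of duplicating it. This is a sketch on a reduced copy of the question's data, not the approach used in the answer; `df_flat` is a name chosen for illustration.

```python
import pandas as pd

# Reduced copy of the question's data.
df = pd.DataFrame({
    'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
    'Value': [30, 20, 10, 40, 40, 50],
})

df_mean = df.groupby('Program')[['Value']].mean()

# reset_index() turns the 'Program' index into a regular column and
# replaces the index with a default integer range.
df_flat = df_mean.reset_index()

print(df_flat.columns.tolist())  # ['Program', 'Value']
```

A frame shaped like `df_flat` has `Program` available as a column, so it can be passed as `data=` with `x='Program', y='Value'`.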
But you can, even more simply, let `factorplot` do the calculation for you:
sns.factorplot(x='Program', y='Value', data=df)
You will get the same result. Hope this helps.
Edit after comments
Indeed, you make a very good point about the `as_index` parameter. By default it is set to True, in which case `Program` becomes part of the index, as in your question.
In [14]:
df_mean = df.groupby('Program', as_index=True).mean().sort_values('Value', ascending=False)[['Value']]
df_mean
Out[14]:
         Value
Program
prog3       45
prog2       40
prog1       20
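Just to be clear: this way `Program` is no longer a column; it has become the index. The trick `df_mean['Program'] = df_mean.index` actually leaves the index as it is and adds a new column holding the index values, so `Program` is now duplicated.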
In [15]:
df_mean['Program'] = df_mean.index
df_mean
Out[15]:
         Value Program
Program
prog3       45   prog3
prog2       40   prog2
prog1       20   prog1
However, if you set `as_index` to False, you get `Program` as a column, plus a new auto-increment index:
In [16]:
df_mean = df.groupby('Program', as_index=False).mean().sort_values('Value', ascending=False)[['Program', 'Value']]
df_mean
Out[16]:
  Program  Value
2   prog3     45
1   prog2     40
0   prog1     20
This way you can feed it directly to seaborn. Still, you could just use `df` and get the same result.
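For reference, the `as_index=False` variant can be sketched as below on a reduced copy of the question's data. It assumes a pandas version that provides `sort_values`, and it selects the `Value` column before aggregating so that only that column is averaged; that selection is a choice made for this sketch, not something taken from the answer.

```python
import pandas as pd

# Reduced copy of the question's data.
df = pd.DataFrame({
    'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
    'Value': [30, 20, 10, 40, 40, 50],
})

# as_index=False keeps 'Program' as a regular column in the result.
df_mean = (
    df.groupby('Program', as_index=False)['Value']
      .mean()
      .sort_values('Value', ascending=False)
)

print(df_mean.columns.tolist())     # ['Program', 'Value']
print(df_mean['Program'].tolist())  # ['prog3', 'prog2', 'prog1']
```

Sorting keeps the original integer labels, so the index reads 2, 1, 0 as in the output above; calling `reset_index(drop=True)` afterwards would renumber it if that matters.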
Hope this helps.