"Could not interpret input" error with Seaborn when plotting groupbys

mar*_*ana 9 python grouping aggregate pandas seaborn

Say I have this dataframe:

from pandas import DataFrame

d = {     'Path'   : ['abc', 'abc', 'ghi','ghi', 'jkl','jkl'],
          'Detail' : ['foo', 'bar', 'bar','foo','foo','foo'],
          'Program': ['prog1','prog1','prog1','prog2','prog3','prog3'],
          'Value'  : [30, 20, 10, 40, 40, 50],
          'Field'  : [50, 70, 10, 20, 30, 30] }


df = DataFrame(d)
df.set_index(['Path', 'Detail'], inplace=True)
df

               Field Program  Value
Path Detail                      
abc  foo        50   prog1     30
     bar        70   prog1     20
ghi  bar        10   prog1     10
     foo        20   prog2     40
jkl  foo        30   prog3     40
     foo        30   prog3     50

I can aggregate it without a problem (by the way, if there is a better way to do this, I'd like to know!):

df_count = df.groupby('Program').count().sort_values(['Value'], ascending=False)[['Value']]
df_count

Program   Value
prog1    3
prog3    2
prog2    1

df_mean = df.groupby('Program').mean().sort_values(['Value'], ascending=False)[['Value']]
df_mean

Program  Value
prog3    45
prog2    40
prog1    20

I can plot it from pandas without a problem...

df_mean.plot(kind='bar')

But why do I get this error when I try it in seaborn?

sns.factorplot('Program',data=df_mean)
    ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-23c2921627ec> in <module>()
----> 1 sns.factorplot('Program',data=df_mean)

C:\Anaconda3\lib\site-packages\seaborn\categorical.py in factorplot(x, y, hue, data, row, col, col_wrap, estimator, ci, n_boot, units, order, hue_order, row_order, col_order, kind, size, aspect, orient, color, palette, legend, legend_out, sharex, sharey, margin_titles, facet_kws, **kwargs)
   2673     # facets to ensure representation of all data in the final plot
   2674     p = _CategoricalPlotter()
-> 2675     p.establish_variables(x_, y_, hue, data, orient, order, hue_order)
   2676     order = p.group_names
   2677     hue_order = p.hue_names

C:\Anaconda3\lib\site-packages\seaborn\categorical.py in establish_variables(self, x, y, hue, data, orient, order, hue_order, units)
    143                 if isinstance(input, string_types):
    144                     err = "Could not interperet input '{}'".format(input)
--> 145                     raise ValueError(err)
    146 
    147             # Figure out the plotting orientation

ValueError: Could not interperet input 'Program'

lrn*_*cig 13

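The reason you get the exception is that Program becomes the index of the dataframes df_mean and df_count after your groupby operation, so seaborn cannot find a column with that name in data.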
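A quick way to confirm this diagnosis is to inspect the aggregated frame directly. The sketch below rebuilds the frame from the question with pandas only (selecting the Value column before aggregating is my own shortcut, not the asker's exact call) and checks where Program ended up.

```python
import pandas as pd

d = {
    "Path": ["abc", "abc", "ghi", "ghi", "jkl", "jkl"],
    "Detail": ["foo", "bar", "bar", "foo", "foo", "foo"],
    "Program": ["prog1", "prog1", "prog1", "prog2", "prog3", "prog3"],
    "Value": [30, 20, 10, 40, 40, 50],
    "Field": [50, 70, 10, 20, 30, 30],
}
df = pd.DataFrame(d).set_index(["Path", "Detail"])

# Default groupby (as_index=True): the grouping key becomes the index.
df_mean = df.groupby("Program")[["Value"]].mean()

print(df_mean.index.name)             # name of the index after grouping
print(list(df_mean.columns))          # columns that are left
print("Program" in df_mean.columns)   # whether a by-name column lookup can succeed
```

If the last check is False, any plotting call that looks up 'Program' as a column of data has nothing to find.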

If you want to get your factorplot from df_mean, a simple solution is to add the index as a column,

In [7]:

df_mean['Program'] = df_mean.index

In [8]:

%matplotlib inline
import seaborn as sns
sns.factorplot(x='Program', y='Value', data=df_mean)

But you can do it even more simply by letting factorplot do the calculation for you,

sns.factorplot(x='Program', y='Value', data=df)

You will get the same result. Hope it helps.
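For readers on a newer seaborn: factorplot was renamed to catplot in seaborn 0.9, and the default kind changed, so kind="point" is needed to get the old look. The sketch below assumes such a version is installed. The helper name plot_program_means is hypothetical, and seaborn is imported lazily inside it so that the pandas part runs on its own. Seaborn's default estimator is the mean, so the plotted heights should match the groupby means computed by hand at the end.

```python
import pandas as pd

d = {
    "Path": ["abc", "abc", "ghi", "ghi", "jkl", "jkl"],
    "Detail": ["foo", "bar", "bar", "foo", "foo", "foo"],
    "Program": ["prog1", "prog1", "prog1", "prog2", "prog3", "prog3"],
    "Value": [30, 20, 10, 40, 40, 50],
    "Field": [50, 70, 10, 20, 30, 30],
}
df = pd.DataFrame(d).set_index(["Path", "Detail"])


def plot_program_means(frame):
    """Hypothetical helper: let seaborn aggregate Value per Program itself."""
    import seaborn as sns  # assumed installed, seaborn >= 0.9

    # kind="point" mirrors the default of the old factorplot.
    return sns.catplot(x="Program", y="Value", data=frame, kind="point")


# The per-program means the point plot would display, computed by hand.
expected_means = df.groupby("Program")["Value"].mean()
print(expected_means)
```

Calling plot_program_means(df) on the raw frame works because Program and Value are still ordinary columns there.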

EDIT after comments

Indeed, you make a very good point about the as_index parameter; by default it is set to True, in which case Program becomes part of the index, as in your question.

In [14]:

df_mean = df.groupby('Program', as_index=True).mean().sort_values(['Value'], ascending=False)[['Value']]
df_mean

Out[14]:
        Value
Program 
prog3   45
prog2   40
prog1   20

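Just to make it clear: this way Program is no longer a column, it has become the index. The trick df_mean['Program'] = df_mean.index keeps the index as it is and adds a new column holding the index values, so Program is now duplicated.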

In [15]:

df_mean['Program'] = df_mean.index
df_mean

Out[15]:
        Value   Program
Program     
prog3   45  prog3
prog2   40  prog2
prog1   20  prog1

However, if you set as_index to False, you get Program as a column, plus a new auto-increment index,

In [16]:

df_mean = df.groupby('Program', as_index=False).mean().sort_values(['Value'], ascending=False)[['Program', 'Value']]
df_mean

Out[16]:
    Program Value
2   prog3   45
1   prog2   40
0   prog1   20

This way you can feed it directly to seaborn. Still, you could use df and get the same result.
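As a side note, reset_index() is another common way to turn the grouping key back into a column when the aggregation has already been done with the default as_index=True. It is not part of the answer above, so treat this as an alternative sketch; it compares both routes on the frame from the question.

```python
import pandas as pd

d = {
    "Path": ["abc", "abc", "ghi", "ghi", "jkl", "jkl"],
    "Detail": ["foo", "bar", "bar", "foo", "foo", "foo"],
    "Program": ["prog1", "prog1", "prog1", "prog2", "prog3", "prog3"],
    "Value": [30, 20, 10, 40, 40, 50],
    "Field": [50, 70, 10, 20, 30, 30],
}
df = pd.DataFrame(d).set_index(["Path", "Detail"])

# Route 1: keep the key as a column from the start.
via_as_index = df.groupby("Program", as_index=False)["Value"].mean()

# Route 2: aggregate with the key as index, then move it back to a column.
via_reset = df.groupby("Program")["Value"].mean().reset_index()

# Both frames expose Program and Value as plain columns.
print(via_as_index)
print(via_reset)
```

Either frame can then be sorted with sort_values('Value', ascending=False) and passed as data with x='Program', y='Value'.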

Hope it helps.