ReshapeError when trying to pivot a pandas DataFrame

Tags: pivot, python-2.7, pandas

Using pandas 0.11 on Python 2.7.3, I am trying to pivot a simple DataFrame with the following values:

    StudentID QuestionID Answer DateRecorded
0        1234        bar      a   2012/01/21
1        1234        foo      c   2012/01/22
2        4321        bop      a   2012/01/22
3        5678        bar      a   2012/01/24
4        8765        baz      b   2012/02/13
5        4321        baz      b   2012/02/15
6        8765        bop      b   2012/02/16
7        5678        bop      c   2012/03/15
8        5678        foo      a   2012/04/01
9        1234        baz      b   2012/04/11
10       8765        bar      a   2012/05/03
11       4321        bar      a   2012/05/04
12       5678        baz      c   2012/06/01
13       1234        bar      b   2012/11/01

I use the following command:

 df.pivot(index='StudentID', columns='QuestionID')

But I get the following error:

ReshapeError: Index contains duplicate entries, cannot reshape

Note that the same DataFrame without the last row

13       1234        bar      b   2012/11/01

pivots successfully, giving:

           Answer               DateRecorded                                    
QuestionID    bar baz  bop  foo          bar         baz         bop         foo
StudentID                                                                       
1234            a   b  NaN    c   2012/01/21  2012/04/11         NaN  2012/01/22
4321            a   b    a  NaN   2012/05/04  2012/02/15  2012/01/22         NaN
5678            a   c    c    a   2012/01/24  2012/06/01  2012/03/15  2012/04/01
8765            a   b    b  NaN   2012/05/03  2012/02/13  2012/02/16         NaN
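The behaviour described above can be reproduced with a short script. This is a sketch that assumes a recent pandas release, where the same failure is raised as a `ValueError` with the message "Index contains duplicate entries, cannot reshape" rather than as `ReshapeError`. The sample data is copied from the table above; the names `ok` and `error_message` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'StudentID': [1234, 1234, 4321, 5678, 8765, 4321, 8765,
                  5678, 5678, 1234, 8765, 4321, 5678, 1234],
    'QuestionID': ['bar', 'foo', 'bop', 'bar', 'baz', 'baz', 'bop',
                   'bop', 'foo', 'baz', 'bar', 'bar', 'baz', 'bar'],
    'Answer': list('acaabbbcabaacb'),
    'DateRecorded': ['2012/01/21', '2012/01/22', '2012/01/22', '2012/01/24',
                     '2012/02/13', '2012/02/15', '2012/02/16', '2012/03/15',
                     '2012/04/01', '2012/04/11', '2012/05/03', '2012/05/04',
                     '2012/06/01', '2012/11/01'],
})

# Without the last row every (StudentID, QuestionID) pair is unique, so pivot works.
ok = df.iloc[:-1].pivot(index='StudentID', columns='QuestionID')
print(ok)

# With the last row, (1234, 'bar') occurs twice and pivot cannot reshape.
error_message = None
try:
    df.pivot(index='StudentID', columns='QuestionID')
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```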

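I'm new to pivoting and would like to understand why a duplicate (StudentID, QuestionID) pair causes this problem. Also, how can I work around it while still using df.pivot()?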

Thanks.

Answer:

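What would you expect your pivot table to look like with duplicate entries? I'm not sure it makes sense for a pivot table to hold more than one element for (1234, bar). Your data looks like it is naturally indexed by (QuestionID, StudentID, DateRecorded).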
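One way to see which rows cause the clash is to count the rows for each (StudentID, QuestionID) pair: any count above 1 is a cell that `pivot` would have to fill with more than one value. A minimal sketch, assuming a recent pandas (the `subset` and `keep` arguments of `duplicated` may be spelled differently in older versions such as 0.11); `counts` and `clashes` are illustrative names.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'StudentID': [1234, 1234, 4321, 5678, 8765, 4321, 8765,
                  5678, 5678, 1234, 8765, 4321, 5678, 1234],
    'QuestionID': ['bar', 'foo', 'bop', 'bar', 'baz', 'baz', 'bop',
                   'bop', 'foo', 'baz', 'bar', 'bar', 'baz', 'bar'],
    'Answer': list('acaabbbcabaacb'),
    'DateRecorded': ['2012/01/21', '2012/01/22', '2012/01/22', '2012/01/24',
                     '2012/02/13', '2012/02/15', '2012/02/16', '2012/03/15',
                     '2012/04/01', '2012/04/11', '2012/05/03', '2012/05/04',
                     '2012/06/01', '2012/11/01'],
})

# Count how many rows share each (StudentID, QuestionID) key.
counts = df.groupby(['StudentID', 'QuestionID']).size()
print(counts[counts > 1])

# keep=False flags every member of a duplicated group, not just the repeats.
clashes = df[df.duplicated(subset=['StudentID', 'QuestionID'], keep=False)]
print(clashes)
```

On the question's data only (1234, bar) is reported, with answers `a` and `b`: exactly the pair contributed by the last row.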

If you are willing to use a hierarchical index (they really aren't that complicated!), I would try:

In [104]: df2 = df.set_index(['StudentID', 'QuestionID', 'DateRecorded'])

In [105]: df2
Out[105]: 
                                  Answer
StudentID QuestionID DateRecorded       
1234      bar        2012/01/21        a
          foo        2012/01/22        c
4321      bop        2012/01/22        a
5678      bar        2012/01/24        a
8765      baz        2012/02/13        b
4321      baz        2012/02/15        b
8765      bop        2012/02/16        b
5678      bop        2012/03/15        c
          foo        2012/04/01        a
1234      baz        2012/04/11        b
8765      bar        2012/05/03        a
4321      bar        2012/05/04        a
5678      baz        2012/06/01        c
1234      bar        2012/11/01        b

In [106]: df2.unstack('QuestionID')
Out[106]: 
                       Answer               
QuestionID                bar  baz  bop  foo
StudentID DateRecorded                      
1234      2012/01/21        a  NaN  NaN  NaN
          2012/01/22      NaN  NaN  NaN    c
          2012/04/11      NaN    b  NaN  NaN
          2012/11/01        b  NaN  NaN  NaN
4321      2012/01/22      NaN  NaN    a  NaN
          2012/02/15      NaN    b  NaN  NaN
          2012/05/04        a  NaN  NaN  NaN
5678      2012/01/24        a  NaN  NaN  NaN
          2012/03/15      NaN  NaN    c  NaN
          2012/04/01      NaN  NaN  NaN    a
          2012/06/01      NaN    c  NaN  NaN
8765      2012/02/13      NaN    b  NaN  NaN
          2012/02/16      NaN  NaN    b  NaN
          2012/05/03        a  NaN  NaN  NaN
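For reference, a self-contained version of the session above might look like this. It rebuilds the sample data, checks that the three-level (StudentID, QuestionID, DateRecorded) key is unique, which is why `unstack` succeeds where `pivot` did not, and then moves QuestionID into the columns. It assumes a recent pandas; the name `wide` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'StudentID': [1234, 1234, 4321, 5678, 8765, 4321, 8765,
                  5678, 5678, 1234, 8765, 4321, 5678, 1234],
    'QuestionID': ['bar', 'foo', 'bop', 'bar', 'baz', 'baz', 'bop',
                   'bop', 'foo', 'baz', 'bar', 'bar', 'baz', 'bar'],
    'Answer': list('acaabbbcabaacb'),
    'DateRecorded': ['2012/01/21', '2012/01/22', '2012/01/22', '2012/01/24',
                     '2012/02/13', '2012/02/15', '2012/02/16', '2012/03/15',
                     '2012/04/01', '2012/04/11', '2012/05/03', '2012/05/04',
                     '2012/06/01', '2012/11/01'],
})

# Adding DateRecorded to the key makes every row's index entry unique.
df2 = df.set_index(['StudentID', 'QuestionID', 'DateRecorded'])
print(df2.index.is_unique)  # True

# Move the QuestionID level from the row index into the columns.
wide = df2.unstack('QuestionID')
print(wide)
```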

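Otherwise, you could come up with a rule that decides which of the multiple entries goes into the pivot table, and avoid the hierarchical index altogether.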
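As an illustration of such a rule, the sketch below keeps only the most recent answer for each (StudentID, QuestionID) pair and then calls `pivot` as in the question. "Keep the latest answer" is a hypothetical rule chosen for the example, not something the data requires. It assumes a recent pandas: `sort_values` and `drop_duplicates(keep=...)` are not available under those names in 0.11.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'StudentID': [1234, 1234, 4321, 5678, 8765, 4321, 8765,
                  5678, 5678, 1234, 8765, 4321, 5678, 1234],
    'QuestionID': ['bar', 'foo', 'bop', 'bar', 'baz', 'baz', 'bop',
                   'bop', 'foo', 'baz', 'bar', 'bar', 'baz', 'bar'],
    'Answer': list('acaabbbcabaacb'),
    'DateRecorded': ['2012/01/21', '2012/01/22', '2012/01/22', '2012/01/24',
                     '2012/02/13', '2012/02/15', '2012/02/16', '2012/03/15',
                     '2012/04/01', '2012/04/11', '2012/05/03', '2012/05/04',
                     '2012/06/01', '2012/11/01'],
})

# Parse the dates so that sorting is chronological rather than textual.
df['DateRecorded'] = pd.to_datetime(df['DateRecorded'], format='%Y/%m/%d')

# Hypothetical rule: if a student answered a question more than once,
# keep only the most recent answer.
latest = (df.sort_values('DateRecorded')
            .drop_duplicates(subset=['StudentID', 'QuestionID'], keep='last'))

# Each (StudentID, QuestionID) pair is now unique, so a plain pivot works.
table = latest.pivot(index='StudentID', columns='QuestionID', values='Answer')
print(table)
```

With this rule, student 1234's answer to `bar` becomes `b` (recorded on 2012/11/01) instead of conflicting with the earlier `a`. `DataFrame.pivot_table`, which combines duplicate entries using an `aggfunc`, is another way to express a rule like this.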