iqb*_*ved · score 5 · tags: pivot, python-2.7, pandas
Using pandas 0.11 on Python 2.7.3, I am trying to pivot a simple DataFrame with the following values:
StudentID QuestionID Answer DateRecorded
0 1234 bar a 2012/01/21
1 1234 foo c 2012/01/22
2 4321 bop a 2012/01/22
3 5678 bar a 2012/01/24
4 8765 baz b 2012/02/13
5 4321 baz b 2012/02/15
6 8765 bop b 2012/02/16
7 5678 bop c 2012/03/15
8 5678 foo a 2012/04/01
9 1234 baz b 2012/04/11
10 8765 bar a 2012/05/03
11 4321 bar a 2012/05/04
12 5678 baz c 2012/06/01
13 1234 bar b 2012/11/01
I use the following command:
df.pivot(index='StudentID', columns='QuestionID')
But I get the following error:
ReshapeError: Index contains duplicate entries, cannot reshape
Note that the same DataFrame without the last row
13 1234 bar b 2012/11/01
pivots successfully, giving:
Answer DateRecorded
QuestionID bar baz bop foo bar baz bop foo
StudentID
1234 a b NaN c 2012/01/21 2012/04/11 NaN 2012/01/22
4321 a b a NaN 2012/05/04 2012/02/15 2012/01/22 NaN
5678 a c c a 2012/01/24 2012/06/01 2012/03/15 2012/04/01
8765 a b b NaN 2012/05/03 2012/02/13 2012/02/16 NaN
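A self-contained reproduction of this behaviour, as a sketch that assumes a recent pandas release rather than the 0.11 used in the question. The question reports `ReshapeError` on 0.11; later versions raise a `ValueError` for the same situation, so the example catches that instead. It shows that the full frame fails to pivot while the frame without its last row pivots into 4 students by 8 columns (Answer and DateRecorded for each of the four questions).

```python
import pandas as pd

# Sample data copied from the question (dates kept as zero-padded strings).
df = pd.DataFrame({
    "StudentID": [1234, 1234, 4321, 5678, 8765, 4321, 8765,
                  5678, 5678, 1234, 8765, 4321, 5678, 1234],
    "QuestionID": ["bar", "foo", "bop", "bar", "baz", "baz", "bop",
                   "bop", "foo", "baz", "bar", "bar", "baz", "bar"],
    "Answer": list("acaabbbcabaacb"),
    "DateRecorded": ["2012/01/21", "2012/01/22", "2012/01/22", "2012/01/24",
                     "2012/02/13", "2012/02/15", "2012/02/16", "2012/03/15",
                     "2012/04/01", "2012/04/11", "2012/05/03", "2012/05/04",
                     "2012/06/01", "2012/11/01"],
})

# Rows 0 and 13 both map to the cell (1234, "bar"), so pivot cannot reshape.
try:
    df.pivot(index="StudentID", columns="QuestionID")
    failed = False
except ValueError as exc:  # pandas 0.11 raised ReshapeError here instead
    failed = True
    print("pivot failed:", exc)

# Without the last row every (StudentID, QuestionID) pair is unique.
ok = df.iloc[:-1].pivot(index="StudentID", columns="QuestionID")
print(ok.shape)  # 4 students x (2 value columns * 4 questions)
```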
I am new to pivoting and would like to know why duplicate (StudentID, QuestionID) pairs cause this problem. Also, how can I fix it while still using df.pivot()?
Thanks.
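Answer: What would you expect your pivot table to look like with duplicate entries? A pivot table has exactly one cell per (StudentID, QuestionID) pair, so I'm not sure it makes sense to have multiple elements for (1234, bar). Your data looks like it is naturally indexed by (StudentID, QuestionID, DateRecorded).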
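To see which rows collide, a short sketch using `DataFrame.duplicated` with `keep=False`, which flags every occurrence of a repeated (StudentID, QuestionID) pair rather than only the later ones. On the question's data it selects rows 0 and 13, the two answers student 1234 gave to question `bar`.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "StudentID": [1234, 1234, 4321, 5678, 8765, 4321, 8765,
                  5678, 5678, 1234, 8765, 4321, 5678, 1234],
    "QuestionID": ["bar", "foo", "bop", "bar", "baz", "baz", "bop",
                   "bop", "foo", "baz", "bar", "bar", "baz", "bar"],
    "Answer": list("acaabbbcabaacb"),
    "DateRecorded": ["2012/01/21", "2012/01/22", "2012/01/22", "2012/01/24",
                     "2012/02/13", "2012/02/15", "2012/02/16", "2012/03/15",
                     "2012/04/01", "2012/04/11", "2012/05/03", "2012/05/04",
                     "2012/06/01", "2012/11/01"],
})

# keep=False marks all members of each duplicated group, not just the repeats.
dups = df[df.duplicated(subset=["StudentID", "QuestionID"], keep=False)]
print(dups)
```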
If you take the hierarchical index approach (they're really not that complicated!), I would try:
In [104]: df2 = df.set_index(['StudentID', 'QuestionID', 'DateRecorded'])
In [105]: df2
Out[105]:
Answer
StudentID QuestionID DateRecorded
1234 bar 2012/01/21 a
foo 2012/01/22 c
4321 bop 2012/01/22 a
5678 bar 2012/01/24 a
8765 baz 2012/02/13 b
4321 baz 2012/02/15 b
8765 bop 2012/02/16 b
5678 bop 2012/03/15 c
foo 2012/04/01 a
1234 baz 2012/04/11 b
8765 bar 2012/05/03 a
4321 bar 2012/05/04 a
5678 baz 2012/06/01 c
1234 bar 2012/11/01 b
In [106]: df2.unstack('QuestionID')
Out[106]:
Answer
QuestionID bar baz bop foo
StudentID DateRecorded
1234 2012/01/21 a NaN NaN NaN
2012/01/22 NaN NaN NaN c
2012/04/11 NaN b NaN NaN
2012/11/01 b NaN NaN NaN
4321 2012/01/22 NaN NaN a NaN
2012/02/15 NaN b NaN NaN
2012/05/04 a NaN NaN NaN
5678 2012/01/24 a NaN NaN NaN
2012/03/15 NaN NaN c NaN
2012/04/01 NaN NaN NaN a
2012/06/01 NaN c NaN NaN
8765 2012/02/13 NaN b NaN NaN
2012/02/16 NaN NaN b NaN
2012/05/03 a NaN NaN NaN
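The session above as a standalone script, a sketch assuming a current pandas version. Adding DateRecorded to the index makes every row's key unique, so `unstack` has exactly one value per cell and fills the rest with NaN, giving 14 rows (one per StudentID/DateRecorded pair) by 4 answer columns.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "StudentID": [1234, 1234, 4321, 5678, 8765, 4321, 8765,
                  5678, 5678, 1234, 8765, 4321, 5678, 1234],
    "QuestionID": ["bar", "foo", "bop", "bar", "baz", "baz", "bop",
                   "bop", "foo", "baz", "bar", "bar", "baz", "bar"],
    "Answer": list("acaabbbcabaacb"),
    "DateRecorded": ["2012/01/21", "2012/01/22", "2012/01/22", "2012/01/24",
                     "2012/02/13", "2012/02/15", "2012/02/16", "2012/03/15",
                     "2012/04/01", "2012/04/11", "2012/05/03", "2012/05/04",
                     "2012/06/01", "2012/11/01"],
})

# Three-level index: unique per row, so reshaping cannot collide.
df2 = df.set_index(["StudentID", "QuestionID", "DateRecorded"])

# Move QuestionID from the row index into the columns.
wide = df2.unstack("QuestionID")
print(wide)
```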
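Otherwise, you can come up with some rule for deciding which of the multiple entries to keep for the pivot table, and avoid the hierarchical index altogether.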
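One possible rule, sketched here as an assumption rather than anything the question specifies: keep only each student's most recent answer to a question. Sorting by DateRecorded and then calling `drop_duplicates(..., keep="last")` leaves one row per (StudentID, QuestionID) pair, after which the original `df.pivot()` call works and yields the same 4-by-8 layout as the successful example in the question, with (1234, bar) now holding the 2012/11/01 answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "StudentID": [1234, 1234, 4321, 5678, 8765, 4321, 8765,
                  5678, 5678, 1234, 8765, 4321, 5678, 1234],
    "QuestionID": ["bar", "foo", "bop", "bar", "baz", "baz", "bop",
                   "bop", "foo", "baz", "bar", "bar", "baz", "bar"],
    "Answer": list("acaabbbcabaacb"),
    "DateRecorded": ["2012/01/21", "2012/01/22", "2012/01/22", "2012/01/24",
                     "2012/02/13", "2012/02/15", "2012/02/16", "2012/03/15",
                     "2012/04/01", "2012/04/11", "2012/05/03", "2012/05/04",
                     "2012/06/01", "2012/11/01"],
})

# Assumed rule: the latest answer wins. Zero-padded YYYY/MM/DD strings
# sort chronologically, so a plain string sort is enough here.
latest = (
    df.sort_values("DateRecorded")
      .drop_duplicates(subset=["StudentID", "QuestionID"], keep="last")
)

# Every (StudentID, QuestionID) pair is now unique, so pivot succeeds.
latest_wide = latest.pivot(index="StudentID", columns="QuestionID")
print(latest_wide)
```

A different rule (earliest answer, or a specific date window) only changes the sort or filter step. `DataFrame.pivot_table` with an `aggfunc` such as `"first"` or `"last"` is another way to encode the same kind of choice.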