gur*_*kan 1 python scipy pandas
I have a large dataframe similar to this one:
In [1]: grades
Out[1]:
                          course1  course2
school  class  student
school1 class1 student1         2        2
               student2         3        2
               student3         1        3
               student4         3        1
               student5         3        1
...                           ...      ...
        class3 student86        3        1
               student87        2        2
               student88        1        1
               student89        3        3
               student90        0        1

[90 rows x 2 columns]
I want to compute the Mann-Whitney rank test on the grades from the sample school and each sub-sample class. How can I do this using pandas and scipy.stats.mannwhitneyu without iterating through the dataframe?
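What you want to do is `groupby` on the index levels and apply a function that calls `mannwhitneyu`, passing it the two columns `course1` and `course2`. Suppose this is your data: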
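import numpy as np
import pandas
from scipy import stats

# Build a school/class/student MultiIndex (3 x 3 x 10 = 90 rows)
index = pandas.MultiIndex.from_product([
    ['school{0}'.format(n) for n in range(3)],
    ['class{0}'.format(n) for n in range(3)],
    ['student{0}'.format(n) for n in range(10)]
])
# Random integer grades in [0, 10) for each course
d = pandas.DataFrame({'course1': np.random.randint(0, 10, 90), 'course2': np.random.randint(0, 10, 90)},
                     index=index)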
Then compute the Mann-Whitney U by school:
>>> d.groupby(level=0).apply(lambda t: stats.mannwhitneyu(t.course1, t.course2))
school0 (426.5, 0.365937834646)
school1 (445.0, 0.473277409673)
school2 (421.0, 0.335714211748)
dtype: object
And to do the same by class:
>>> d.groupby(level=[0, 1]).apply(lambda t: stats.mannwhitneyu(t.course1, t.course2))
school0 class0 (38.5, 0.200247279189)
class1 (37.0, 0.169040187814)
class2 (46.5, 0.409559639829)
school1 class0 (33.5, 0.110329749527)
class1 (47.5, 0.439276896563)
class2 (30.0, 0.0684355963119)
school2 class0 (47.5, 0.439438219083)
class1 (43.0, 0.308851989782)
class2 (34.0, 0.118791221444)
dtype: object
The numbers passed to the `level` argument of `groupby` refer to the levels of your MultiIndex. Grouping by level 0 therefore groups by school, and grouping by levels 0 and 1 groups by each school/class combination.
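If the tuples in the output are awkward to work with, one option is to have the applied function return a labelled Series, so that `groupby(...).apply` assembles a DataFrame with one column per value. The following is only a sketch under a few assumptions: the index levels are given the names `school`, `class` and `student` (the example data above leaves them unnamed), the helper `mwu` and the column labels `U` and `p` are made up for illustration, and the data is random rather than the asker's real grades.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data shaped like the example (random values, not real grades).
rng = np.random.default_rng(0)
index = pd.MultiIndex.from_product(
    [
        ['school{0}'.format(n) for n in range(3)],
        ['class{0}'.format(n) for n in range(3)],
        ['student{0}'.format(n) for n in range(10)],
    ],
    names=['school', 'class', 'student'],  # assumed level names
)
d = pd.DataFrame(
    {'course1': rng.integers(0, 10, 90), 'course2': rng.integers(0, 10, 90)},
    index=index,
)


def mwu(group):
    # Hypothetical helper: run the test on one group and label the two results.
    # Positional indexing works whether scipy returns a plain tuple or a result object.
    result = stats.mannwhitneyu(group['course1'], group['course2'])
    return pd.Series({'U': result[0], 'p': result[1]})


# Levels can be referred to by name instead of by position.
by_school = d.groupby(level='school').apply(mwu)
by_class = d.groupby(level=['school', 'class']).apply(mwu)

print(by_school)
print(by_class)
```

With named levels, `level='school'` is equivalent to `level=0`, and `level=['school', 'class']` to `level=[0, 1]`. Having `U` and `p` as separate columns makes it straightforward to filter, for example `by_class[by_class['p'] < 0.05]`.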