Mas*_*ifu 8 python confidence-interval pandas
我正在尝试计算大型数据集中“力”列的均值和置信区间(95%)。我需要通过对不同的“类”进行分组来使用 groupby 函数的结果。
当我计算平均值并将其放入新数据框中时,它为我提供了所有行的 NaN 值。我不确定我是否走正确的路。有没有更简单的方法来做到这一点?
这是示例数据框:
df=pd.DataFrame({ 'Class': ['A1','A1','A1','A2','A3','A3'],
'Force': [50,150,100,120,140,160] },
columns=['Class', 'Force'])
Run Code Online (Sandbox Code Playgroud)
为了计算置信区间,我做的第一步是计算平均值。这是我使用的:
F1_Mean = df.groupby(['Class'])['Force'].mean()
Run Code Online (Sandbox Code Playgroud)
这给了我NaN所有行的值。
yoo*_*ghm 22
import pandas as pd
import numpy as np
import math
df=pd.DataFrame({'Class': ['A1','A1','A1','A2','A3','A3'],
'Force': [50,150,100,120,140,160] },
columns=['Class', 'Force'])
print(df)
print('-'*30)
stats = df.groupby(['Class'])['Force'].agg(['mean', 'count', 'std'])
print(stats)
print('-'*30)
ci95_hi = []
ci95_lo = []
for i in stats.index:
m, c, s = stats.loc[i]
ci95_hi.append(m + 1.95*s/math.sqrt(c))
ci95_lo.append(m - 1.95*s/math.sqrt(c))
stats['ci95_hi'] = ci95_hi
stats['ci95_lo'] = ci95_lo
print(stats)
Run Code Online (Sandbox Code Playgroud)
输出是
Class Force
0 A1 50
1 A1 150
2 A1 100
3 A2 120
4 A3 140
5 A3 160
------------------------------
mean count std
Class
A1 100 3 50.000000
A2 120 1 NaN
A3 150 2 14.142136
------------------------------
mean count std ci95_hi ci95_lo
Class
A1 100 3 50.000000 156.291651 43.708349
A2 120 1 NaN NaN NaN
A3 150 2 14.142136 169.500000 130.500000
Run Code Online (Sandbox Code Playgroud)
小智 7
您可以通过利用“sem”(平均值的标准误差)来简化@yoonghm 解决方案。
import pandas as pd
import numpy as np
import math
df=pd.DataFrame({'Class': ['A1','A1','A1','A2','A3','A3'],
'Force': [50,150,100,120,140,160] },
columns=['Class', 'Force'])
print(df)
print('-'*30)
stats = df.groupby(['Class'])['Force'].agg(['mean', 'sem'])
print(stats)
print('-'*30)
stats['ci95_hi'] = stats['mean'] + 1.96* stats['sem']
stats['ci95_lo'] = stats['mean'] - 1.96* stats['sem']
print(stats)
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
21026 次 |
| 最近记录: |