Posts by Gar*_*oap

Quickly averaging subsets of a pandas DataFrame

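I am trying to iterate over a large number of trials and compute a weighted average for several subsets of each. The data is currently in long format, with the columns trial, area, and score.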

  trial  area       score
0  T106     0     0.0035435
1  T106     1     0.0015967
2  T106     4     0.0003191
3  T106     4     0.1272919
4  T288     0     0.1272883
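The sample rows above can be rebuilt as a small DataFrame for experimenting. Each (trial, area) pair is one subset to be averaged, and `groupby(...).size()` shows how many scores fall into each. This sketch uses only the five rows shown, not the full data set.

```python
import pandas as pd

# The five sample rows from the question, in long format
df = pd.DataFrame({
    "trial": ["T106", "T106", "T106", "T106", "T288"],
    "area": [0, 1, 4, 4, 0],
    "score": [0.0035435, 0.0015967, 0.0003191, 0.1272919, 0.1272883],
})

# Each (trial, area) pair is one subset; count the scores in each
subset_sizes = df.groupby(["trial", "area"]).size()
print(subset_sizes)
```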


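I have about 120,000 trials with 4 areas each, and each trial can have 10 to 100 scores, for roughly 7 million rows in total. My first idea was to loop over every trial within each of the 4 areas, build a temporary DataFrame to compute the score, and then add that score to an outer DataFrame: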

for area in range(4):
    for trial in trial_names.iloc[:,0]:  
        Tscore = 0
        temp_trial = pd.DataFrame(trials_long.loc[(trials_long['tname'] == trial) & (trials_long['area'] == int(area))])
        # match scores for this trial
        temp_trial = temp_trial.merge(scores_df, how='left')
        # sum scores for all rows matching 'trial' + 'area'; this will be a weighted average, with score > 0.5 weighted 2x and score > 0.9 weighted 3x
        temp_trial.loc[temp_trial['score'] > 0.9, ['score']] *= 3        #weight 3x for …
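Filtering the full frame once per (trial, area) pair rescans all ~7 million rows on every iteration, which is what makes the loop slow. A vectorized alternative computes every weight in one pass and lets `groupby` do the averaging. This is a minimal sketch under assumptions: it reads the comments in the loop as weight 3 for scores above 0.9, weight 2 for scores above 0.5, and weight 1 otherwise, and takes "weighted average" to mean sum(weight × score) / sum(weight). The function name `weighted_area_scores` is hypothetical, and the frame is assumed to already hold `trial`, `area`, and `score` columns as in the sample; if the scores live in a separate frame (as with `scores_df` in the loop), merge once up front rather than once per subset.

```python
import numpy as np
import pandas as pd


def weighted_area_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Weighted mean score per (trial, area), one row per trial.

    Hypothetical helper. Assumed weighting: 3 if score > 0.9,
    2 if score > 0.5, otherwise 1.
    """
    # np.select takes the first matching condition, so > 0.9 must come first;
    # each row receives exactly one weight
    weights = np.select(
        [df["score"] > 0.9, df["score"] > 0.5],
        [3.0, 2.0],
        default=1.0,
    )
    tmp = df.assign(w=weights, ws=weights * df["score"])

    # A single grouped pass over all rows instead of one filter per subset
    sums = tmp.groupby(["trial", "area"])[["ws", "w"]].sum()
    means = sums["ws"] / sums["w"]

    # Long -> wide: one row per trial, one column per area (NaN if absent)
    return means.unstack("area")


# Usage on a tiny made-up frame (not the question's data)
example = pd.DataFrame({
    "trial": ["T1", "T1", "T1", "T2"],
    "area": [0, 0, 1, 0],
    "score": [0.95, 0.2, 0.6, 0.4],
})
result = weighted_area_scores(example)
print(result)
```

Using `np.select` rather than successive in-place multiplications means a score that has already been tripled cannot be caught again by the > 0.5 condition. If the intended average divides by the number of scores rather than the total weight, replace `sums["w"]` with the group size.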


python pandas

Gar*_*oap

lucky-day
