I have a Pandas DataFrame as follows:
a b c d
0 Apple 3 5 7
1 Banana 4 4 8
2 Cherry 7 1 3
3 Apple 3 4 7
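I want to group the rows by column 'a', replace the values in column 'c' with the mean of the values in the grouped rows, and add another column holding the standard deviation of the column 'c' values that went into that mean. The values in columns 'b' and 'd' are constant across all rows that get grouped together. So the desired output would be: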
a b c d e
0 Apple 3 4.5 7 0.707107
1 Banana 4 4 8 0
2 Cherry 7 1 3 0
What is the best way to achieve this?
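One possible approach, offered as a sketch rather than a definitive answer, is to group on 'a', 'b' and 'd' together and aggregate 'c' twice. It assumes that 'b' and 'd' really are constant within each group (as stated above), that the sample standard deviation (pandas' default, `ddof=1`) is what is wanted, and that single-row groups should show 0 instead of NaN.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "a": ["Apple", "Banana", "Cherry", "Apple"],
    "b": [3, 4, 7, 3],
    "c": [5, 4, 1, 4],
    "d": [7, 8, 3, 7],
})

# Grouping on a, b and d gives the same groups as grouping on a alone
# (b and d are constant per group) while keeping both columns in the result.
result = (
    df.groupby(["a", "b", "d"])["c"]
      .agg(c="mean", e="std")   # named aggregation: output column = function
      .reset_index()
)

# The std of a single-row group is NaN; the desired output shows 0 there.
result["e"] = result["e"].fillna(0)

# Restore the column order used in the question.
result = result[["a", "b", "c", "d", "e"]]
print(result)
```

If 'b' or 'd' could differ within a group, grouping on 'a' alone and taking `"first"` for those columns would be an alternative, at the cost of silently dropping the differing values.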
I have a list of dicts as follows:
[{'ppm_error': -5.441115144810845e-07, 'key': 'Y7', 'obs_ion': 1054.5045550349998},
{'ppm_error': 2.3119997582222951e-07, 'key': 'Y9', 'obs_ion': 1047.547178035},
{'ppm_error': 2.3119997582222951e-07, 'key': 'Y9', 'obs_ion': 1381.24928035},
{'ppm_error': -2.5532659838679713e-06, 'key': 'Y4', 'obs_ion': 741.339467035},
{'ppm_error': 1.3036219678359603e-05, 'key': 'Y10', 'obs_ion': 1349.712302035},
{'ppm_error': 3.4259216556970878e-06, 'key': 'Y6', 'obs_ion': 941.424286035},
{'ppm_error': 1.1292770047090912e-06, 'key': 'Y2', 'obs_ion': 261.156025035},
{'ppm_error': 1.1292770047090912e-06, 'key': 'Y2', 'obs_ion': 389.156424565},
{'ppm_error': 9.326980606898406e-06, 'key': 'Y5', 'obs_ion': 667.3107950350001}
]
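I want to remove the dicts with duplicate keys, so that only dicts with a unique 'key' remain. It does not matter which of the duplicates ends up in the final list. So the final list should look like this: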
[{'ppm_error': -5.441115144810845e-07, 'key': 'Y7', 'obs_ion': 1054.5045550349998},
{'ppm_error': 2.3119997582222951e-07, 'key': 'Y9', 'obs_ion': 1381.24928035},
{'ppm_error': -2.5532659838679713e-06, 'key': 'Y4', 'obs_ion': 741.339467035},
{'ppm_error': 1.3036219678359603e-05, 'key': 'Y10', 'obs_ion': …

I have a DataFrame that looks like this (first 3 rows):
Sample_Name Sample_ID Sample_Type IS Component_Name IS_Name Component_Group_Name Outlier_Reasons Actual_Concentration Area Height Retention_Time Width_at_50_pct Used Calculated_Concentration Accuracy
Index
1 20170824_ELN147926_HexLacCer_Plasma_A-1-1 NaN Unknown True GluCer(d18:1/12:0)_LCB_264.3 NaN NaN NaN 0.1 2.733532e+06 5.963840e+05 2.963911 0.068676 True NaN NaN
2 20170824_ELN147926_HexLacCer_Plasma_A-1-1 NaN Unknown True GluCer(d18:1/17:0)_LCB_264.3 NaN NaN NaN 0.1 2.945190e+06 5.597470e+05 2.745026 0.068086 True NaN NaN
3 20170824_ELN147926_HexLacCer_Plasma_A-1-1 NaN Unknown False GluCer(d18:1/16:0)_LCB_264.3 GluCer(d18:1/17:0)_LCB_264.3 NaN NaN NaN 3.993535e+06 8.912731e+05 2.791991 0.059864 True 125.927659773487 NaN
When trying to generate a pivot table:
pivoted_report_conc = raw_report.pivot(index = "Sample_Name", columns = 'Component_Name', values = …

I have a text file like this:
771 776 #1 556.766700(2)
538 #2 1069.652700(2)
531 #3 1074.407600(2)
81 84 89 94 111 #4 1501.062900(2)
85 91 #5 782.298900(3)
32 42 66 71 90 95 101 #6 904.016500(3)
I want to split each line and save the substrings to different variables, as follows. For example, for line 1:
scans = 771 776, uid = 1, mz = 556.766700, z = 2
I am trying the following code, but I need help with the regular expression:
f = open(filename, 'r')
par_info=[]
for rows in f:
    re.sub('\#(.+)\s(.+)\((.+)\+', scans=\g<1>, uid=\g<2>, mz = int(\g<3>), z=int(\g<4>), rest)
    info={'sc_num':scans, 'ident':uid, 'mass':mz, 'charge':z}
    par_info.append(info)
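A possible way to do this, as a sketch only: `re.sub` replaces text and cannot assign variables, so a compiled pattern with named groups and `match` may fit better. The pattern below assumes every line has the shape `<scan numbers> #<uid> <mz>(<z>)`, as in the sample. The helper name `parse_line` is made up for illustration, and the sample lines are inlined instead of being read from `filename`. Note that `mz` is converted with `float`, since `int` would fail on a value like `556.766700`.

```python
import re

# scans: one or more whitespace-separated integers
# uid:   integer after '#'
# mz:    decimal number, followed by the charge z in parentheses
LINE_RE = re.compile(
    r"^\s*(?P<scans>\d+(?:\s+\d+)*)\s+#(?P<uid>\d+)\s+"
    r"(?P<mz>\d+(?:\.\d+)?)\((?P<z>\d+)\)\s*$"
)

def parse_line(line):
    """Return a dict for one line, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "sc_num": [int(s) for s in m.group("scans").split()],
        "ident": int(m.group("uid")),
        "mass": float(m.group("mz")),
        "charge": int(m.group("z")),
    }

# Sample lines from the question, used here in place of the file.
sample = """771 776 #1 556.766700(2)
538 #2 1069.652700(2)
531 #3 1074.407600(2)
81 84 89 94 111 #4 1501.062900(2)
85 91 #5 782.298900(3)
32 42 66 71 90 95 101 #6 904.016500(3)"""

par_info = []
for row in sample.splitlines():
    info = parse_line(row)
    if info is not None:  # skip lines that do not fit the assumed layout
        par_info.append(info)

print(par_info[0])
```

With a real file, the loop would iterate over `open(filename)` (ideally inside a `with` block) instead of `sample.splitlines()`.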
I have a string as follows:
mod_str ="10Deamidated; 12Gln->pyro-Glu"
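I want to split the two parts of the string into elements of a list, where each element is a tuple containing the integer and the string, like this: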
[('10', 'Deamidated'), ('12', 'Gln->pyro-Glu')]
I use the following code to capture the strings, but I don't know how to include the special characters.
match_pattern = re.compile(r'(\d+)(\w+)')
items = match_pattern.findall(mod_str)
So far, the output looks like this:
[('10', 'Deamidated'), ('12', 'Gln')]
Any suggestions on how to fix this?
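One way this might be handled, as a minimal sketch: `\w` only matches letters, digits and underscore, so the match stops at `-` and `>`. Replacing `\w+` with "anything that is not a semicolon" captures the whole name. This assumes `;` is the only separator and never appears inside a modification name.

```python
import re

mod_str = "10Deamidated; 12Gln->pyro-Glu"

# (\d+)    leading position number
# ([^;]+)  everything up to the next ';' (or the end of the string)
match_pattern = re.compile(r"(\d+)([^;]+)")
items = match_pattern.findall(mod_str)

print(items)  # [('10', 'Deamidated'), ('12', 'Gln->pyro-Glu')]
```

If names could carry trailing whitespace before the `;`, stripping each captured name afterwards would be a reasonable extra step.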