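I'm trying to sort the lines between patterns in Bash or Python. I want to sort the lines on their second field, using "," as the delimiter.
Given the following input text file: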
Sample1
T1,64,0.65 MEDIUM
T2,60,0.45 LOW
T3,301,0.68 MEDIUM
T4,65,0.75 HIGH
T5,59,0.72 MEDIUM
T6,51,0.82 HIGH
Sample2
T1,153,0.77 HIGH
T2,152,0.61 MEDIUM
T3,154,0.67 MEDIUM
T4,283,0.66 MEDIUM
T5,161,0.65 MEDIUM
Sample3
T1,147,0.71 MEDIUM
T2,154,0.63 MEDIUM
T3,45,0.63 MEDIUM
T4,259,0.77 HIGH
The output I expect:
Sample1
T6,51,0.82 HIGH
T5,59,0.72 MEDIUM
T2,60,0.45 LOW
T1,64,0.65 MEDIUM
T4,65,0.75 HIGH
T3,301,0.68 MEDIUM
Sample2
T2,152,0.61 MEDIUM
T1,153,0.77 HIGH
T3,154,0.67 MEDIUM
T5,161,0.65 MEDIUM
T4,283,0.66 MEDIUM
Sample3
T3,45,0.63 MEDIUM
T1,147,0.71 MEDIUM
T2,154,0.63 MEDIUM
T4,259,0.77 HIGH
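A minimal Python sketch of one way to do this, assuming that a header line (such as `Sample1`) is any line without a comma and that the second comma-separated field is always an integer. It buffers the data lines under each header and sorts each buffer numerically on that field before emitting it, roughly what `sort -t, -k2,2n` would do within each block. (Plain `sort -k2` without `-t,` splits on blanks, so it would not key on the comma-separated second field.) The function name `sort_blocks` and the script name are hypothetical.

```python
import sys
from typing import Iterable, List


def sort_blocks(lines: Iterable[str]) -> List[str]:
    """Sort the lines under each header by their second comma-separated field.

    Assumption: a header line (e.g. "Sample1") is any line without a comma;
    every other line is a data line whose second field is an integer.
    """
    out: List[str] = []
    block: List[str] = []

    def flush() -> None:
        # Numeric sort on field 2 (stable, so ties keep their input order).
        block.sort(key=lambda line: int(line.split(",")[1]))
        out.extend(block)
        block.clear()

    for raw in lines:
        line = raw.rstrip("\n")
        if "," in line:
            block.append(line)
        else:
            flush()           # finish the previous sample's block
            out.append(line)  # then emit the new header unchanged
    flush()                   # sort the last block
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python3 sort_blocks.py input.txt
    with open(sys.argv[1]) as fh:
        for line in sort_blocks(fh):
            print(line)
```

Run on the sample input, this should reproduce the expected output shown above, and it handles any number of header sections rather than a fixed pair of patterns.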
I tried to adapt a suggestion by glenn jackman from another post, but as far as I have tested, it only works for 2 patterns:
> gawk -v cmd="sort -k2" p=1 '
> /^PATTERN2/ …
I created the following DataFrame from a dictionary:
clusters
OG1.5_1000 [6243|g1697.t1_CBS136243, 6243|g7411.t1_CBS136...
OG1.5_1001 [2003|g3159.t1_CBS132003, 2003|g4503.t1_CBS132...
OG1.5_1002 [4916|g1071.t1_CBS134916, 4916|g1248.t1_CBS134...
OG1.5_1003 [4916|g913.t1_CBS134916, 4920|g2467.t1_CBS1349...
OG1.5_1004 [2003|g2248.t1_CBS132003, 2003|g3254.t1_CBS132...
OG1.5_1005 [2003|g1615.t1_CBS132003, 2003|g1622.t1_CBS132...
When I try to split it using "," as the delimiter, I get NaN for every row:
df['clusters'].str.split(',')
OG1.5_1001 NaN
OG1.5_1002 NaN
OG1.5_1003 NaN
OG1.5_1004 NaN
OG1.5_1005 NaN
Any suggestions on what I'm doing wrong? Or how else can I split the column "clusters"?
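A likely explanation, assuming the dictionary values were Python lists: the `[...]` in the printout suggests each cell already holds a list rather than a string, and `Series.str.split` returns NaN for elements that are not strings. The sketch below uses hypothetical placeholder IDs (`gene_a` and so on) instead of the truncated real ones, checks the cell type, and shows two ways to unpack such a column: `DataFrame.explode` for one row per element, or a new DataFrame built from `tolist()` for one column per element.

```python
import pandas as pd

# Hypothetical stand-in for the question's dictionary: each value is a list
# of IDs (placeholder strings, since the real IDs are truncated).
clusters = {
    "OG1.5_1000": ["gene_a", "gene_b"],
    "OG1.5_1001": ["gene_c"],
}
df = pd.Series(clusters, name="clusters").to_frame()

# Check what the cells really are. A list here explains the NaN results,
# because .str methods yield NaN for non-string elements.
print(type(df["clusters"].iloc[0]))
split_attempt = df["clusters"].str.split(",")

# Option 1: one row per list element (index labels are repeated).
long_df = df.explode("clusters")

# Option 2: one column per list element; shorter lists are padded
# with missing values.
wide_df = pd.DataFrame(df["clusters"].tolist(), index=df.index)

print(long_df)
print(wide_df)
```

If the cells turn out to be real strings instead (for example after a round trip through CSV), `.str.split` would work on them, so the type check is the first thing to look at.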