Posts by Lfm*_*fm_

Sort lines between patterns in a text file

I'm trying to sort the lines between patterns in Bash or Python. I want to sort the lines by their second field, using "," as the delimiter.

Given the following input text file:

Sample1
T1,64,0.65  MEDIUM
T2,60,0.45  LOW
T3,301,0.68  MEDIUM
T4,65,0.75  HIGH
T5,59,0.72  MEDIUM
T6,51,0.82  HIGH
Sample2
T1,153,0.77  HIGH
T2,152,0.61  MEDIUM
T3,154,0.67  MEDIUM
T4,283,0.66  MEDIUM
T5,161,0.65  MEDIUM
Sample3
T1,147,0.71  MEDIUM
T2,154,0.63  MEDIUM
T3,45,0.63  MEDIUM
T4,259,0.77  HIGH

I expect the following output:

Sample1
T6,51,0.82  HIGH
T5,59,0.72  MEDIUM
T2,60,0.45  LOW
T1,64,0.65  MEDIUM
T4,65,0.75  HIGH
T3,301,0.68  MEDIUM
Sample2
T2,152,0.61  MEDIUM
T1,153,0.77  HIGH
T3,154,0.67  MEDIUM
T5,161,0.65  MEDIUM
T4,283,0.66  MEDIUM
Sample3
T3,45,0.63  MEDIUM
T1,147,0.71  MEDIUM
T2,154,0.63  MEDIUM
T4,259,0.77  HIGH

I tried to adapt this suggestion by glenn jackman from another post, but as far as I have tested, it only works for two patterns:

> gawk -v cmd="sort -k2" p=1 '
>     /^PATTERN2/ …
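One way to handle any number of sections is to treat every header line as a block boundary and sort each block on its own. Note that a plain `sort -k2` splits fields on whitespace and compares keys as text, so it would not order these rows by the numeric second comma-separated field. The Python sketch below assumes (a hypothetical rule, not stated in the question) that a header such as `Sample1` is any line without a comma and that every other line is a data row whose second field is an integer; `sort_blocks` and the script name are illustrative.

```python
import sys


def sort_blocks(lines):
    """Sort the data lines of each block by their 2nd comma-separated field.

    Assumption (hypothetical rule): a header line such as "Sample1" is any
    line without a comma; every other line is a data row like "T1,64,0.65  MEDIUM".
    """
    out, block = [], []

    def flush():
        # Numeric, not lexical, comparison: 59 sorts before 301.
        # sorted() is stable, so rows with equal keys keep their input order.
        out.extend(sorted(block, key=lambda row: int(row.split(",")[1])))
        block.clear()

    for line in lines:
        if "," in line:
            block.append(line)
        else:
            flush()           # close the previous block before the new header
            out.append(line)
    flush()                   # the last block has no header after it
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sort_blocks.py input.txt  (hypothetical script name)
    with open(sys.argv[1]) as fh:
        sys.stdout.writelines(sort_blocks(fh))
```

Because the number of blocks is never hard-coded, Sample3, Sample4 and so on are handled the same way as the first two; on the sample input above this ordering matches the expected output.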

python awk sed

Score: 5, answers: 1, views: 62

pandas str.split(' ') returns NaN

I created the following DataFrame from a dictionary:

                                                     clusters
OG1.5_1000  [6243|g1697.t1_CBS136243, 6243|g7411.t1_CBS136...
OG1.5_1001  [2003|g3159.t1_CBS132003, 2003|g4503.t1_CBS132...
OG1.5_1002  [4916|g1071.t1_CBS134916, 4916|g1248.t1_CBS134...
OG1.5_1003  [4916|g913.t1_CBS134916, 4920|g2467.t1_CBS1349...
OG1.5_1004  [2003|g2248.t1_CBS132003, 2003|g3254.t1_CBS132...
OG1.5_1005  [2003|g1615.t1_CBS132003, 2003|g1622.t1_CBS132...

When I try to split using "," as the delimiter, I get NaN for every row:

df['clusters'].str.split(',')

OG1.5_1001    NaN
OG1.5_1002    NaN
OG1.5_1003    NaN
OG1.5_1004    NaN
OG1.5_1005    NaN

Any suggestions on what I'm doing wrong? Or how else can I split the column "clusters"?
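The square brackets in the printed DataFrame suggest that each `clusters` cell may already hold a Python list rather than a comma-separated string. If that assumption holds, `.str.split` has nothing to split: the `.str` methods only operate on string cells and return NaN for other values. The sketch below uses small hypothetical stand-in data (the real IDs are truncated in the question) to show the NaN result and two ways forward: `Series.explode` for one row per element, or `Series.str.join` to turn each list back into a delimited string.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: each cell is a Python list,
# which is what the square brackets in the printed DataFrame suggest.
df = pd.DataFrame(
    {"clusters": [["a|g1.t1", "a|g2.t1"], ["b|g3.t1"]]},
    index=["OG_A", "OG_B"],
)

# .str.split only operates on string cells; list cells come back as NaN.
split_result = df["clusters"].str.split(",")
print(split_result)

# If the cells are already lists, there is nothing to split: explode them
# into one row per element instead (the index label is repeated).
long_form = df["clusters"].explode()

# Or, if a delimited string is really wanted, join each list first.
as_text = df["clusters"].str.join(",")
```

Checking `type(df['clusters'].iloc[0])` on the real DataFrame is a quick way to confirm whether the cells are lists or strings before choosing between these options.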

python pandas

Score: 4, answers: 1, views: 7501

Tag statistics

python ×2

awk ×1

pandas ×1

sed ×1