Posts by Son*_*nny

Parsing a tab-delimited file with missing fields

Here is an example of the complex tab-delimited file I am trying to parse:

ENTRY   map0010\tNAME Glycolysis\tDESCRIPTION Glycolysis is the process of converting glucose into pyruvate\tCLASS   Metabolism\tDISEASE   H00071  Hereditary fructose intolerance\tH00072  Pyruvate dehydrogenase complex deficiency\tDBLINKS     GO: 0006096 0006094
ENTRY   map00020\tNAME  Citrate cycle (TCA cycle)\tCLASS   Metabolism; Carbohydrate Metabolism\tDISEASE   H00073  Pyruvate carboxylase deficiency\tDBLINKS     GO: 0006099\tREL_PATHWAY map00010  Glycolysis / Gluconeogenesis\tmap00053  Ascorbate and aldarate metabolism

I am trying to get output that contains only some of the fields, with NA for a field that is missing, for example:

ENTRY   map0010\tNAME Glycolysis\tCLASS   Metabolism\tDISEASE   H00071  Hereditary fructose intolerance H00072  Pyruvate dehydrogenase complex deficiency\tDBLINKS     GO: 0006096 0006094\tNA
ENTRY   map00020\tNAME  Citrate cycle (TCA cycle)\tCLASS   Metabolism; Carbohydrate Metabolism\tDISEASE   H00073  Pyruvate carboxylase deficiency\tDBLINKS     GO: 0006099\tREL_PATHWAY …
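One way to get that output is to split each line on tabs, start a new field whenever a chunk begins with a known key, and treat any other chunk (such as the second DISEASE entry "H00072 ...") as a continuation of the previous field. The sketch below assumes the "\t" in the sample stands for real tab characters and that the set of keys is the one visible in the sample; `KNOWN_KEYS`, `WANTED`, `parse_record` and `select_fields` are illustrative names, not part of any library.

```python
# Field keys seen in the sample records (an assumption: extend as needed).
# Any tab-separated chunk that does not start with one of these keys is
# treated as a continuation of the previous field.
KNOWN_KEYS = {"ENTRY", "NAME", "DESCRIPTION", "CLASS",
              "DISEASE", "DBLINKS", "REL_PATHWAY"}

# Fields to keep, in output order (DESCRIPTION is dropped, as in the desired output).
WANTED = ["ENTRY", "NAME", "CLASS", "DISEASE", "DBLINKS", "REL_PATHWAY"]


def parse_record(line):
    """Split one tab-delimited record into {key: "KEY value ..."}."""
    fields = {}
    current = None
    for chunk in line.rstrip("\n").split("\t"):
        chunk = chunk.strip()
        if not chunk:
            continue
        key = chunk.partition(" ")[0]
        if key in KNOWN_KEYS:
            current = key
            fields[key] = chunk  # keep "KEY value" together, as in the desired output
        elif current is not None:
            # Continuation (e.g. a second DISEASE entry): join with a single space.
            fields[current] += " " + chunk
    return fields


def select_fields(line, wanted=WANTED, missing="NA"):
    """Return the wanted fields as one tab-joined line, using `missing` for gaps."""
    fields = parse_record(line)
    return "\t".join(fields.get(key, missing) for key in wanted)


# Sample record from the question; REL_PATHWAY is absent, so it becomes NA.
sample = ("ENTRY   map0010\tNAME Glycolysis\tCLASS   Metabolism\t"
          "DISEASE   H00071  Hereditary fructose intolerance\t"
          "H00072  Pyruvate dehydrogenase complex deficiency\t"
          "DBLINKS     GO: 0006096 0006094")
print(select_fields(sample))
```

To process a whole file, call `select_fields` on each input line and write each result followed by a newline. The continuation rule breaks down if a value chunk happens to start with a word that is also a key, so the key set should match the real data.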

python csv parsing

4 votes, 1 answer, 1841 views

From a text file to CSV with Python

I need help parsing a very long text file that looks like this:

NAME         IMP4   
DESCRIPTION  small nucleolar ribonucleoprotein 
CLASS        Genetic Information Processing
             Translation
             Ribosome biogenesis in eukaryotes
DBLINKS      NCBI-GI: 15529982
             NCBI-GeneID: 92856
             OMIM: 612981
///
NAME         COMMD9
DESCRIPTION  COMM domain containing 9
ORGANISM     H.sapiens
DBLINKS      NCBI-GI: 156416007
             NCBI-GeneID: 29099
             OMIM: 612299
///
.....

I want a structured CSV file with the same number of columns in every row, so that I can easily extract the information I need.
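Once each record is a dict (one per "///"-terminated block, for example), the standard library's `csv.DictWriter` can enforce a fixed column set: `restval` fills keys a record lacks (such as ORGANISM in the first record) and `extrasaction="ignore"` drops keys not listed. A sketch, assuming "NA" as the placeholder and first-seen key order as the default column order; `records_to_csv` and `records_to_csv_text` are illustrative names:

```python
import csv
import io


def records_to_csv(records, out, fieldnames=None, missing="NA"):
    """Write dicts as CSV rows that all share the same columns.

    If `fieldnames` is None, the columns are the union of all keys in
    first-seen order. A key missing from a record is filled with `missing`
    (DictWriter's `restval`); keys not in `fieldnames` are ignored.
    """
    if fieldnames is None:
        fieldnames = []
        for rec in records:
            for key in rec:
                if key not in fieldnames:
                    fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            restval=missing, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return fieldnames


def records_to_csv_text(records, **kwargs):
    """Convenience wrapper that returns the CSV as a string."""
    buf = io.StringIO()
    records_to_csv(records, buf, **kwargs)
    return buf.getvalue()
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, and pass an explicit `fieldnames` list if only some columns are needed.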

First, I tried this:

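# a is the open input text file, b the open output file
for line in a:
    if '///' not in line:
        # turn every line break inside a record into a tab
        b.write(line.replace('\n', '\t'))
    else:
        # '///' marks the end of a record: start a new output row
        b.write('\n')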

This produced a CSV like this:

NAME         IMP4\tDESCRIPTION  small nucleolar ribonucleoprotein\tCLASS        Genetic Information Processing\t             Translation\t             Ribosome biogenesis in eukaryotes\tDBLINKS      NCBI-GI: 15529982\t            NCBI-GeneID: 92856\t
         OMIM: 612981
NAME         COMMD9\tDESCRIPTION  COMM …

python csv parsing text

3 votes, 1 answer, 2956 views

Tag statistics

csv ×2

parsing ×2

python ×2

text ×1