How to extract the line numbers of lines matching a regular expression in a text file

Question


use*_*610 2 python regex nlp part-of-speech

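I am working on a statistical machine translation project, and I need to extract from a POS-tagged text file the line numbers of the lines that match a regular expression (any non-separated phrasal verb with the particle 'out'), and write those line numbers to a file (in Python).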

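I have this regular expression: '\w*_VB.?\sout_RP' and my POS-tagged text file: 'Corpus.txt'. I would like an output file containing the line numbers that match the regular expression above, with exactly one line number per line (no blank lines), for example: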

2
5
44

So far, all I have in my script is the following:

OutputLineNumbers = open('OutputLineNumbers', 'w')
with open('Corpus.txt', 'r') as textfile:
    phrase='\w*_VB.?\sout_RP'
    for phrase in textfile: 

OutputLineNumbers.close()


Any idea how to solve this?

Thanks in advance for your help!

Answer 1

Kal*_*n02 5

This should solve your problem, assuming you have the correct regular expression in the variable 'phrase':

import re

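# the pattern from the question, as a raw string so the backslashes are kept
phrase = r'\w*_VB.?\sout_RP'

# compile regex
regex = re.compile(phrase)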

# open the files
with open('Corpus.txt','r') as inputFile:
    with open('OutputLineNumbers', 'w') as outputLineNumbers:
        # loop through each line in corpus
        for line_i, line in enumerate(inputFile, 1):
            # check if we have a regex match
            if regex.search( line ):
                # if so, write the line number to the output file
                outputLineNumbers.write( "%d\n" % line_i )

Run Code Online (Sandbox Code Playgroud)
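To check the pattern before running it over the whole corpus, the same enumerate-and-search idea can be tried on a few in-memory lines. This is only a minimal sketch: the helper name `matching_line_numbers` and the sample sentences are made up for illustration, and the tag format (`word_TAG` separated by spaces) is an assumption about what 'Corpus.txt' looks like.

```python
import re

# Pattern from the question, written as a raw string so that \w and \s
# reach the regex engine unchanged.
PHRASE = re.compile(r'\w*_VB.?\sout_RP')


def matching_line_numbers(lines, pattern=PHRASE):
    """Return the 1-based numbers of the lines that contain a match.

    Hypothetical helper, not part of the original answer.
    """
    return [n for n, line in enumerate(lines, 1) if pattern.search(line)]


# Invented POS-tagged sample lines; the real corpus format is assumed.
sample = [
    "He_PRP went_VBD home_NN ._.",
    "She_PRP found_VBD out_RP the_DT truth_NN ._.",
    "They_PRP took_VBD it_PRP out_RP ._.",   # separated verb and particle
    "We_PRP will_MD find_VB out_RP soon_RB ._.",
]

numbers = matching_line_numbers(sample)
print("\n".join(str(n) for n in numbers))
```

Only the lines where the verb tag is immediately followed by `out_RP` are reported; the third line is skipped because `it_PRP` sits between the verb and the particle. Since an open file is itself an iterable of lines, the same helper should also accept a file object in place of the list.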
