B3t*_*th4 1 python regex quotes parsing nlp
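I am trying to parse a text file so that I can run some statistics on it in Python. To do that, I want to replace some punctuation with tokens. One example of such a token is all punctuation that terminates a sentence (.!? becomes <EndS>). I managed to do this using regular expressions. Now I am trying to parse quotes. For that, I think I need a way to distinguish opening quotes from closing quotes. I am reading the input file line by line, and I have no guarantee that the quotes will be balanced.
For example: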
 "Death to the traitors!" cried the exasperated burghers.
 "Go along with you," growled the officer, "you always cry the same thing over again. It is very tiresome."
should become something like:
 [Open] Death to the traitors! [Close] cried the exasperated burghers.
 [Open] Go along with you, [Close] growled the officer, [Open] you always cry the same thing over again. It is very tiresome. [Close]
Is it possible to do this with regular expressions? Is there an easier or better way to do it?
小智 5
You can use the sub method (module re):
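import re

def replace_dbquote(match):
    # match.group(0) is the whole quoted span, including both double quotes;
    # strip the quotes and wrap what is left in the two tokens.
    return '[OPEN]' + match.group(0).replace('"', '') + '[CLOSE]'

string = '"Death to the traitors!" cried the exasperated burghers. "Go along with you", growled the officer.'
# "[^"]*" matches an opening quote, any run of non-quote characters, and the closing quote.
result = re.sub(r'"[^"]*"', replace_dbquote, string)
print(result)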
https://docs.python.org/3.5/library/re.html#re.sub
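Note that the pattern above only matches complete pairs, so a quote that opens on one line and closes on another is left untouched. Since the question says the input is read line by line and the quotes may not be balanced, a rough alternative is to replace each quote on its own and simply alternate between the two tokens, carrying the open/closed state from one line to the next. The helper name `tag_quotes` and its `inside` parameter are made up for this sketch, and it assumes plain straight double quotes that strictly alternate; a stray quote will flip every token after it.

```python
import re

def tag_quotes(line, inside=False):
    """Replace each '"' with [Open] or [Close] by simple alternation.

    `inside` (hypothetical parameter) says whether the line starts inside
    an open quotation. The updated state is returned alongside the text so
    the caller can pass it on to the next line.
    """
    def toggle(match):
        nonlocal inside
        inside = not inside
        # After flipping: True means this quote opened a quotation.
        return '[Open] ' if inside else ' [Close]'

    tagged = re.sub('"', toggle, line)
    return tagged, inside

lines = [
    '"Death to the traitors!" cried the exasperated burghers.',
    '"Go along with you," growled the officer, "you always cry the same thing over again. It is very tiresome."',
]
inside = False
for line in lines:
    tagged, inside = tag_quotes(line, inside)
    print(tagged)
```

If you would rather not let an unclosed quote leak into the following lines, call `tag_quotes(line)` without passing the state along, which resets it at every line.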