Using ^ to match the start of a line in a Python regular expression

Question

I am trying to extract publication years from ISI-style data exported from Thomson-Reuters Web of Science. The "Publication Year" line looks like this (at the very beginning of a line):

PY 2015

For the script I am writing, I have defined the following regex function:

import re

# Read the whole export into a single string.
with open('savedrecs.txt') as f:
    wosrecords = f.read()

def findyears():
    result = re.findall(r'PY (\d\d\d\d)', wosrecords)
    print(result)

findyears()

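However, this produces false positives, because the pattern can also appear elsewhere in the data.

So I want to match the pattern only at the start of a line. Normally I would use ^ for this, but r'^PY (\d\d\d\d)' fails to match my results. Using \n, on the other hand, seems to do what I want, but that might cause further complications for me.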
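
The behaviour described above can be reproduced with a small sketch. The sample string is made up for illustration and is not real Web of Science output. Here the first line happens to be a PY line, so the bare ^ pattern finds exactly one year; in a file whose first line is something else it would find nothing, which is presumably what happened with the real data.

```python
import re

# Hypothetical sample; the real savedrecs.txt layout may differ.
sample = "PY 2015\nTI A study citing COPY 1999 data\nPY 2017\n"

# Unanchored: also picks up the mid-line "PY 1999" inside "COPY 1999".
unanchored = re.findall(r'PY (\d\d\d\d)', sample)

# Without flags, ^ anchors only at the start of the whole string.
default_anchor = re.findall(r'^PY (\d\d\d\d)', sample)

# A leading \n needs a preceding newline, so a PY line that opens
# the string is skipped.
newline_prefix = re.findall(r'\nPY (\d\d\d\d)', sample)

print(unanchored)      # ['2015', '1999', '2017']
print(default_anchor)  # ['2015']
print(newline_prefix)  # ['2017']
```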

Answer 1

sin*_*ash 23

re.findall(r'^PY (\d\d\d\d)', wosrecords, flags=re.MULTILINE)

That should work; let me know if it doesn't. I don't have your data.
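
Since the answer was written without access to the data, here is a minimal sketch of the same call run against a made-up sample. With re.MULTILINE, ^ matches at the start of the string and also immediately after each newline. The function name find_years and the records string are assumptions for illustration, not the asker's actual file.

```python
import re

def find_years(text):
    """Return the four-digit years from lines that start with 'PY '."""
    return re.findall(r'^PY (\d\d\d\d)', text, flags=re.MULTILINE)

# Hypothetical record text standing in for savedrecs.txt.
records = "FN Sample export\nPY 2015\nAB See COPY 1999\nPY 2017\n"

years = find_years(records)
print(years)  # ['2015', '2017']
```

Passing the text as an argument instead of reading a global keeps the function easy to try on small strings like this one.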

Answer 2

Wik*_*żew 7

Use re.findall with the re.M flag:

import re
p = re.compile(r'^PY\s+(\d{4})', re.M)
test_str = "PY123\nPY 2015\nPY 2017"
# "PY123" is skipped because \s+ requires whitespace after PY.
print(p.findall(test_str))

See the IDEONE demo.

Explanation:

^ - the start of a line (because of re.M)
PY - the literal PY
\s+ - one or more whitespace characters
(\d{4}) - a capturing group holding 4 digits
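
Two details of the pieces listed above are worth checking on a small example. \s also matches a newline, so \s+ can let a match run from a bare PY line onto the next line, and \d{4} accepts the first four digits of a longer number. The sketch below compares the pattern with a variant using [ \t]+, which is an assumed alternative and not part of the answer; the test string is invented for illustration.

```python
import re

any_ws = re.compile(r'^PY\s+(\d{4})', re.M)         # pattern from the answer
same_line = re.compile(r'^PY[ \t]+(\d{4})', re.M)   # assumed stricter variant

# Made-up input: a tab separator, a bare "PY" line followed by a line
# starting with digits, and a five-digit number.
text = "PY\t2015\nPY\n2016 other text\nPY 20179\n"

loose = any_ws.findall(text)
strict = same_line.findall(text)

print(loose)   # ['2015', '2016', '2017']
print(strict)  # ['2015', '2017']
```

If exactly four digits are required, one option is to append \b after the group so that "20179" is rejected.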