Python multi-line regular expression

Question

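How can I extract all characters (including newlines) up to the first occurrence of a given sequence of words? For example, take the following input:

Input text: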

"shantaram is an amazing novel.
It is one of the best novels i have read.
the novel is written by gregory david roberts.
He is an australian"

I want to extract the text from shantaram up to the first occurrence of the sequence the, which is on the second line.

The output must be:

shantaram is an amazing novel.
It is one of the

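I have been struggling with this all morning. I can write an expression that extracts all characters until it reaches a specific character, but when I use an expression like this: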

re.search("shantaram[\s\S]*the", string)

it does not match across newlines.

Answer 1

Chr*_*our 23

You want to use the DOTALL option so that '.' matches newlines. From doc.python.org:

re.DOTALL

Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.

Demo:

In [1]: import re

In [2]: s="""shantaram is an amazing novel.
It is one of the best novels i have read.
the novel is written by gregory david roberts.
He is an australian"""

In [3]: print(re.findall('^.*?the', s, re.DOTALL)[0])
shantaram is an amazing novel.
It is one of the
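
As a rough illustration of what the flag changes, the sketch below runs the same lazy pattern with and without re.DOTALL on the question's input. It assumes Python 3, and the variable names (text, no_flag, with_flag) are made up for this example.

```python
import re

text = (
    "shantaram is an amazing novel.\n"
    "It is one of the best novels i have read.\n"
    "the novel is written by gregory david roberts.\n"
    "He is an australian"
)

# Without re.DOTALL, '.' cannot match '\n', so the pattern can never get
# from "shantaram" on line 1 to the first "the" on line 2.
no_flag = re.search(r"shantaram.*?the", text)

# With re.DOTALL, '.' also matches '\n', so the match may span lines.
with_flag = re.search(r"shantaram.*?the", text, re.DOTALL)

print(no_flag)             # None: no match on a single line
print(with_flag.group(0))  # text from "shantaram" through the first "the"
```

The lazy quantifier (.*?) still matters here: with a greedy .* and re.DOTALL, the match would run on to the last "the" in the text.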

Answer 2

lan*_*cif 5

Use this regular expression,

re.search("shantaram[\s\S]*?the", string)

instead of

re.search("shantaram[\s\S]*the", string)

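The only difference is the '?'. Adding '?' after a quantifier (e.g. *?, +?) makes it non-greedy, so it stops at the shortest possible match instead of the longest.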
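
To make the difference concrete, here is a small sketch that runs both patterns on the question's input. It assumes Python 3 and uses raw strings for the patterns; the names text, greedy and lazy are illustrative only. Note that [\s\S] already matches newlines, so both patterns cross line boundaries and only the stopping point differs.

```python
import re

text = (
    "shantaram is an amazing novel.\n"
    "It is one of the best novels i have read.\n"
    "the novel is written by gregory david roberts.\n"
    "He is an australian"
)

# Greedy: [\s\S]* consumes as much as possible, then backtracks to the
# LAST "the" in the text (the one that starts line 3).
greedy = re.search(r"shantaram[\s\S]*the", text).group(0)

# Non-greedy: [\s\S]*? expands one character at a time and stops at the
# FIRST "the" (on line 2).
lazy = re.search(r"shantaram[\s\S]*?the", text).group(0)

print(repr(greedy))
print(repr(lazy))
```

Printing with repr makes the embedded newlines visible, which helps when checking exactly where each match ends.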
