Python regex to match any valid English sentence

Question

Swa*_*ale 0 python regex text-manipulation

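I'd like to know whether it's possible to write a Python regular expression that matches any valid English sentence, which may contain alphanumeric and special characters.
Basically, I want to extract some specific elements from an XML file. These elements have the following form: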

<p o=<Any Number>> <Any English sentence> </p>

For example:

<p o ="1"> The quick brown fox jumps over the lazy dog </p>

or

<p o ="2">  And This is a number 12.90! </p>

We can easily write a regex for

<p o=<Any Number>>

and for the </p> tag. What I actually want is to extract the sentence between these tags with a regex group.

Can anyone suggest a regular expression that solves this problem?

Alternatively, if you can suggest a workaround, that would also help me a lot.
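
For the flat form shown above, the sentence never has to be described as "English" at all: a non-greedy capture group between the opening and closing tags is enough. The sketch below is a workaround for simple inputs, not a general XML parser. It assumes the o value is a double-quoted integer, allows optional spaces around "=", and assumes no nested tags inside <p>. The names P_ELEMENT and extract_sentences are hypothetical.

```python
import re

# Hypothetical pattern for the flat form <p o="N"> text </p>.
# Assumptions: o is a double-quoted integer, spaces around '=' are optional,
# and <p> contains no nested tags. re.DOTALL lets the text span several lines.
P_ELEMENT = re.compile(r'<p\s+o\s*=\s*"(\d+)"\s*>(.*?)</p>', re.DOTALL)


def extract_sentences(xml_text):
    """Return (number, sentence) pairs with surrounding whitespace stripped."""
    # With two groups, findall() yields (number, body) tuples.
    return [(int(num), body.strip()) for num, body in P_ELEMENT.findall(xml_text)]


sample = '''<p o ="1"> The quick brown fox jumps over the lazy dog </p>
<p o ="2">  And This is a number 12.90! </p>'''

print(extract_sentences(sample))
```

Because a regex only sees characters, entities such as &amp; come back undecoded and any inline markup inside <p> ends up in the captured text. That fragility is why a real XML parser is usually the safer choice.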

Answer 1

Kie*_*ong 8

Use an XML parser such as lxml; regex is not the right tool for this task. Example:

import lxml.etree

# First, parse the XML
doc = lxml.etree.fromstring('<p o ="2">  And This is a number 12.90! </p>')
# Then use XPath to extract the text we need; xpath() returns a list of matching text nodes
print(doc.xpath('/p/text()'))
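
The same approach scales to a whole file containing several <p o="..."> elements. This sketch swaps lxml for the standard-library xml.etree.ElementTree module so it runs without extra installs. The <doc> root element is a hypothetical stand-in for the real file's root, and o is assumed to hold an integer.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the <doc> root element is an assumption standing in for the real file.
xml_text = """<doc>
  <p o ="1"> The quick brown fox jumps over the lazy dog </p>
  <p o ="2">  And This is a number 12.90! </p>
</doc>"""

root = ET.fromstring(xml_text)

# Walk every <p> at any depth and keep only those that carry an o attribute.
# itertext() joins the text of child elements too, so inline markup is dropped
# and entities such as &amp; come back decoded.
sentences = {
    int(p.get("o")): "".join(p.itertext()).strip()
    for p in root.iter("p")
    if p.get("o") is not None
}
print(sentences)
```

For a file on disk, ET.parse(path).getroot() takes the place of ET.fromstring.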

You can read more about XPath here: XPath tutorial.
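
XPath can also select elements by attribute value, for example only the <p> whose o is "2". In lxml that would be doc.xpath('//p[@o="2"]/text()'). The sketch below uses the standard library's ElementTree instead, whose find/findtext methods accept a limited XPath subset that includes [@attr='value'] predicates. The <doc> wrapper is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the <doc> wrapper element is an assumption.
root = ET.fromstring(
    '<doc><p o="1">The quick brown fox</p><p o="2"> And This is a number 12.90! </p></doc>'
)

# [@o='2'] filters on the attribute value; findtext() returns the text of the
# first match, or None if nothing matches.
text = root.findtext(".//p[@o='2']")
print(text.strip())  # And This is a number 12.90!
```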

Archived:	14 years ago
Views:	462
Last activity:	13 years, 4 months ago