
Python regular expression: look-behind requires fixed-width pattern

When extracting the title of an HTML page, I have always used the following regular expression:

(?<=<title.*>)([\s\S]*)(?=</title>)

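This extracts everything between the title tags in the document and leaves out the tags themselves. However, when I try to use this regular expression in Python, it raises the following exception: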

Traceback (most recent call last):
  File "test.py", line 21, in <module>
    pattern = re.compile('(?<=<title.*>)([\s\S]*)(?=</title>)')
  File "C:\Python31\lib\re.py", line 205, in compile
    return _compile(pattern, flags)
  File "C:\Python31\lib\re.py", line 273, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Python31\lib\sre_compile.py", line 495, in compile
    code = _code(p, flags)
  File "C:\Python31\lib\sre_compile.py", line 480, in _code
    _compile(code, p.data, flags)
  File "C:\Python31\lib\sre_compile.py", line 115, in _compile
    raise error("look-behind requires fixed-width pattern")
sre_constants.error: look-behind requires fixed-width pattern

The code I am using is:

pattern = re.compile('(?<=<title.*>)([\s\S]*)(?=</title>)')
m = pattern.search(f)

If I make some minimal adjustments, it works:

pattern …
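
The adjusted pattern above is truncated in the source, so the following is not a reconstruction of it. It is only a minimal sketch of one common way around the error, assuming the goal is the text between the title tags: Python's `re` module rejects `(?<=<title.*>)` because `.*` can match any number of characters, and a look-behind must have a fixed width. Matching the opening tag normally and capturing the inner text avoids the look-behind entirely. The sample `html` string and the variable names are hypothetical.

```python
import re

# Hypothetical sample input, not taken from the original post.
html = "<html><head><title lang='en'>Example\nPage</title></head></html>"

# The original pattern fails to compile: ".*" makes the look-behind variable-width.
lookbehind_error = None
try:
    re.compile(r"(?<=<title.*>)([\s\S]*)(?=</title>)")
except re.error as exc:
    lookbehind_error = str(exc)

# Workaround sketch: match the opening tag outside any look-behind and
# capture only the text between the tags. [^>]* allows attributes on <title>;
# the non-greedy [\s\S]*? stops at the first closing tag.
pattern = re.compile(r"<title\b[^>]*>([\s\S]*?)</title>", re.IGNORECASE)
m = pattern.search(html)
title = m.group(1) if m else None

# A look-behind still works when its width is fixed, i.e. when the opening
# tag is known to be exactly "<title>" with no attributes.
fixed = re.compile(r"(?<=<title>)[\s\S]*?(?=</title>)")
m2 = fixed.search("<title>Plain</title>")
plain_title = m2.group(0) if m2 else None

print(lookbehind_error)
print(repr(title))
print(repr(plain_title))
```

With the capturing-group version the wanted text is in `m.group(1)` rather than in the whole match. For anything beyond well-formed input, an HTML parser is usually a more robust choice than a regular expression.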

html python regex

8 votes, 3 answers, 4931 views
