Regular expression matching error

Question

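I'm new to Python (and I have no programming training either), so please keep that in mind as I ask my question.

I'm trying to search a retrieved web page and find all links that follow a specified pattern. I've done this successfully in other scripts, but this time I get the following error: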

raise error, v # invalid expression
sre_constants.error: multiple repeat

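I have to admit I don't know why, but then again I'm new to both Python and regular expressions. However, even when I drop the pattern and use a specific link instead (just to test the matching), I don't believe I get any matches: when I print match.group(0), nothing is sent to the window. The link I tested is the one commented out below.

Any ideas? I usually find it easier to learn by example, but any advice you can give is greatly appreciated!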

Badger

import urllib2
from BeautifulSoup import BeautifulSoup
import re

url = "http://forums.epicgames.com/archive/index.php?f-356-p-164.html"
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page)

pattern = r'<a href="http://forums.epicgames.com/archive/index.php?t-([0-9]+).html">(.?+)</a> <i>((.?+) replies)'
#pattern = r'href="http://forums.epicgames.com/archive/index.php?t-622233.html">Gears of War 2: Horde Gameplay</a> <i>(20 replies)'

for match in re.finditer(pattern, page, re.S):
    print match(0)


Answer 1

hug*_*own 0

import urllib2
import re
from BeautifulSoup import BeautifulSoup

url = "http://forums.epicgames.com/archive/index.php?f-356-p-164.html"
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page)

# Get all the links as strings of markup
links = [str(tag) for tag in soup('a')]

# Escape the literal "?" and "." and use the lazy quantifier ".+?" (not ".?+")
s = r'<a href="http://forums\.epicgames\.com/archive/index\.php\?t-\d+\.html">(.+?)</a>'
r = re.compile(s)
for link in links:
    m = r.match(link)
    if m:
        # Group 1 is the link text (the thread title)
        print m.group(1)

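For readers wondering why the pattern in the question fails: in the Python 2 `re` module used there, `.?+` stacks two quantifiers (`?` followed by `+`), which the compiler rejects as "multiple repeat". The lazy form is `.+?`. (Python 3.11 and later read `?+` as a possessive quantifier, so the same text may compile there but still would not mean what was intended.) Separately, the unescaped `?` after `php` makes the preceding `p` optional instead of matching a literal question mark, and the unescaped parentheses around the reply count create a group instead of matching literal brackets, which is likely why even the literal test link matched nothing. The following is a minimal sketch of a corrected pattern, not a definitive implementation. The names `sample_html`, `fixed_pattern`, and `find_threads` are hypothetical, the second sample row and the closing `</i>` tags are invented for illustration, and the regex runs on a string rather than on the fetched page.

```python
import re

# Hypothetical sample markup. The first row reuses the test link quoted in the
# question; the second row and the closing </i> tags are assumptions.
sample_html = (
    '<a href="http://forums.epicgames.com/archive/index.php?t-622233.html">'
    'Gears of War 2: Horde Gameplay</a> <i>(20 replies)</i>\n'
    '<a href="http://forums.epicgames.com/archive/index.php?t-622240.html">'
    'Another thread</a> <i>(3 replies)</i>\n'
)

# An unescaped "?" makes the preceding "p" optional, so the literal "?" in the
# URL is never matched. Escaping it (and the dot) matches the text as written.
unescaped = re.search(r'index.php?t-', 'index.php?t-')
escaped = re.search(r'index\.php\?t-', 'index.php?t-')

# Corrected pattern: literal "." and "?" are escaped, the lazy quantifier is
# written ".+?", and the brackets around the reply count are escaped.
fixed_pattern = (
    r'<a href="http://forums\.epicgames\.com/archive/index\.php\?t-([0-9]+)\.html">'
    r'(.+?)</a> <i>\(([0-9]+) replies\)'
)


def find_threads(html):
    """Return (thread_id, title, reply_count) tuples found in html."""
    return [m.groups() for m in re.finditer(fixed_pattern, html, re.S)]


threads = find_threads(sample_html)
for thread_id, title, replies in threads:
    print("%s | %s | %s replies" % (thread_id, title, replies))
```

Note that the loop in the question also calls `match(0)`; a match object is not callable, so the text has to be read with `match.group(0)`, as the question's prose already suggests.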
