Python regex alternation

Question

I am trying to find all links on a web page, either "http://something" or "https://something". I wrote a regular expression and it works:

L = re.findall(r"http://[^/\"]+/|https://[^/\"]+/", site_str)

But is there a shorter way to write this? I am repeating ://[^/\"]+/ twice, probably unnecessarily. I tried various things, but none of them work. I tried:

L = re.findall(r"http|https(://[^/\"]+/)", site_str)
L = re.findall(r"(http|https)://[^/\"]+/", site_str)
L = re.findall(r"(http|https)(://[^/\"]+/)", site_str)

Clearly I am missing something here, or I just don't understand Python regular expressions well enough.

Answer 1

Mar*_*ers 10

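You are using capturing groups, and .findall() changes its behaviour when the pattern contains them (it returns only the contents of the capturing groups). Your regex can be simplified, but your version works if you use a non-capturing group instead: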

L = re.findall(r"(?:http|https)://[^/\"]+/", site_str)
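
To make the difference concrete, here is a minimal sketch of what each attempt from the question returns. The site_str value is a made-up sample, not the asker's page, and the variable names are only for illustration:

```python
import re

# Hypothetical sample input (an assumption, not from the question).
site_str = '"http://a.example/" "https://b.example/path"'

# Attempt 1: | has the lowest precedence, so this reads as
# "http" OR "https(://[^/\"]+/)". The first branch always wins,
# the group never takes part, and findall yields empty strings.
attempt1 = re.findall(r"http|https(://[^/\"]+/)", site_str)

# Attempt 2: one capturing group, so findall returns only that group.
attempt2 = re.findall(r"(http|https)://[^/\"]+/", site_str)

# Attempt 3: two capturing groups, so findall returns tuples of groups.
attempt3 = re.findall(r"(http|https)(://[^/\"]+/)", site_str)

# Non-capturing group: no groups to report, so findall returns whole matches.
fixed = re.findall(r"(?:http|https)://[^/\"]+/", site_str)

print(attempt1)  # ['', '']
print(attempt2)  # ['http', 'https']
print(attempt3)  # [('http', '://a.example/'), ('https', '://b.example/')]
print(fixed)     # ['http://a.example/', 'https://b.example/']
```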

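If you use single quotes around the expression, you don't need to escape the double quote, and since only the s differs between the two alternatives, s? works as well: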

L = re.findall(r'https?://[^/"]+/', site_str)

Demo:

>>> import re
>>> example = '''
... "http://someserver.com/"
... "https://anotherserver.com/with/path"
... '''
>>> re.findall(r'https?://[^/"]+/', example)
['http://someserver.com/', 'https://anotherserver.com/']
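
If you do want to keep a capturing group (for example, to read the scheme separately), one possible alternative is re.finditer(), which yields match objects instead of group contents. This is only a sketch; the sample string and the variable names are assumptions:

```python
import re

# Hypothetical sample input, modelled on the demo above.
example = '"http://someserver.com/" "https://anotherserver.com/with/path"'

# Same pattern as the question's second attempt, capturing group included.
pattern = re.compile(r'(http|https)://[^/"]+/')

# group(0) is always the whole match, regardless of capturing groups.
full_matches = [m.group(0) for m in pattern.finditer(example)]

# group(1) is the captured scheme.
schemes = [m.group(1) for m in pattern.finditer(example)]

print(full_matches)  # ['http://someserver.com/', 'https://anotherserver.com/']
print(schemes)       # ['http', 'https']
```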
