有谁知道我可以用来查找字符串中的URL的正则表达式?我在Google上发现了很多正则表达式,用于确定整个字符串是否为URL,但我需要能够在整个字符串中搜索URL.例如,我希望能够找到www.google.com并http://yahoo.com在以下字符串中:
Hello www.google.com World http://yahoo.com
Run Code Online (Sandbox Code Playgroud)
我不是在寻找字符串中的特定URL.我正在寻找字符串中的所有URL,这就是我需要正则表达式的原因.
Raj*_*eev 177
这是我使用的那个
(http|ftp|https)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?
Run Code Online (Sandbox Code Playgroud)
对我有用,也适合你.
Ste*_*nze 38
猜猜没有正则表达式适合这种用途.我在这里发现了一个相当坚实的人
/(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$])/igm
Run Code Online (Sandbox Code Playgroud)
与此处发布的其他差异相比,有些差异/优势:
moo.com没有http或没有的东西www请看这里的例子
Goo*_*JaY 14
text = """The link of this question: https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string
Also there are some urls: www.google.com, facebook.com, http://test.com/method?param=wasd
The code below catches all urls in text and returns urls in list."""
urls = re.findall('(?:(?:https?|ftp):\/\/)?[\w/\-?=%.]+\.[\w/\-?=%.]+', text)
print(urls)
Run Code Online (Sandbox Code Playgroud)
输出:
[
'https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string',
'www.google.com',
'facebook.com',
'http://test.com/method?param=wasd'
]
Run Code Online (Sandbox Code Playgroud)
自己写了一篇:
let regex = /([\w+]+\:\/\/)?([\w\d-]+\.)*[\w-]+[\.\:]\w+([\/\?\=\&\#\.]?[\w-]+)*\/?/gm
Run Code Online (Sandbox Code Playgroud)
它适用于以下所有域:
https://www.facebook.com
https://app-1.number123.com
http://facebook.com
ftp://facebook.com
http://localhost:3000
localhost:3000/
unitedkingdomurl.co.uk
this.is.a.url.com/its/still=going?wow
shop.facebook.org
app.number123.com
app1.number123.com
app-1.numbEr123.com
app.dashes-dash.com
www.facebook.com
facebook.com
fb.com/hello_123
fb.com/hel-lo
fb.com/hello/goodbye
fb.com/hello/goodbye?okay
fb.com/hello/goodbye?okay=alright
Hello www.google.com World http://yahoo.com
https://www.google.com.tr/admin/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
https://google.com.tr/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
http://google.com/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
ftp://google.com/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
www.google.com.tr/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
www.google.com/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
drive.google.com/test/subPage?qs1=sss1&qs2=sss2&qs3=sss3#Services
https://www.example.pl
http://www.example.com
www.example.pl
example.com
http://blog.example.com
http://www.example.com/product
http://www.example.com/products?id=1&page=2
http://www.example.com#up
http://255.255.255.255
255.255.255.255
shop.facebook.org/derf.html
Run Code Online (Sandbox Code Playgroud)
您可以在 regex101 上查看它的执行情况并根据需要进行调整
如果您必须严格选择链接,我会选择:
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
Run Code Online (Sandbox Code Playgroud)
有关更多信息,请阅读以下内容:
以上所有答案都不匹配 URL 中的 Unicode 字符,例如:http : //google.com?query=??c+filan+?ã+search
对于解决方案,这个应该有效:
(ftp:\/\/|www\.|https?:\/\/){1}[a-zA-Z0-9u00a1-\uffff0-]{2,}\.[a-zA-Z0-9u00a1-\uffff0-]{2,}(\S*)
Run Code Online (Sandbox Code Playgroud)
这里提供的解决方案都没有解决我遇到的问题/用例.
我在这里提供的是迄今为止我发现/制作的最好的.当我发现它无法处理的新边缘情况时,我会更新它.
\b
#Word cannot begin with special characters
(?<![@.,%&#-])
#Protocols are optional, but take them with us if they are present
(?<protocol>\w{2,10}:\/\/)?
#Domains have to be of a length of 1 chars or greater
((?:\w|\&\#\d{1,5};)[.-]?)+
#The domain ending has to be between 2 to 15 characters
(\.([a-z]{2,15})
#If no domain ending we want a port, only if a protocol is specified
|(?(protocol)(?:\:\d{1,6})|(?!)))
\b
#Word cannot end with @ (made to catch emails)
(?![@])
#We accept any number of slugs, given we have a char after the slash
(\/)?
#If we have endings like ?=fds include the ending
(?:([\w\d\?\-=#:%@&.;])+(?:\/(?:([\w\d\?\-=#:%@&;.])+))*)?
#The last char cannot be one of these symbols .,?!,- exclude these
(?<![.,?!-])
Run Code Online (Sandbox Code Playgroud)
小智 5
我使用下面的正则表达式在字符串中查找 url:
/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/
Run Code Online (Sandbox Code Playgroud)
我认为这种正则表达式模式恰好可以满足您的需求
/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/
Run Code Online (Sandbox Code Playgroud)
这是提取Urls的摘要示例:
// The Regular Expression filter
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
// The Text you want to filter for urls
$text = "The text you want /sf/ask/422664301/ to filter goes here.";
// Check if there is a url in the text
preg_match_all($reg_exUrl, $text, $url,$matches);
var_dump($matches);
Run Code Online (Sandbox Code Playgroud)
小智 5
我发现这涵盖了大多数示例链接,包括子目录部分。
正则表达式是:
(?:(?:https?|ftp):\/\/|\b(?:[a-z\d]+\.))(?:(?:[^\s()<>]+|\((?:[^\s()<>]+|(?:\([^\s()<>]+\)))?\))+(?:\((?:[^\s()<>]+|(?:\(?:[^\s()<>]+\)))?\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))?
Run Code Online (Sandbox Code Playgroud)