I'm using the following regular expression to extract URLs from text (e.g. "this is text http://url.com/blabla possibly some more text").
'@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@'
This has worked for every URL so far, but I just found that it fails for shortened URLs such as "blabla bla http://ff.im/-bEnA blabla": the match comes out as http://ff.im/.
I suspect it has to do with the dash (-) right after the slash (/).
Short answer: [\w/_\.] doesn't match -, so make it [-\w/_\.]
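A quick way to check the short answer is to run both patterns against the sample text from the question. This is a minimal sketch in Python's `re` module, which is an assumption on my part (the `@` delimiters in the question suggest a PCRE-style API, so they are dropped here); `first_url` is a hypothetical helper name.

```python
import re

# Pattern from the question, without the surrounding @ delimiters.
ORIGINAL = re.compile(r'(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)')

# Same pattern with a hyphen added to the path character class.
FIXED = re.compile(r'(https?://([-\w\.]+)+(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)?)')


def first_url(pattern, text):
    """Return the first URL found by `pattern` in `text`, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


text = "blabla bla http://ff.im/-bEnA blabla"
print(first_url(ORIGINAL, text))  # http://ff.im/
print(first_url(FIXED, text))     # http://ff.im/-bEnA
```

The hyphen goes first in the class so that it is read as a literal character rather than as a range operator.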
Long answer:
@ - delimiter
( - start of group
https?:// - http:// or https://
([-\w.]+)+ - capture 1 or more hyphens, word characters or dots (the host name); the outer + repeats the group and looks redundant, since ([-\w.]+) alone matches the same text
(:\d+)? - optionally capture a : and some numbers (the port)
( - start of group
/ - leading slash
( - start of group
[\w/_\.]* - zero or more word characters, slashes, underscores or dots (the path + filename) - you need to add a hyphen to this class, or just make it [^?\s]* - any char except ? or whitespace
(\?\S+)? - optionally capture a ? followed by anything except whitespace (the querystring)
)? - close group, make it optional
)? - close group, make it optional
) - close group
@
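The breakdown above can also be kept next to the pattern itself by using a verbose regex. The sketch below again assumes Python's `re` module (not necessarily what the question uses) and takes the `[^?\s]*` alternative for the path. It also drops the redundant outer `+` on the host group and flattens the inner path group, so the group numbers differ from the original; `URL_RE` and `find_urls` are hypothetical names.

```python
import re

URL_RE = re.compile(r"""
    (                 # group 1: the whole URL
      https?://       # http:// or https://
      [-\w.]+         # host: hyphens, word characters or dots
      (:\d+)?         # optional port
      (               # optional path and query string
        /             # leading slash
        [^?\s]*       # path + filename: anything except ? or whitespace
        (\?\S+)?      # optional query string
      )?
    )
""", re.VERBOSE)


def find_urls(text):
    """Return every URL matched in `text`, in order of appearance."""
    return [match.group(1) for match in URL_RE.finditer(text)]


print(find_urls("blabla bla http://ff.im/-bEnA blabla"))
print(find_urls("see http://example.com:8080/a-b/c?x=1&y=2 for details"))
```

One trade-off of `[^?\s]*` is that it is more permissive than an explicit character class: punctuation directly after a URL, such as a closing parenthesis or a full stop, becomes part of the match.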