Get URL from text

Question

Possible duplicate:
Regular expression for a URL including the query string

I have a text or message.

Hey! Try this http://www.test.com/test.aspx?id=53

Our requirement is to extract the link from the text. We are using the following code:

List<string> list = new List<string>();
Regex urlRx = new
Regex(@"(?<url>(http:|https:[/][/]|www.)([a-z]|[A-Z]|[0-9]|[/.]|[~])*)",
RegexOptions.IgnoreCase);

MatchCollection matches = urlRx.Matches(message);
foreach (Match match in matches)
{
   list.Add(match.Value);
}
return list;

It returns the URL, but not all of it. The output of the code is

http://www.test.com/test.aspx

But we need the complete URL

http://www.test.com/test.aspx?id=53

Please suggest how to fix this. Thanks.

Answer 1

Ama*_*ure 16

Try this regular expression, which also returns the query string:

(http|ftp|https)://([\w+?\.\w+])+([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?

You can test it on gskinner.
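As a quick check, here is a minimal sketch that plugs the pattern above into the question's own loop. The class name `QueryStringUrlDemo` and the helper `ExtractUrls` are made up for illustration; only the pattern itself comes from this answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QueryStringUrlDemo
{
    // Pattern copied verbatim from the answer; its last character class
    // includes '?', '=' and '&', so the query string is part of the match.
    private static readonly Regex UrlRx = new Regex(
        @"(http|ftp|https)://([\w+?\.\w+])+([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns every match found in the message.
    public static List<string> ExtractUrls(string message)
    {
        var urls = new List<string>();
        foreach (Match match in UrlRx.Matches(message))
        {
            urls.Add(match.Value);
        }
        return urls;
    }

    public static void Main()
    {
        // Sample message taken from the question.
        foreach (string url in ExtractUrls("Hey! Try this http://www.test.com/test.aspx?id=53"))
        {
            Console.WriteLine(url);
        }
    }
}
```

With the question's sample message this should print the full `http://www.test.com/test.aspx?id=53`. Note that the pattern requires a scheme, so a bare `www.` link without `http://` would not be picked up.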

Seems a bit too explicit. Wouldn't `(ftp|https?)://[^\s]+` work? (2 upvotes)
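The commenter's shorter pattern can be tried the same way. This is only a sketch: the class name `SimplePatternDemo` and the helper `ExtractUrls` are illustrative, and the pattern is the one from the comment, which takes everything up to the next whitespace character.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SimplePatternDemo
{
    // Pattern from the comment: a scheme, "://", then any run of non-whitespace.
    private static readonly Regex UrlRx = new Regex(
        @"(ftp|https?)://[^\s]+",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns every match found in the message.
    public static List<string> ExtractUrls(string message)
    {
        var urls = new List<string>();
        foreach (Match match in UrlRx.Matches(message))
        {
            urls.Add(match.Value);
        }
        return urls;
    }

    public static void Main()
    {
        // Sample message taken from the question.
        foreach (string url in ExtractUrls("Hey! Try this http://www.test.com/test.aspx?id=53"))
        {
            Console.WriteLine(url);
        }
    }
}
```

The trade-off is that `[^\s]+` is permissive: it keeps the query string, but it also swallows any punctuation that directly follows the URL (a trailing period or closing parenthesis, for example), and it ignores links that start with `www.` and have no scheme.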

Answer 2

pap*_*tis 8

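using System.Collections.Generic;
using System.Text.RegularExpressions;

public List<string> GetLinks(string message)
{
    List<string> list = new List<string>();
    // Scheme (http, https, ftp, file) or a literal "www." prefix, then the host,
    // then optional path segments whose characters include '?', '&' and '='.
    Regex urlRx = new Regex(@"((https?|ftp|file)\://|www\.)[A-Za-z0-9\.\-]+(/[A-Za-z0-9\?\&\=;\+!'\(\)\*\-\._~%]*)*", RegexOptions.IgnoreCase);

    // Collect the full text of every match.
    MatchCollection matches = urlRx.Matches(message);
    foreach (Match match in matches)
    {
        list.Add(match.Value);
    }
    return list;
}

// Example call: both URLs are returned with their query strings.
var list = GetLinks("Hey yo check this: http://www.google.com/?q=stackoverflow and this: http://www.mysite.com/?id=10&author=me");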

It will find links of the following kinds:

http:// ...
https:// ...
file:// ...
www. ...

Archived:	14 years, 1 month ago
Views:	15375
Last activity:	7 years, 4 months ago