C# How can I check if a URL exists/is valid?

Dan*_*rip 111 .net c# url-validation

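I'm making a simple program in Visual C# 2005 that looks up a stock symbol on Yahoo! Finance, downloads the historical data, and then plots the price history for the specified ticker symbol.

I know the exact URL I need to acquire the data, and if the user enters an existing ticker symbol (or at least one with data on Yahoo! Finance), it works perfectly fine. However, if the user makes up a ticker symbol, I get a run-time error, because the program tries to pull data from a nonexistent web page.

I'm using the WebClient class and its DownloadString function. I looked through all the other member functions of the WebClient class, but didn't see anything I could use to test a URL.

How can I do this?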

Big*_*714 132

Here is another implementation of this solution:

using System.Net;

/// <summary>
/// Checks whether the remote file exists.
/// </summary>
/// <param name="url">The URL of the remote file.</param>
/// <returns>True if the file exists, false if it does not.</returns>
private bool RemoteFileExists(string url)
{
    try
    {
        // Create the HttpWebRequest.
        HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
        // Set the request method to HEAD; you can also use GET.
        request.Method = "HEAD";
        // Get the web response; the using block closes it only after the status code is read.
        using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
        {
            // Return true if the status code is 200.
            return response.StatusCode == HttpStatusCode.OK;
        }
    }
    catch
    {
        // Any exception returns false.
        return false;
    }
}

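From: http://www.dotnetthoughts.net/2009/10/14/how-to-check-remote-file-exists-using-c/

  • This throws an ObjectDisposedException at return (response.StatusCode == HttpStatusCode.OK); wrap it in a using (3 upvotes)
  • I'm using this code to check whether a bunch of images exist, and it is quite slow (a couple of seconds per URL). Does anyone know whether this is a problem with this code, or just a fact of life when making these kinds of calls? (2 upvotes)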

Mar*_*ell 105

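Could you issue a "HEAD" request rather than a "GET"?

(edit) - lol! Looks like I've done this before! Changed to wiki to avoid accusations of rep-garnering. So, to test a URL without the cost of downloading the content: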

// using MyClient from linked post
using(var client = new MyClient()) {
    client.HeadOnly = true;
    // fine, no content downloaded
    string s1 = client.DownloadString("http://google.com");
    // throws 404
    string s2 = client.DownloadString("http://google.com/silly");
}

You would put a try/catch around the DownloadString call to check for errors; no error? It exists...


With C# 2.0 (VS2005):

private bool headOnly;
public bool HeadOnly {
    get {return headOnly;}
    set {headOnly = value;}
}

using(WebClient client = new MyClient())
{
    // code as before
}

  • Exactly what I was looking for. Checking for existence without the cost of downloading the content. (3 upvotes)
  • What is ***MyClient***? (2 upvotes)
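`MyClient` is defined in the linked post, which is not reproduced here. As a rough sketch only, assuming it is a `WebClient` subclass whose `HeadOnly` flag switches the outgoing request from GET to HEAD, it might look like the following. The `UrlProbe.Exists` helper is a hypothetical name added for illustration; it wraps the try/catch around `DownloadString` mentioned above. The syntax is kept compatible with C# 2.0.

```csharp
using System;
using System.Net;

// Sketch of a WebClient subclass that can issue HEAD requests.
// Assumption: this mirrors the idea of the linked post, not its exact code.
public class MyClient : WebClient
{
    private bool headOnly;
    public bool HeadOnly
    {
        get { return headOnly; }
        set { headOnly = value; }
    }

    // WebClient calls this to build each request; switch GET to HEAD on demand.
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (headOnly && request.Method == "GET")
        {
            request.Method = "HEAD";
        }
        return request;
    }
}

public static class UrlProbe
{
    // Hypothetical helper: true if the HEAD request completes without an error.
    public static bool Exists(string url)
    {
        try
        {
            using (MyClient client = new MyClient())
            {
                client.HeadOnly = true;
                client.DownloadString(url); // a HEAD response carries no body
                return true;
            }
        }
        catch (WebException)
        {
            return false; // 4xx/5xx status, DNS failure, timeout, ...
        }
    }
}
```

A call such as `UrlProbe.Exists("http://google.com/silly")` would then report the 404 as `false` instead of surfacing an exception to the caller.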

jsm*_*ith 36

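These solutions are pretty good, but they forget that there may be status codes other than 200 OK. This is a solution that I've used in production environments for status monitoring and such.

If there is a URL redirect or some other condition on the target page, this method returns true. Also, GetResponse() throws an exception for error statuses, so you will not get a StatusCode for them. You need to catch the exception and check for a ProtocolError.

Any 400 or 500 status code returns false. All others return true. This code is easy to modify to suit your needs for specific status codes.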

using System;
using System.Diagnostics;
using System.Net;

/// <summary>
/// This method will check a url to see that it does not return server or protocol errors
/// </summary>
/// <param name="url">The path to check</param>
/// <returns>True if the url responds without a server or protocol error, false otherwise</returns>
public bool UrlIsValid(string url)
{
    try
    {
        HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
        request.Timeout = 5000; //set the timeout to 5 seconds to keep the user from waiting too long for the page to load
        request.Method = "HEAD"; //Get only the header information -- no need to download any content

        using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
        {
            int statusCode = (int)response.StatusCode;
            if (statusCode >= 100 && statusCode < 400) //Good requests
            {
                return true;
            }
            else if (statusCode >= 500 && statusCode <= 510) //Server Errors
            {
                //log.Warn(String.Format("The remote server has thrown an internal error. Url is not valid: {0}", url));
                Debug.WriteLine(String.Format("The remote server has thrown an internal error. Url is not valid: {0}", url));
                return false;
            }
        }
    }
    catch (WebException ex)
    {
        if (ex.Status == WebExceptionStatus.ProtocolError) //400 errors
        {
            return false;
        }
        else
        {
            //log.Warn(String.Format("Unhandled status [{0}] returned for url: {1}", ex.Status, url), ex);
            Debug.WriteLine(String.Format("Unhandled status [{0}] returned for url: {1}", ex.Status, url));
        }
    }
    catch (Exception ex)
    {
        //log.Error(String.Format("Could not test url {0}.", url), ex);
        Debug.WriteLine(String.Format("Could not test url {0}: {1}", url, ex.Message));
    }
    return false;
}

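  • The `HttpWebResponse` object **should be enclosed in a `using` block**, since it implements `IDisposable`; this also ensures the connection is closed. Not doing so might cause problems like the one @jbeldock ran into. (4 upvotes)
  • Just went through hair-pulling problems with this approach: `HttpWebRequest` doesn't like it if you don't `.Close()` the `response` object before you try to download anything else. Took hours to find that one! (3 upvotes)
  • This throws 404 Not Found on URLs that work fine in the browser...? (2 upvotes)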
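To adapt the check to specific status codes, the decision can be pulled out into a small pure function, and the status code of a failed request can be read from `WebException.Response`. The following is a sketch under assumptions: `StatusRules`, `IsAcceptable`, and `TryGetStatus` are hypothetical names, and the accepted range simply mirrors the 1xx to 3xx rule described above.

```csharp
using System;
using System.Net;

public static class StatusRules
{
    // Mirrors the rule above: 1xx, 2xx and 3xx count as valid, 4xx and 5xx do not.
    // Change this range to accept or reject specific codes.
    public static bool IsAcceptable(HttpStatusCode code)
    {
        int n = (int)code;
        return n >= 100 && n < 400;
    }

    // When GetResponse() throws with WebExceptionStatus.ProtocolError, the
    // server's reply is still attached to the exception. Returns null when
    // no HTTP response is available (DNS failure, timeout, ...).
    public static HttpStatusCode? TryGetStatus(WebException ex)
    {
        HttpWebResponse response = ex.Response as HttpWebResponse;
        if (response == null)
        {
            return null;
        }
        return response.StatusCode;
    }
}
```

For example, `StatusRules.IsAcceptable(HttpStatusCode.Redirect)` is true, while `StatusRules.IsAcceptable(HttpStatusCode.NotFound)` is false.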

小智 9

If I understand your question correctly, you could use a small method like this to give you the result of your URL test:

WebRequest webRequest = WebRequest.Create(url);
WebResponse webResponse;
try
{
  webResponse = webRequest.GetResponse();
  webResponse.Close(); // release the connection once the check is done
}
catch // If an exception is thrown, no response could be obtained from the address
{
  return 0;
}
return 1;

You could wrap the above code in a method and use it to perform the validation. I hope this answers the question you were asking.
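Wrapped in a method, the snippet above might look like the following sketch. `UrlCheck.CanGetResponse` is a hypothetical name, and it returns a `bool` rather than 0/1; the extra catch clauses are there because `WebRequest.Create` itself throws for malformed or unsupported URLs.

```csharp
using System;
using System.Net;

public static class UrlCheck
{
    // Hypothetical wrapper: true if any response could be obtained from the address.
    public static bool CanGetResponse(string url)
    {
        try
        {
            WebRequest request = WebRequest.Create(url);
            // The using block closes the response (and its connection) right away.
            using (WebResponse response = request.GetResponse())
            {
                return true;
            }
        }
        catch (WebException)
        {
            return false; // error status, DNS failure, timeout, ...
        }
        catch (UriFormatException)
        {
            return false; // the string is not a well-formed URI
        }
        catch (NotSupportedException)
        {
            return false; // the URI scheme has no registered handler
        }
    }
}
```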


Dan*_* W. 9

Many of the answers predate HttpClient (I think it was introduced in Visual Studio 2013) or lack async/await support, so I decided to post my own solution:

using System;
using System.Net.Http;
using System.Threading.Tasks;

private static async Task<bool> DoesUrlExists(String url)
{
    try
    {
        using (HttpClient client = new HttpClient())
        {
            // Send only a HEAD request to avoid downloading the full file
            var response = await client.SendAsync(new HttpRequestMessage(HttpMethod.Head, url));

            // The URL is available if we get a success status code
            return response.IsSuccessStatusCode;
        }
    }
    catch
    {
        return false;
    }
}

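I use HttpClient.SendAsync with HttpMethod.Head to make only a HEAD request, without downloading the whole file. As David and Marc already said, HTTP 200 is not the only status that means OK, so I use IsSuccessStatusCode to allow all success status codes.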
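One possible variation, sketched under assumptions: creating a new `HttpClient` per call can be costly when many URLs are checked, so this version shares a single instance and validates the URL string before sending anything. The names `UrlChecker` and `UrlExistsAsync` and the 5-second timeout are illustrative choices, not part of the answer above.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class UrlChecker
{
    // One shared client for all checks; the timeout value is an arbitrary choice.
    private static readonly HttpClient Client =
        new HttpClient { Timeout = TimeSpan.FromSeconds(5) };

    public static async Task<bool> UrlExistsAsync(string url)
    {
        Uri uri;
        // Reject strings that are not absolute http/https URLs before any network I/O.
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri) ||
            (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
        {
            return false;
        }

        try
        {
            using (var request = new HttpRequestMessage(HttpMethod.Head, uri))
            using (HttpResponseMessage response = await Client.SendAsync(request))
            {
                return response.IsSuccessStatusCode;
            }
        }
        catch (HttpRequestException)
        {
            return false; // DNS failure, refused connection, ...
        }
        catch (TaskCanceledException)
        {
            return false; // the request timed out
        }
    }
}
```

From an async method it would be called as `bool ok = await UrlChecker.UrlExistsAsync("https://www.google.com");`.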


Rus*_*ail 7

I have always found that handling exceptions is much slower.

Perhaps a less intensive approach would give a better, faster result?

public bool IsValidUri(Uri uri)
{
    using (HttpClient client = new HttpClient())
    {
        // Note: .Result blocks the calling thread, and a network failure
        // (e.g. an unresolvable host) still surfaces as an AggregateException.
        HttpResponseMessage result = client.GetAsync(uri).Result;
        HttpStatusCode statusCode = result.StatusCode;

        switch (statusCode)
        {
            case HttpStatusCode.Accepted:
                return true;
            case HttpStatusCode.OK:
                return true;
            default:
                return false;
        }
    }
}

Then just use:

IsValidUri(new Uri("http://www.google.com/censorship_algorithm"));


小智 6

Try this (make sure you are using System.Net):

public bool checkWebsite(string URL) {
   try {
      // Dispose the WebClient once the download attempt is finished
      using (WebClient wc = new WebClient()) {
         string HTMLSource = wc.DownloadString(URL);
      }
      return true;
   }
   catch (Exception) {
      return false;
   }
}

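When the checkWebsite() function is called, it tries to fetch the source of the URL passed into it. If it gets the source, it returns true. If not, it returns false.

Code example: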

//The checkWebsite command will return true:
bool websiteExists = this.checkWebsite("https://www.google.com");

//The checkWebsite command will return false:
bool fakePageExists = this.checkWebsite("https://www.thisisnotarealwebsite.com/fakepage.html");


小智 5

WebRequest request = WebRequest.Create("http://www.google.com");
try
{
     request.GetResponse();
}
catch //If exception thrown then couldn't get response from address
{
     MessageBox.Show("The URL is incorrect");
}