How do I perform a GET request without downloading the content?

Sam*_*ron 27 .net c# httpwebrequest servicepoint

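I am working on a link checker. In general I can perform HEAD requests, but some sites seem to disable this verb, so on failure I also need to perform a GET request (to double-check that the link is really dead).

I use the following code as my link tester: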

using System;
using System.Net;

public class ValidateResult
{
  public HttpStatusCode? StatusCode { get; set; }
  public Uri RedirectResult { get; set; }
  public WebExceptionStatus? WebExceptionStatus { get; set; }
}


public ValidateResult Validate(Uri uri, bool useHeadMethod = true, 
            bool enableKeepAlive = false, int timeoutSeconds = 30)
{
  ValidateResult result = new ValidateResult();

  HttpWebRequest request = WebRequest.Create(uri) as HttpWebRequest;
  if (useHeadMethod)
  {
    request.Method = "HEAD";
  }
  else
  {
    request.Method = "GET";
  }

  // always compress, if you get back a 404 from a HEAD it can be quite big.
  request.AutomaticDecompression = DecompressionMethods.GZip;
  request.AllowAutoRedirect = false;
  request.UserAgent = UserAgentString;
  request.Timeout = timeoutSeconds * 1000;
  request.KeepAlive = enableKeepAlive;

  HttpWebResponse response = null;
  try
  {
    response = request.GetResponse() as HttpWebResponse;

    result.StatusCode = response.StatusCode;
    if (response.StatusCode == HttpStatusCode.Redirect ||
      response.StatusCode == HttpStatusCode.MovedPermanently ||
      response.StatusCode == HttpStatusCode.SeeOther)
    {
      try
      {
        // Resolve a possibly relative Location header against the request URI.
        Uri targetUri = new Uri(uri, response.Headers["Location"]);
        var scheme = targetUri.Scheme.ToLower();
        if (scheme == "http" || scheme == "https")
        {
          result.RedirectResult = targetUri;
        }
        else
        {
          // this little gem was born out of http://tinyurl.com/18r 
          // redirecting to about:blank
          result.StatusCode = HttpStatusCode.SwitchingProtocols;
          result.WebExceptionStatus = null;
        }
      }
      catch (UriFormatException)
      {
        // another gem... people sometimes redirect to http://nonsense:port/yay
        result.StatusCode = HttpStatusCode.SwitchingProtocols;
        result.WebExceptionStatus = WebExceptionStatus.NameResolutionFailure;
      }

    }
  }
  catch (WebException ex)
  {
    result.WebExceptionStatus = ex.Status;
    response = ex.Response as HttpWebResponse;
    if (response != null)
    {
      result.StatusCode = response.StatusCode;
    }
  }
  finally
  {
    if (response != null)
    {
      response.Close();
    }
  }

  return result;
}

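This all works fine and dandy, except that when I perform a GET request the entire payload gets downloaded (I watched this in Wireshark).

Is there a way to configure the underlying ServicePoint or HttpWebRequest not to buffer or eagerly load the response body?

(If I were hand-coding this, I would set the TCP receive window really low, grab only enough packets to get the headers, and stop ACKing TCP packets as soon as I had enough information.)

For those wondering what this is meant to achieve: I do not want to download a 40k 404 page when I get a 404. Doing that a few hundred thousand times is expensive on the network.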

Jim*_*hel 8

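When you do a GET, the server will start sending data from the start of the file to the end, unless you interrupt it. Granted, at 10 Mb/sec that is roughly a megabyte per second, so if the file is small you will get the whole thing anyway. You can minimize the amount you actually download in a couple of ways.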

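First, you can call request.Abort after getting the response and before calling response.Close. That ensures the underlying code does not try to download the whole thing before closing the response. Whether this helps on small files, I don't know. I do know that it will prevent your application from hanging when it tries to download a multi-gigabyte file.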
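A minimal sketch of the Abort-before-Close idea, assuming a synchronous HttpWebRequest. The class and method names (HeaderOnlyGet, GetStatusWithoutBody) are made up for illustration and are not part of the question's code. On newer .NET versions WebRequest is marked obsolete, so treat this as applying to the classic API only.

```csharp
using System;
using System.Net;

public static class HeaderOnlyGet
{
    // Hypothetical helper: issues a GET, looks only at the status line and
    // headers, then aborts the request so the body is not drained on Close.
    public static HttpStatusCode? GetStatusWithoutBody(Uri uri, int timeoutSeconds = 30)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "GET";
        request.AllowAutoRedirect = false;
        request.Timeout = timeoutSeconds * 1000;

        HttpWebResponse response = null;
        try
        {
            // GetResponse returns once the headers have arrived.
            response = (HttpWebResponse)request.GetResponse();
            return response.StatusCode;
        }
        catch (WebException ex)
        {
            // 4xx/5xx arrive as exceptions; the status is still on ex.Response.
            response = ex.Response as HttpWebResponse;
            return response != null ? response.StatusCode : (HttpStatusCode?)null;
        }
        finally
        {
            // Abort first, then Close, as described above.
            request.Abort();
            if (response != null)
            {
                response.Close();
            }
        }
    }
}
```

Bytes that were already in flight or buffered when Abort is called have still crossed the network, so this mainly limits the damage on large bodies.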

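The other thing you can do is request a range rather than the entire file. See the AddRange method and its overloads. You could, for example, write request.AddRange(0, 511), which asks for only the first 512 bytes of the file. (Note that the single-argument form request.AddRange(512) asks for everything from byte 512 onward, which is not what you want here.) Of course, this depends on the server supporting range queries. Most do. But then, most also support HEAD requests.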
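Here is a rough sketch of building such a ranged GET. RangeProbe and CreateRangedGet are hypothetical names, and the 512-byte default is only an example value. The two-argument AddRange(from, to) overload is used because both ends are inclusive, so (0, 511) covers the first 512 bytes.

```csharp
using System;
using System.Net;

public static class RangeProbe
{
    // Hypothetical helper: builds a GET that asks the server for only the
    // first byteCount bytes of the resource.
    public static HttpWebRequest CreateRangedGet(Uri uri, int byteCount = 512)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "GET";
        request.AllowAutoRedirect = false;
        // For byteCount = 512 this sets "Range: bytes=0-511".
        request.AddRange(0, byteCount - 1);
        return request;
    }
}
```

A server that honors the header should answer 206 (HttpStatusCode.PartialContent). One that ignores it will answer 200 with the full body, so aborting the request after GetResponse is still worth doing.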

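You will probably end up having to write a method that tries things in sequence:

  • Try a HEAD request. If that works (i.e. it doesn't return a 500), you're done.
  • Try a GET with a range query. If that doesn't return a 500, you're done.
  • Do a regular GET, with a request.Abort after GetResponse returns.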