Get the HTML content of a target URL using a web proxy service?

Ele*_*ios 10 .net c# vb.net http httpwebrequest

In C# or VB.Net, I need to access a web page through a web proxy service in order to do web scraping on the target URLs I'm interested in.

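Let's take a random web proxy service (really, it doesn't matter which one; I'm open to suggestions), for example the one below, which doesn't complicate the query with hashes the way others do (something I wouldn't know how to handle):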

http://proxyanonimo.es/browse.php?u=http%3a%2f%2furl.com

When I perform an HttpWebRequest against that URL, I expect the response to contain the HTML content of the target URL, but instead I get this content:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>
<head>
<title>Proxy Anonimo :: Spanish Web Proxy</title>
<meta name="keywords" content="proxy, webproxy, proxy online, spanish proxy" />
<meta name="description" content="Usa nuestro WebProxy An&#65533;nimo para comprobar como se ve una web desde otro sitio que no sea el ordenador en el que est&#65533;s sentado. Es un acceso remoto desde nuestro servidor." />

<style type="text/css">
    html, body {
       text-align: center;
    }
    #wrapper {
       width: 740px;
       margin: 0 auto 0 auto;
       text-align: left;
       padding: 10px;
       background: #eee;
       border: 4px outset #ccc;
    }
    #footer {
       margin: 10px 0 0 0; 
       font-size: 80%;
       color: #ccc;
    }
    #error {
       border: 1px solid red;
       padding: 2px;
       margin: 5px 0 15px 0;
       background: #eee;
    }
    .center { text-align: center; }

    /* TOOLTIP HOVER EFFECT */
    #tooltip{ 
       width:20em; background: #fff;
    }
</style>
    <script type="text/javascript">ginf={url:'http://proxyanonimo.es',script:'browse.php',target:{h:'http://myurl.com',p:'/',b:'',u:'http://myurl.com'},enc:{u:'iawpK1Q337kKRtEraNzZubjsx46C64Qd4aqEZ6vR2GrHZTZXxmNPoU7JM4aGYQJROYjBUFiKbxiYh5LEhmjt4g3G83dVHKClyLMhgTRfgX1nSBPYLYhG38a11bMwMcF8',e:'',x:'',p:''},b:'12'}</script>
    <script type="text/javascript" src="http://proxyanonimo.es/includes/main.js?1.4.1"></script></head>
<body>
<div id="wrapper">

    <h1 class="center"><a href="index.php">Proxy Anonimo</a></h1>
    <h2 class="center">IPv6 Ready!</h2> 
    <div id="error">Hotlinking directly to proxied pages is not permitted.</div><p style="text-align:right">[<a href="http://proxyanonimo.es/browse.php?u=http%3a%2f%2fmyurl.com&amp;b=12&amp;f=norefer">Reload http://myurl.com</a>]</p>

    <h2>Proxy</h2>

       Usa nuestro WebProxy An&#65533;nimo para comprobar como se ve una web desde otro sitio que no sea el ordenador en el que est&#65533;s sentado. Es un acceso remoto desde nuestro servidor. Si tu conexi&#65533;n tiene alguna restricci&#65533;n, con nuestro Proxy An&#65533;nimo no tendr&#65533;as que tener problema o por lo menos, asegurarte de si la web es accesible o no. 

    <h2>URL</h2>

    <form action="includes/process.php?action=update" method="post" onsubmit="return updateLocation(this);">
        <input type="text" name="u" id="input" size="60">



        <!--<input type="submit" value="Go">-->

        <h3>Options</h3>
        <ul id="options">
            <li><input type="checkbox" name="encodeURL" id="encodeURL"><label for="encodeURL" class="tooltip" onmouseover="tooltip('Encrypts the URL of the page you are viewing so that it does not contain the target site in plaintext.')" onmouseout="exit();">Encrypt URL</label></li><li><input type="checkbox" name="encodePage" id="encodePage"><label for="encodePage" class="tooltip" onmouseover="tooltip('Helps avoid filters by encrypting the page before sending it and decrypting it with javascript once received.')" onmouseout="exit();">Encrypt Page</label></li><li><input type="checkbox" name="allowCookies" id="allowCookies" checked="checked"><label for="allowCookies" class="tooltip" onmouseover="tooltip('Cookies may be required on interactive websites (especially where you need to log in) but advertisers also use cookies to track your browsing habits.')" onmouseout="exit();">Allow Cookies</label></li><li><input type="checkbox" name="tempCookies" id="tempCookies" checked="checked"><label for="tempCookies" class="tooltip" onmouseover="tooltip('This option overrides the expiry date for all cookies and sets it to at the end of the session only - all cookies will be deleted when you shut your browser. (Recommended)')" onmouseout="exit();">Force Temporary Cookies</label></li><li><input type="checkbox" name="stripTitle" id="stripTitle"><label for="stripTitle" class="tooltip" onmouseover="tooltip('Removes titles from proxied pages.')" onmouseout="exit();">Remove Page Titles</label></li><li><input type="checkbox" name="stripJS" id="stripJS"><label for="stripJS" class="tooltip" onmouseover="tooltip('Remove scripts to protect your anonymity and speed up page loads. However, not all sites will provide an HTML-only alternative. 
(Recommended)')" onmouseout="exit();">Remove Scripts</label></li><li><input type="checkbox" name="stripObjects" id="stripObjects"><label for="stripObjects" class="tooltip" onmouseover="tooltip('You can increase page load times by removing unnecessary Flash, Java and other objects. If not removed, these may also compromise your anonymity.')" onmouseout="exit();">Remove Objects</label></li>      </ul>
    </form>

    <br>

    <br><br><br>

    <p><a href="http://s07.flagcounter.com/more/xu5M"><img src="http://s07.flagcounter.com/count/xu5M/bg=FFFFFF/txt=000000/border=CCCCCC/columns=8/maxflags=248/viewers=De+donde+nos+visitan/labels=1/pageviews=1/" alt="free counters" border="0"></a></p>


    <div id="eXTReMe"><a href="http://extremetracking.com/open?login=proxyes">
<img src="http://t1.extreme-dm.com/i.gif" style="border: 0;"
height="38" width="41" id="EXim" alt="eXTReMe Tracker" /></a>
<script type="text/javascript"><!--
EXref="";top.document.referrer?EXref=top.document.referrer:EXref=document.referrer;//-->
</script><script type="text/javascript"><!--
var EXlogin='proxyes' // Login
var EXvsrv='s10' // VServer
EXs=screen;EXw=EXs.width;navigator.appName!="Netscape"?
EXb=EXs.colorDepth:EXb=EXs.pixelDepth;EXsrc="src";
navigator.javaEnabled()==1?EXjv="y":EXjv="n";
EXd=document;EXw?"":EXw="na";EXb?"":EXb="na";
EXref?EXref=EXref:EXref=EXd.referrer;
EXd.write("<img "+EXsrc+"=http://e1.extreme-dm.com",
"/"+EXvsrv+".g?login="+EXlogin+"&amp;",
"jv="+EXjv+"&amp;j=y&amp;srw="+EXw+"&amp;srb="+EXb+"&amp;",
"l="+escape(EXref)+" height=1 width=1>");//-->
</script><noscript><div id="neXTReMe"><img height="1" width="1" alt=""
src="http://e1.extreme-dm.com/s10.g?login=proxyes&amp;j=n&amp;jv=n" />
</div></noscript></div>

<p class="center">Powered by <a href="http://www.glype.com/">Glype</a>&reg; v1.4.1.</p> 
</div>

<script type="text/javascript">
var infolinks_pid = 1993344;
var infolinks_wsid = 0;
</script>
<script type="text/javascript" src="http://resources.infolinks.com/js/infolinks_main.js"></script>

</body>
</html>

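So... what could be causing this?

What am I missing?

Maybe the web proxy service I'm trying is restricting me in some way? Maybe another web proxy service would suit my needs better?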

Vov*_*ova 5

I suggest you use a direct proxy IP:port, for example 115.238.225.26:80. You can then handle the problem easily with the following code:

// Requires: using System; using System.IO; using System.Net;
var req = (HttpWebRequest)WebRequest.Create(new Uri("http://example.com"));
// Route the request through the proxy at the given IP address and port.
var webproxy = new WebProxy("115.238.225.26", 80);
webproxy.BypassProxyOnLocal = false;
req.Method = "GET";
req.Proxy = webproxy;
var result = "";
// Dispose the response and its stream once the body has been read.
using (var response = (HttpWebResponse)req.GetResponse())
using (var respStream = response.GetResponseStream())
{
    if (respStream != null)
    {
        using (var strReader = new StreamReader(respStream))
        {
            result = strReader.ReadToEnd();
        }
    }
}

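The result variable will then hold the content of the resulting page, or an empty string if something went wrong (respStream == null). You may also need to add exception handling to this code in case of connection problems.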
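The exception handling mentioned above could look roughly like the following. This is only a sketch: the `ProxyFetcher` class, the `Fetch` method, and the 10-second timeout are illustrative assumptions, not part of the original answer. It relies on `GetResponse` throwing a `WebException` for connection failures, timeouts, and HTTP error statuses.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper, named for illustration only.
public static class ProxyFetcher
{
    // Fetches url through the given proxy; returns null when the request fails.
    public static string Fetch(string url, string proxyHost, int proxyPort)
    {
        var req = (HttpWebRequest)WebRequest.Create(new Uri(url));
        req.Method = "GET";
        req.Proxy = new WebProxy(proxyHost, proxyPort);
        req.Timeout = 10000; // milliseconds; an arbitrary value chosen for this sketch

        try
        {
            using (var response = (HttpWebResponse)req.GetResponse())
            using (var stream = response.GetResponseStream())
            using (var reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }
        catch (WebException ex)
        {
            // Connection failures, timeouts, and HTTP error statuses end up here.
            Console.Error.WriteLine("Request failed: " + ex.Status);
            return null;
        }
    }
}
```

A call such as `ProxyFetcher.Fetch("http://example.com", "115.238.225.26", 80)` would then return either the page content or null, so the caller can tell a failed request apart from an empty page.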


and*_*wgu 5

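The main problem you seem to be running into is that the proxy in your example requires a POST to update the target URL you are trying to browse through it. That is why you get none of the target page's content and see this error message instead: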

<div id="error">Hotlinking directly to proxied pages is not permitted.</div>

I don't know what your code looks like, but it seems you could use the HttpWebRequest POST method:

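// Requires: using System; using System.IO; using System.Net; using System.Text;
var request = (HttpWebRequest)WebRequest.Create("http://www.glype-proxy.info/includes/process.php?action=update");

// The form in the question's HTML names its URL field "u"; adjust if the proxy's form differs.
var postData = "u=" + Uri.EscapeDataString("http://www.example.com");
postData += "&allowCookies=on";
var data = Encoding.ASCII.GetBytes(postData);

request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = data.Length;

using (var stream = request.GetRequestStream()) {
    stream.Write(data, 0, data.Length);
}

// Dispose the response and reader once the body has been read.
string responseString;
using (var response = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream())) {
    responseString = reader.ReadToEnd();
}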

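You will need to find or host a proxy that returns the page's HTML, such as http://www.glype-proxy.info/. Even then, for the proxy to work properly it has to rewrite the links to the page's resources so that they point to its own "proxied" paths.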

http://www.glype-proxy.info/browse.php?u=https%3A%2F%2Fwww.example.com%2F&b=4&f=norefer

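In the URL above, if you want the paths of the original resources, you have to find every resource that has been redirected and decode the path passed to this particular proxy as the u= parameter. You may also want to ignore other elements injected by the proxy, in this case the <div id="include"> element.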
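Decoding the u= parameter can be sketched as follows. The `ProxiedUrl` class and `ExtractTarget` method are hypothetical names used for illustration, and the sketch assumes the proxy passes the target as a percent-encoded `u` query parameter, as in the URL above (it does not handle the proxy's "Encrypt URL" option).

```csharp
using System;

// Hypothetical helper, named for illustration only.
public static class ProxiedUrl
{
    // Returns the decoded value of the "u" query parameter, or null if it is absent.
    public static string ExtractTarget(string proxiedUrl)
    {
        // Uri.Query includes the leading '?', so strip it before splitting.
        var query = new Uri(proxiedUrl).Query.TrimStart('?');
        foreach (var pair in query.Split('&'))
        {
            // Split each "name=value" pair on the first '=' only.
            var parts = pair.Split(new[] { '=' }, 2);
            if (parts.Length == 2 && parts[0] == "u")
            {
                return Uri.UnescapeDataString(parts[1]);
            }
        }
        return null;
    }
}
```

Links taken from the proxied HTML may have `&` written as `&amp;`, as in the "Reload" link in the question's output, so they would need HTML entity decoding (for example with `WebUtility.HtmlDecode`) before being passed to a helper like this.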


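I believe the proxy you are using works the same way as the "Glype" proxy used in this example, but I did not have access to it at the time of posting. Also, if you want to use other proxies, note that many of them display the result in an iFrame (possibly for XSS protection, navigation, or skinning).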

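Note: in general, relying on another service outside of a built-in API is bad practice, because services often make GUI updates or other changes that can break your script. These services may also suffer outages or be shut down.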