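Scope:
I am developing a C# application that simulates a query to this site. I am quite familiar with simulating web requests to reproduce the same steps a human would perform, but in code.
If you want to try it yourself, just type this number into the CNPJ box:
08775724000119, then enter the captcha and click Confirmar.
I have already dealt with the captcha, so that is no longer a problem.
Problem:
As soon as I execute the POST request for a "CNPJ", an exception is thrown:
The remote server returned an error: (403) Forbidden.
Fiddler debugger output:
This is the request generated by my browser, not by my code: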
POST https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintco HTTP/1.1
Host: www.sefaz.rr.gov.br
Connection: keep-alive
Content-Length: 208
Cache-Control: max-age=0
Origin: https://www.sefaz.rr.gov.br
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Referer: https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintco
Accept-Encoding: gzip,deflate,sdch
Accept-Language: pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: GX_SESSION_ID=gGUYxyut5XRAijm0Fx9ou7WnXbVGuUYoYTIKtnDydVM%3D; JSESSIONID=OVuuMFCgQv9k2b3fGyHjSZ9a.undefined
// PostData :
_EventName=E%27CONFIRMAR%27.&_EventGridId=&_EventRowId=&_MSG=&_CONINSEST=&_CONINSESTG=08775724000119&cfield=rice&_VALIDATIONRESULT=1&BUTTON1=Confirmar&sCallerURL=http%3A%2F%2Fwww.sintegra.gov.br%2Fnew_bv.html
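For reference, a body like the PostData line above can be assembled from name/value pairs rather than written by hand. The following is only a minimal sketch: the `FormEncoder` class and its `Encode` method are illustrative names, not part of the library mentioned in the question, and it assumes a runtime where `Uri.EscapeDataString` escapes the apostrophe (.NET Framework 4.5 or later, or .NET Core).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not taken from the original post.
public static class FormEncoder
{
    // Joins name/value pairs into an application/x-www-form-urlencoded body.
    // Note: EscapeDataString encodes a space as %20 rather than '+';
    // none of the values used here contain spaces.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    public static void Main()
    {
        // Field names and values copied from the Fiddler trace above.
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("_EventName", "E'CONFIRMAR'."),
            new KeyValuePair<string, string>("_EventGridId", ""),
            new KeyValuePair<string, string>("_EventRowId", ""),
            new KeyValuePair<string, string>("_MSG", ""),
            new KeyValuePair<string, string>("_CONINSEST", ""),
            new KeyValuePair<string, string>("_CONINSESTG", "08775724000119"),
            new KeyValuePair<string, string>("cfield", "rice"),
            new KeyValuePair<string, string>("_VALIDATIONRESULT", "1"),
            new KeyValuePair<string, string>("BUTTON1", "Confirmar"),
            new KeyValuePair<string, string>("sCallerURL", "http://www.sintegra.gov.br/new_bv.html"),
        };
        Console.WriteLine(Encode(fields));
    }
}
```

Building the body this way keeps the escaping in one place, which makes it easier to compare the generated string against the browser trace.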
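Code samples and references used:
I am using a self-developed library to handle/wrap the POST and GET requests.
The request object has the same parameters (Host, Origin, Referer, Cookies...) as the ones issued by the browser (logged here in my Fiddler).
I have also set the certificate validation callback on ServicePointManager using: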
ServicePointManager.ServerCertificateValidationCallback =
    new RemoteCertificateValidationCallback(delegate { return true; });
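After all of this configuration, I still get the Forbidden exception.
Here is how I simulate the request and where the exception is thrown: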
try
{
    this.Referer = Consts.REFERER;
    // PARAMETERS: URL, POST DATA, ThrownException (bool)
    response = Post(Consts.QUERYURL, postData, true);
}
catch (Exception ex)
{
    string s = ex.Message;
}
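Thanks in advance for any help/solution to my problem.
Update 1:
I was missing the request to the home page, which generates the cookies (thanks to @W0lf for pointing that out).
Now there is another odd thing. Fiddler is not showing my cookies on the request, but here they are: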

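I made a successful request using my browser and recorded it in Fiddler.
The only things that differ from your request are:
the sCallerURL parameter is sent without a value (I have sCallerURL= instead of sCallerURL=http%3A%2F%2Fwww....)
the Accept-Language values (I am pretty sure this does not matter)
the Content-Length is different (obviously)
OK, I had assumed the Fiddler trace came from your application. If you are not setting cookies on your request, do this:
before posting the data, send a GET request to https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintco. If you inspect the response, you will notice that the site sends two session cookies.
If you do not know how to store cookies and use them in other requests, take a look here.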
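As a rough illustration of storing cookies and reusing them, the sketch below works offline: it feeds a Set-Cookie value into a `CookieContainer` and asks what Cookie header a later request would carry. The class name `CookieReuseDemo`, the method name, and the cookie value `abc123` are made up for the example. In real code, assigning the same `CookieContainer` instance to the `CookieContainer` property of every `HttpWebRequest` should achieve the same effect automatically.

```csharp
using System;
using System.Net;

// Hypothetical demo class; the cookie value is invented.
public static class CookieReuseDemo
{
    // Stores a Set-Cookie header value as if it had arrived in a response
    // from responseUri, then returns the Cookie header that the container
    // would attach to a request for nextUri.
    public static string HeaderForNextRequest(
        CookieContainer jar, Uri responseUri, string setCookieValue, Uri nextUri)
    {
        jar.SetCookies(responseUri, setCookieValue);
        return jar.GetCookieHeader(nextUri);
    }

    public static void Main()
    {
        var url = new Uri("https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintco");
        var jar = new CookieContainer();

        // Simulates the initial GET response setting a session cookie,
        // followed by a POST to the same URL.
        Console.WriteLine(
            HeaderForNextRequest(jar, url, "JSESSIONID=abc123; Path=/sintegra", url));
    }
}
```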
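OK, I managed to reproduce the 403, figured out what causes it, and found a fix.
What happens on the POST request is this: the server answers with a redirect.
.NET's HttpWebRequest tries to follow that redirect seamlessly, but in this case there are two issues (which I would consider bugs in the .NET implementation):
the GET request that follows the POST (the redirect) carries the same content type as the POST request (application/x-www-form-urlencoded). This should not be specified for GET requests.
a cookie handling issue (the most important one): the website sends two cookies, GX_SESSION_ID and JSESSIONID. The second has a path specified (/sintegra), while the first does not.
The difference is this: a browser assigns the first cookie a default path of / (root), while .NET assigns it the path of the request URL (/sintegra/servlet/hwsintco).
As a result, the last GET request (after the redirect), to /sintegra/servlet/hwsintpe..., does not get the first cookie passed in, because its path does not match.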
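The path mismatch can be reproduced without touching the network. The sketch below is only an illustration: `CookiePathDemo`, the cookie value, and the redirect URL ending in `redirect-target` are hypothetical (the real redirect target is elided above), and the cookie path is set explicitly so the result does not depend on how a given runtime computes the default path.

```csharp
using System;
using System.Net;

// Hypothetical demo class; value and redirect URL are placeholders.
public static class CookiePathDemo
{
    static readonly Uri PostUri =
        new Uri("https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintco");

    // Stand-in for the real redirect target, which is not fully known here.
    static readonly Uri RedirectUri =
        new Uri("https://www.sefaz.rr.gov.br/sintegra/servlet/redirect-target");

    // Stores GX_SESSION_ID under the given path and returns the Cookie
    // header the container would send to the redirect target.
    public static string HeaderWithPath(string path)
    {
        var jar = new CookieContainer();
        jar.Add(PostUri, new Cookie("GX_SESSION_ID", "hypothetical-value") { Path = path });
        return jar.GetCookieHeader(RedirectUri);
    }

    public static void Main()
    {
        // Path equal to the POST URL: the cookie is not sent to the sibling URL.
        Console.WriteLine("[" + HeaderWithPath("/sintegra/servlet/hwsintco") + "]");
        // Root path, as a browser would assign: the cookie is sent.
        Console.WriteLine("[" + HeaderWithPath("/") + "]");
    }
}
```

With the request-URL path the header should come back empty, while the root path should yield the cookie, which matches the behavior described above.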
To work around the first issue, tell the request not to follow redirects:
postRequest.AllowAutoRedirect = false;
Then read the redirect location from the POST response and manually do a GET request to it.
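One detail worth guarding against when following the redirect by hand: a Location header may be absolute or relative, and it is not stated here which form this server returns. The sketch below, with the hypothetical names `RedirectHelper` and `ResolveLocation`, resolves either form against the URL of the request that produced the redirect.

```csharp
using System;

// Hypothetical helper, not part of the original answer.
public static class RedirectHelper
{
    // Resolves a Location header value against the URL that was requested.
    // new Uri(baseUri, string) returns an absolute Location unchanged and
    // combines a relative one with the base URL.
    public static Uri ResolveLocation(Uri requestUri, string locationHeader)
    {
        if (string.IsNullOrEmpty(locationHeader))
            throw new InvalidOperationException("The response carried no Location header.");
        return new Uri(requestUri, locationHeader);
    }

    public static void Main()
    {
        var postUri = new Uri("https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintco");
        // "hypothetical-target" is a placeholder, not the real redirect target.
        Console.WriteLine(ResolveLocation(postUri, "hypothetical-target?x=1"));
    }
}
```

The resulting `Uri` can then be passed to `WebRequest.Create` for the final GET.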
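For the cookie issue, the fix I found is to take the misplaced cookie from the CookieContainer, set its path correctly, and add it back to the container in the correct location.
This is the code that does it: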
private void FixMisplacedCookie(CookieContainer cookieContainer)
{
    var misplacedCookie = cookieContainer.GetCookies(new Uri(Url))[0];
    misplacedCookie.Path = "/"; // instead of "/sintegra/servlet/hwsintco"
    // place the cookie in the right place...
    cookieContainer.SetCookies(
        new Uri("https://www.sefaz.rr.gov.br/"),
        misplacedCookie.ToString());
}
Here is the complete code:
using System;
using System.IO;
using System.Net;
using System.Text;

namespace XYZ
{
    public class Crawler
    {
        const string Url = "https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintco";

        public void Crawl()
        {
            var cookieContainer = new CookieContainer();

            /* initial GET Request */
            var getRequest = (HttpWebRequest)WebRequest.Create(Url);
            getRequest.CookieContainer = cookieContainer;
            ReadResponse(getRequest); // nothing to do with this, because captcha is f#@%ing dumb :)

            /* POST Request */
            var postRequest = (HttpWebRequest)WebRequest.Create(Url);
            postRequest.AllowAutoRedirect = false; // we'll do the redirect manually; .NET does it badly
            postRequest.CookieContainer = cookieContainer;
            postRequest.Method = "POST";
            postRequest.ContentType = "application/x-www-form-urlencoded";

            var postParameters =
                "_EventName=E%27CONFIRMAR%27.&_EventGridId=&_EventRowId=&_MSG=&_CONINSEST=&" +
                "_CONINSESTG=08775724000119&cfield=much&_VALIDATIONRESULT=1&BUTTON1=Confirmar&" +
                "sCallerURL=";
            var bytes = Encoding.UTF8.GetBytes(postParameters);
            postRequest.ContentLength = bytes.Length;

            using (var requestStream = postRequest.GetRequestStream())
                requestStream.Write(bytes, 0, bytes.Length);

            var webResponse = postRequest.GetResponse();
            ReadResponse(postRequest); // not interested in this either

            // the redirect target comes from the Location header of the POST response
            var redirectLocation = webResponse.Headers[HttpResponseHeader.Location];
            var finalGetRequest = (HttpWebRequest)WebRequest.Create(redirectLocation);

            /* Apply fix for the cookie */
            FixMisplacedCookie(cookieContainer);

            /* do the final request using the correct cookies. */
            finalGetRequest.CookieContainer = cookieContainer;
            var responseText = ReadResponse(finalGetRequest);
            Console.WriteLine(responseText); // Hooray!
        }

        private static string ReadResponse(HttpWebRequest getRequest)
        {
            using (var responseStream = getRequest.GetResponse().GetResponseStream())
            using (var sr = new StreamReader(responseStream, Encoding.UTF8))
            {
                return sr.ReadToEnd();
            }
        }

        private void FixMisplacedCookie(CookieContainer cookieContainer)
        {
            var misplacedCookie = cookieContainer.GetCookies(new Uri(Url))[0];
            misplacedCookie.Path = "/"; // instead of "/sintegra/servlet/hwsintco"
            // place the cookie in the right place...
            cookieContainer.SetCookies(
                new Uri("https://www.sefaz.rr.gov.br/"),
                misplacedCookie.ToString());
        }
    }
}