How to log in to a protected website, navigate it, and return data; nothing I have tried so far works

Bri*_*ort 5 c# screen-scraping login httpwebrequest httpwebresponse

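Although I have found many articles and other information on how to do GET and POST with HttpWebRequest and HttpWebResponse, I am having a hard time getting things to work the way I expect them to.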

I have been playing with several ideas I found, but so far nothing works. Here is my code:

private void start_post()
    {
        string username = txtUser.Text;
        string password = txtPassword.Text;
        string strResponce;
        byte[] buffer = Encoding.ASCII.GetBytes("username="+username+"&password="+password);
        HttpWebRequest WebReq = (HttpWebRequest)WebRequest.Create(txtLink.Text);
        WebReq.Method = "POST";
        //WebReq.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
        WebReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)";
        WebReq.Headers.Add("Translate", "F");
        WebReq.AllowAutoRedirect = true;
        WebReq.CookieContainer = cookieJar;
        WebReq.KeepAlive = true;
        WebReq.ContentType = "application/x-www-form-urlencoded";
        WebReq.ContentLength = buffer.Length;
        Stream PostData = WebReq.GetRequestStream();
        PostData.Write(buffer, 0, buffer.Length);
        PostData.Close();

        HttpWebResponse WebResp = (HttpWebResponse)WebReq.GetResponse();
        //txtResult.Text = WebResp.StatusCode.ToString() + WebResp.Server.ToString();

        Stream answer = WebResp.GetResponseStream();
        StreamReader _answer = new StreamReader(answer);
        strResponce = _answer.ReadToEnd();
        //txtResult.Text = txtResult.Text + _answer.ReadToEnd();

        answer.Close();
        _answer.Close();

        foreach (Cookie cookie in WebResp.Cookies)
        {
            cookieJar.Add(new Cookie(cookie.Name.Trim(), cookie.Value.Trim(), cookie.Path, cookie.Domain));
            txtResult.Text += cookie.Name.ToString() + Environment.NewLine + cookie.Value.ToString() + Environment.NewLine + cookie.Path.ToString() + Environment.NewLine + cookie.Domain.ToString();
        }

        if (strResponce.Contains("Log On Successful") || strResponce.Contains("already has a webseal session"))
        {
            MessageBox.Show("Login success");
            foreach (Control cont in this.Controls)
            {
                cont.Visible = true;
            }
        }
        else
        {
            MessageBox.Show("Login Failed.");
        }


    }
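One detail worth checking in the code above: the username and password are concatenated into the POST body without URL-encoding, so a value containing `&`, `=`, or a space would change the meaning of the body. The following is a minimal sketch of building an `application/x-www-form-urlencoded` body with escaped fields. The class `FormEncoding` and the method `BuildFormBody` are hypothetical names introduced for illustration, and the field names still have to match whatever the target login form actually expects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FormEncoding
{
    // Hypothetical helper (not from the original post): joins key/value
    // pairs into a form body, escaping each key and value separately.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }
}
```

The resulting string can be passed to `Encoding.ASCII.GetBytes` in place of the hand-built one, since every reserved or non-ASCII character has already been percent-encoded.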

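In the code, I am able to get all the way to the end, and I still get "Login Failed" when I navigate to http://www.comicearth.com (my own site, PHP and Apache). I created a form there, and through that form I submit the password and username. When the code runs, it reports failure, which is fine. I am also using Fiddler to watch what is going on.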

So from this, I know I am doing something wrong in my code.

However, when I navigate to another web application, I get the following error on this line:

HttpWebResponse WebResp = (HttpWebResponse)WebReq.GetResponse();

"Content-Length or Chunked Encoding cannot be set for an operation that does not write data."

I tried to track down the error, and everything I have read says it is caused by a 302 redirect...

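Looking at Fiddler, I can see a huge difference between what my code posts and what is sent when I log in through the web page. So I know I am not doing enough, but I do not know where to look.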

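My goal is to build an application that can log in to the website and then, through its search options, pull out the data that our users currently retrieve by hand. If I can automate some of that tedious work, it will really help everyone out. However, I am currently stuck on logging in, understanding cookies, and so on. Also, the site uses frames. I do not know whether that will be a problem, but I thought I would mention it in case it is another hurdle I have not run into yet.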

If you need to see more of my code, please let me know. At the moment I am using HttpWebRequest and HttpWebResponse, and I have also read a bit about WebClient.

I have downloaded and played with HtmlAgilityPack, but at this point I am not sure I fully understand how it all fits together.

If you know of any good articles or other material that covers this topic in more depth, or have anything I could try, please let me know.

Thank you very much for your time.

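Update with new code, see the comments below: OK, I found that the redirect was the reason I was getting the error message "Content-Length or Chunked Encoding ...". So I set AllowAutoRedirect = false, and now I look for the "Location" header and do the redirect myself. That got rid of the message, but I am still not logged in to the site, which is disappointing, and I cannot figure out why at the moment. :S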

private void start_post2()
    {
        string username = txtUser.Text;
        string password = txtPassword.Text;
        Uri link = new Uri(txtLink.Text);
        string postArgs = string.Format(@"userId={0}&password={1}", username, password);
        byte[] buffer = Encoding.ASCII.GetBytes(postArgs);
        HttpWebRequest WebReq = (HttpWebRequest)WebRequest.Create(txtLink.Text);
        WebReq.Method = "POST";
        //WebReq.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
        WebReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)";
        //WebReq.ClientCertificates.Add("Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5");
        WebReq.AllowAutoRedirect = false;
        WebReq.Accept = "application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
        WebReq.Accept = "*/*";
        //WebReq.Headers.Add(HttpRequestHeader.Cookie, cookieJar);
        WebReq.CookieContainer = cookieJar;
        WebReq.KeepAlive = true;
        WebReq.ContentType = "application/x-www-form-urlencoded";
        WebReq.ContentLength = buffer.Length;
        Stream PostData = WebReq.GetRequestStream();
        PostData.Write(buffer, 0, buffer.Length);
        PostData.Close();

        HttpWebResponse WebResp = (HttpWebResponse)WebReq.GetResponse();
        if (WebResp == null) throw new Exception("Response is null");

        foreach (Cookie cookie in WebResp.Cookies)
        {
            cookieJar.Add(new Cookie(cookie.Name.Trim(), cookie.Value.Trim(), cookie.Path, cookie.Domain));
            //txtResult.Text += cookie.Name.ToString() + Environment.NewLine + cookie.Value.ToString() + Environment.NewLine + cookie.Path.ToString() + Environment.NewLine + cookie.Domain.ToString();
        }

        if (!string.IsNullOrEmpty(WebResp.Headers["Location"]))
        {
            string newLocation = WebResp.Headers["Location"];

            //Request the new location
            WebReq = (HttpWebRequest)WebRequest.Create(newLocation);
            WebReq.Method = "GET";
            WebReq.ContentType = "application/x-www-form-unlencoded";
            WebReq.AllowAutoRedirect = false;
            WebReq.CookieContainer = cookieJar;
            WebReq.CookieContainer.Add(WebResp.Cookies);

            buffer = Encoding.ASCII.GetBytes("userId=" + username + "&password=" + password);

            WebReq.ContentLength = buffer.Length;
            PostData = WebReq.GetRequestStream();
            PostData.Write(buffer, 0, buffer.Length);
            PostData.Close();

            WebResp = (HttpWebResponse)WebReq.GetResponse();

            foreach (Cookie cookie in WebResp.Cookies)
            {
                cookieJar.Add(new Cookie(cookie.Name.Trim(), cookie.Value.Trim(), cookie.Path, cookie.Domain));
                //txtResult.Text += cookie.Name.ToString() + Environment.NewLine + cookie.Value.ToString() + Environment.NewLine + cookie.Path.ToString() + Environment.NewLine + cookie.Domain.ToString();
            }
        }
        else if (!string.IsNullOrEmpty(WebResp.Headers["Set-Cookie"]))
        {
            // thinking...
        }

        foreach (Cookie cookie in cookieJar.GetCookies(link))
        {
            MessageBox.Show(cookie.Name.ToString() + Environment.NewLine + cookie.Value.ToString() + Environment.NewLine + cookie.Path.ToString() + Environment.NewLine + cookie.Domain.ToString());
        }

        StreamReader sr = new StreamReader(WebResp.GetResponseStream());
        string responseHtml = sr.ReadToEnd().Trim();

        SearchPatient(WebReq, username, password);

    }
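Two things in the manual redirect step above may be worth a second look. The `Location` header can be relative, in which case it has to be resolved against the URL that was just requested, and the follow-up request is a GET, which carries no body. The sketch below shows one way to follow a single redirect under those assumptions. `RedirectHelper`, `ResolveRedirect`, and `FollowRedirect` are hypothetical names, not part of the original code, and whether this is enough to get logged in depends on what the target site expects.

```csharp
using System;
using System.Net;

static class RedirectHelper
{
    // Hypothetical helper: turns a Location header value into an absolute
    // URI. An absolute Location is returned as-is; a relative one is
    // resolved against the URI of the request that produced it.
    public static Uri ResolveRedirect(Uri requestUri, string location)
    {
        return new Uri(requestUri, location);
    }

    // Hypothetical helper: follows one redirect with a plain GET that
    // shares the same CookieContainer. Assumption: the cookie container
    // was also assigned to the request that produced 'response', so the
    // cookies set by that response are already stored in it.
    public static HttpWebResponse FollowRedirect(HttpWebResponse response, CookieContainer cookieJar)
    {
        string location = response.Headers["Location"];
        if (string.IsNullOrEmpty(location))
            return response; // nothing to follow

        Uri target = ResolveRedirect(response.ResponseUri, location);
        response.Close(); // release the connection before the next request

        var request = (HttpWebRequest)WebRequest.Create(target);
        request.Method = "GET";            // no body, so no ContentLength and no request stream
        request.AllowAutoRedirect = false; // keep handling redirects manually
        request.CookieContainer = cookieJar;
        return (HttpWebResponse)request.GetResponse();
    }
}
```

A caller could loop on `FollowRedirect` while the status code is in the 3xx range, preferably with a cap on the number of hops so a redirect cycle cannot run forever.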

Ank*_*Roy 4

If it is a WinForms application, and it is just a screen scraper rather than a large application, you can use WatiN for the scraping.

Here is the link to get started.