Posts by Joe*_*Mac

Extract a reCaptcha from a web page, complete it externally via cURL, then return the result to the page

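I'm building a web scraper for personal use that crawls car dealership websites based on my own input, but several of the sites I'm trying to collect data from block me by redirecting to a captcha page. The site I'm currently scraping with curl returns this HTML: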

<html>
   <head>
      <title>You have been blocked</title>
      <style>#cmsg{animation: A 1.5s;}@keyframes A{0%{opacity:0;}99%{opacity:0;}100%{opacity:1;}}</style>
   </head>
   <body style="margin:0">
      <p id="cmsg">Please enable JS and disable any ad blocker</p>
      <script>
            var dd={'cid':'AHrlqAAAAAMA1gZrYHNP4MIAAYhtzg==','hsh':'C0705ACD75EBF650A07FF8291D3528','t':'fe','host':'geo.captcha-delivery.com'}
      </script>
      <script src="https://ct.captcha-delivery.com/c.js"></script>
   </body>
</html>
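Before doing anything with a scraped response, it helps to recognize this interstitial explicitly, so the script can stop or skip the site instead of parsing the block page as if it were dealership content. Below is a minimal sketch, assuming the two markers visible in the HTML above (the `captcha-delivery.com` host and the "You have been blocked" title) stay stable. The function name `is_captcha_block_page` and the sample strings are hypothetical and are not part of the original question.

```php
<?php

// Hypothetical helper: returns true when an HTML response looks like the
// captcha-delivery.com block page rather than the page that was requested.
// Assumption: the markers below, copied from the block page above, are stable.
function is_captcha_block_page(string $html): bool
{
    $markers = [
        'captcha-delivery.com',                 // host in the dd config and the script src
        '<title>You have been blocked</title>', // page title of the interstitial
    ];
    foreach ($markers as $marker) {
        // stripos() is case-insensitive and returns false when there is no match
        if (stripos($html, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical sample responses for illustration
$blocked = '<html><head><title>You have been blocked</title></head>'
    . '<body><script src="https://ct.captcha-delivery.com/c.js"></script></body></html>';
$normal = '<html><head><title>Used cars</title></head><body>Listings</body></html>';

var_dump(is_captcha_block_page($blocked)); // bool(true)
var_dump(is_captcha_block_page($normal));  // bool(false)
```

Plain substring checks are used here instead of a DOM parser because the markers are short literals; if the block page changes, this check needs updating.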

I'm using this to scrape the page:

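<?php

function web_scrape($url)
{
    $ch = curl_init();
    $imei = "013977000272744";

    curl_setopt($ch, CURLOPT_URL, $url);
    // Send the request as a POST, with the imei field as the body
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    // Cookies sent along with the request
    curl_setopt($ch, CURLOPT_COOKIE, '_ym_uid=1460051101134309035;  _ym_isad=1; cxx=80115415b122e7c81172a0c0ca1bde40; _ym_visorc_20293771=w');
    curl_setopt($ch, CURLOPT_POSTFIELDS, array(
        'imei' => $imei,
    ));

    // Make curl_exec() return the response body instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $server_output = curl_exec($ch);
    if ($server_output === false) {
        // Transport-level failure (DNS, TLS, timeout, ...)
        $server_output = 'cURL error: ' . curl_error($ch);
    }

    // Close the handle before returning: code after a return statement never runs
    curl_close($ch);

    return $server_output;
}
echo web_scrape($url);

?> …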

html php curl recaptcha web-scraping
