
Reading from a file/string with Goutte

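I'm building a web scraper with Goutte.

For development, I've saved a copy of the .html document I want to traverse, so that I'm not constantly sending requests to the website. This is what I have so far: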

use Goutte\Client;

$client = new Client();
$html=file_get_contents('test.html');
$crawler = $client->request(null,null,[],[],[],$html);


As far as I can tell, this should call request in Symfony\Component\BrowserKit and pass in the raw body data. This is the error message I get:

PHP Fatal error:  Uncaught exception 'GuzzleHttp\Exception\ConnectException' with message 'cURL error 7: Failed to connect to localhost port 80: Connection refused (see http://curl.haxx.se/libcurl/c/libcurl-errors.html)' in C:\Users\Ally\Sites\scrape\vendor\guzzlehttp\guzzle\src\Handler\CurlFactory.


If I use DomCrawler on its own, creating a crawler from a string is straightforward (see: http://symfony.com/doc/current/components/dom_crawler.html). I'm just not sure how to do the same thing with Goutte.
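
For reference, this is roughly what I mean by the DomCrawler-only version. It is a minimal sketch, assuming symfony/dom-crawler is installed via Composer; the inline markup is a made-up stand-in for the contents of test.html, and $links is just an illustrative variable name.

```php
<?php
// Assumption: symfony/dom-crawler was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical stand-in for file_get_contents('test.html').
$html = '<html><body><a href="/a">First</a><a href="/b">Second</a></body></html>';

// The crawler is built directly from the string, so no HTTP request is made.
$crawler = new Crawler($html);

// Collect the href of every link. XPath is used here so that the
// optional symfony/css-selector package is not required.
$links = $crawler->filterXPath('//a')->each(function (Crawler $node) {
    return $node->attr('href');
});

print_r($links);
```

What I'm after is the Goutte equivalent of the `new Crawler($html)` step.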

Thanks in advance.

php web-scraping symfony goutte

All*_*lly


5 votes

1 answer

5307 views