How can I use wget to download all the files on a website?
I need every file except web-page files such as HTML, PHP, and ASP.
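One common approach is wget's recursive mode combined with `--reject`. A minimal sketch (the URL is a placeholder; adjust the suffix list to your site):

```shell
# -r        : recurse into links
# -np       : don't ascend to the parent directory
# -R        : comma-separated list of suffixes/patterns to reject
wget -r -np -R "html,php,asp" https://example.com/files/
```

Note that wget still has to download HTML pages temporarily in order to parse them for links; rejected pages are deleted after their links have been extracted, so they will not remain on disk.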
I'd like to use Python to scrape Google Scholar search results. I found two different scripts for this: gscholar.py and scholar.py (can the latter be used as a Python library?).
I should probably say up front that I'm totally new to Python, so sorry if I'm missing the obvious!
The problem is that when I use gscholar.py as explained in the README file, I get the following error:
query() takes at least 2 arguments (1 given) …
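That error means `query()` was called with only one positional argument when it requires at least two. The sketch below reproduces the situation with a hypothetical two-argument function (gscholar's real signature may differ between versions, so check its README or `help(gscholar.query)` for the actual parameters); Python 3 phrases the message slightly differently than the Python 2 wording quoted above:

```python
# Hypothetical stand-in resembling gscholar's query(); the real
# signature may differ by version.
def query(searchstr, outformat):
    """Requires both a search string and an output format."""
    return (searchstr, outformat)

try:
    query("some author")  # only 1 of 2 required arguments
except TypeError as e:
    # Python 3: "query() missing 1 required positional argument: 'outformat'"
    print(e)

# Supplying both arguments makes the call succeed.
result = query("some author", "bibtex")
```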
The following sentence in the Wget manual caught my attention:
wget --spider --force-html -i bookmarks.html
This feature needs much more work for Wget to get close to the functionality of real web spiders.
I found the following lines of code related to the spider option in wget's source.
src/ftp.c
780: /* If we're in spider mode, don't really retrieve anything. The
784: if (opt.spider)
889: if (!(cmd & (DO_LIST | DO_RETR)) || (opt.spider && !(cmd & DO_LIST)))
1227: if (!opt.spider)
1239: if (!opt.spider)
1268: else if (!opt.spider)
1827: if (opt.htmlify && !opt.spider)
src/http.c
64:#include "spider.h"
2405: /* Skip preliminary HEAD request if we're …