Posts by Roh*_*han

Nutch does not crawl URLs that contain query string parameters

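I am using Nutch 1.9 and trying to run a crawl with a single command. As the output below shows, the generator returns 0 records when it reaches the second level. Has anyone run into this problem? I have been stuck on it for the past 2 days and have searched through every option I could find. Any leads or help would be greatly appreciated.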

#######  INJECT   ######
Injector: starting at 2015-04-08 17:36:20
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: overwrite: false
Injector: update: false
Injector: Total number of urls rejected by filters: 0
Injector: Total number of urls after normalization: 1
Injector: Total new urls injected: 1
Injector: finished at 2015-04-08 17:36:21, elapsed: 00:00:01
####  GENERATE  ###
Generator: starting at 2015-04-08 17:36:22
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: …
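
One possible cause worth ruling out, offered as an assumption since the truncated log above does not confirm it: the stock `conf/regex-urlfilter.txt` shipped with Nutch 1.x contains the rule `-[?*!@=]`, which skips URLs containing those characters as probable queries. With `Generator: filtering: true`, outlinks that carry a query string would then be dropped, leaving nothing to generate at the second level. The sketch below does not call any Nutch API. It is a simplified, hypothetical re-implementation (`UrlFilterSketch` and the `example.com` URL are made up for illustration) of a first-match-wins `+`/`-` regex rule list, showing how that one rule affects a query-string URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a +/- regex URL filter; NOT Nutch's own class.
public class UrlFilterSketch {

    // One rule: '+' means accept, '-' means reject, followed by a regex.
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, String regex) {
            this.accept = accept;
            this.pattern = Pattern.compile(regex);
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Each argument is one line in the style of regex-urlfilter.txt.
    public UrlFilterSketch(String... lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', trimmed.substring(1)));
        }
    }

    // The first rule whose regex is found in the URL decides the outcome.
    // A URL that matches no rule is rejected.
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed default: reject probable query URLs, accept everything else.
        UrlFilterSketch strict = new UrlFilterSketch("-[?*!@=]", "+.");
        // Relaxed variant: '?' and '=' removed from the rejected character class.
        UrlFilterSketch relaxed = new UrlFilterSketch("-[*!@]", "+.");

        String url = "http://example.com/list?page=2";
        System.out.println(strict.accepts(url));  // prints "false"
        System.out.println(relaxed.accepts(url)); // prints "true"
    }
}
```

If this turns out to be the cause, the usual remedy is to edit that rule in `conf/regex-urlfilter.txt` so that `?` and `=` are no longer rejected, for example changing it to `-[*!@]`, and then rerun the crawl.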

java web-crawler nutch

2
Votes
1
Answer
689
Views
