Where can I find documentation on Nutch status codes?

Leo*_*ers 3 nutch

Nutch uses several status codes to classify crawled documents.

Examples of the codes Nutch uses:

db_unfetched
db_fetched
db_gone
db_redir_perm
db_redir_temp
db_notmodified

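Where can I find a clear explanation of what these codes mean?

Reading forum posts and answers on Stack Overflow gives a reasonable understanding of the codes. This page offers some good input: http://wiki.apache.org/nutch/CrawlDatumStates but I am looking for a page that describes what each status code means.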

Tej*_*til 5

There is no official documentation, but I was able to extract this from the CrawlDatum class:

  /** Page was not fetched yet. */
  public static final byte STATUS_DB_UNFETCHED      = 0x01;

  /** Page was successfully fetched. */
  public static final byte STATUS_DB_FETCHED        = 0x02;

  /** Page no longer exists. */
  public static final byte STATUS_DB_GONE           = 0x03;

  /** Page temporarily redirects to other page. */
  public static final byte STATUS_DB_REDIR_TEMP     = 0x04;

  /** Page permanently redirects to other page. */
  public static final byte STATUS_DB_REDIR_PERM     = 0x05;

  /** Page was successfully fetched and found not modified. */
  public static final byte STATUS_DB_NOTMODIFIED    = 0x06;
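If you want these meanings at hand while inspecting crawl data, a small lookup table may help. The following is only a sketch: `CrawlStatusLookup` and `describe` are hypothetical names, not part of Nutch. The byte values and descriptions are copied from the constants above, and pairing each one with the lowercase `db_*` name from the question is an assumption based on the matching constant names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Nutch: maps status bytes to a readable description. */
final class CrawlStatusLookup {

    // Values and comments copied from the CrawlDatum constants quoted above.
    // The db_* labels are assumed to correspond to the STATUS_DB_* constant names.
    private static final Map<Byte, String> DESCRIPTIONS = new LinkedHashMap<>();

    static {
        DESCRIPTIONS.put((byte) 0x01, "db_unfetched: Page was not fetched yet.");
        DESCRIPTIONS.put((byte) 0x02, "db_fetched: Page was successfully fetched.");
        DESCRIPTIONS.put((byte) 0x03, "db_gone: Page no longer exists.");
        DESCRIPTIONS.put((byte) 0x04, "db_redir_temp: Page temporarily redirects to other page.");
        DESCRIPTIONS.put((byte) 0x05, "db_redir_perm: Page permanently redirects to other page.");
        DESCRIPTIONS.put((byte) 0x06, "db_notmodified: Page was successfully fetched and found not modified.");
    }

    private CrawlStatusLookup() {
        // Static helper only; no instances.
    }

    /** Returns the description for a status byte, or a placeholder for values not listed here. */
    static String describe(byte status) {
        String description = DESCRIPTIONS.get(status);
        return description != null
                ? description
                : String.format("unknown status 0x%02x", status);
    }
}
```

Calling `CrawlStatusLookup.describe((byte) 0x03)` returns the `db_gone` line. Any byte outside the six values listed here falls through to the "unknown status" placeholder, so codes that this answer does not cover are reported rather than silently mislabeled.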