Reducing Datastore Write Operations

Gau*_*rav 1 google-app-engine web-crawler google-cloud-datastore

I'm building a web crawler on Google App Engine. To store the crawled information in the Datastore, I use JDO with the following fields. The code is as follows:

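import javax.jdo.annotations.Extension;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

@PersistenceCapable
public class LinkInfo
{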
   @PrimaryKey
   @Persistent private String id;

   @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
   @Persistent private int linkNo;

   @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
   @Persistent private String link;

   @Persistent private int version;

   @Persistent private String fetchDate;

   @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
   @Persistent private long fetchTime;

   @Persistent private String nextFetch;

   @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
   @Persistent private String pageCreationDate;

   @Persistent private int retries;

   @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
   @Persistent private int retryInterval;

   @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
   @Persistent private int outLinks;

   @Persistent private float score;

   @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
   @Persistent private String abstractContent;

   @Persistent private String contentType;

   @Persistent private String parent;

   @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
   @Persistent private String title;

   // ...
}

Of the 16 fields, I made 8 unindexed because I don't need to filter or sort on them. Even so, I am still exceeding the Datastore write operations limit.

Any suggestions for reducing "Datastore write operations"?

use*_*604 6

From the Google App Engine documentation:

For each new entity put:

"2 writes + 2 writes per indexed property value + 1 write per composite index value".
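The quoted rule can be expressed as a small helper. This is only a sketch: the class and method names (DatastoreWriteCost, newEntityPutWrites) are hypothetical, and it models just the new-entity case quoted above, not updates to existing entities. Note that the rule counts indexed property values, so a multi-valued (list) property contributes one value per element.

```java
/**
 * Hypothetical helper applying the quoted rule for a NEW entity put:
 * 2 writes + 2 writes per indexed property value + 1 write per composite index value.
 */
public class DatastoreWriteCost {

    // indexedPropertyValues counts values, not properties: a list property
    // that is indexed contributes one value per element.
    static int newEntityPutWrites(int indexedPropertyValues, int compositeIndexValues) {
        if (indexedPropertyValues < 0 || compositeIndexValues < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        return 2 + 2 * indexedPropertyValues + compositeIndexValues;
    }

    public static void main(String[] args) {
        // Example: 2 indexed single-valued properties plus one indexed list
        // property holding 3 values, and no composite indexes.
        int writes = newEntityPutWrites(2 + 3, 0);
        System.out.println(writes); // 12
    }
}
```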

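Of your 16 fields, 8 are marked unindexed, and id is the @PrimaryKey, which is stored as the entity key rather than as an indexed property. That leaves 7 indexed properties, so each new entity costs 2 + 2*7 writes, plus one write per composite index value for whatever custom indexes you have.

That is at least 16 writes per entity.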

The best way to reduce the number of writes is to reduce the number of indexed properties.
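To see what this buys for LinkInfo, the sketch below lists the fields from the question with their current indexed flag, leaves out the @PrimaryKey field id (stored as the entity key, not as a property), and prices a new put with the rule quoted above. The reduced variant assumes, purely for illustration, that the crawler only ever queries on nextFetch; which fields really need an index depends on the queries the application actually runs. The class name LinkInfoWriteEstimate is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LinkInfoWriteEstimate {

    // Quoted rule: 2 + 2 per indexed property value + 1 per composite index value.
    static int newEntityPutWrites(int indexedPropertyValues, int compositeIndexValues) {
        return 2 + 2 * indexedPropertyValues + compositeIndexValues;
    }

    // Field name -> true if indexed (no gae.unindexed extension in the question).
    // "id" is omitted because it is the @PrimaryKey, i.e. the entity key.
    static Map<String, Boolean> currentIndexing() {
        Map<String, Boolean> m = new LinkedHashMap<>();
        m.put("linkNo", false);
        m.put("link", false);
        m.put("version", true);
        m.put("fetchDate", true);
        m.put("fetchTime", false);
        m.put("nextFetch", true);
        m.put("pageCreationDate", false);
        m.put("retries", true);
        m.put("retryInterval", false);
        m.put("outLinks", false);
        m.put("score", true);
        m.put("abstractContent", false);
        m.put("contentType", true);
        m.put("parent", true);
        m.put("title", false);
        return m;
    }

    static int countIndexed(Map<String, Boolean> indexing) {
        int n = 0;
        for (boolean indexed : indexing.values()) {
            if (indexed) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Map<String, Boolean> current = currentIndexing();
        int before = newEntityPutWrites(countIndexed(current), 0);

        // Hypothetical: only nextFetch is queried, so every other field is unindexed.
        Map<String, Boolean> reduced = new LinkedHashMap<>(current);
        reduced.replaceAll((field, indexed) -> field.equals("nextFetch"));
        int after = newEntityPutWrites(countIndexed(reduced), 0);

        System.out.println("writes per new entity: " + before + " -> " + after);
        // prints "writes per new entity: 16 -> 4"
    }
}
```

Each property moved from indexed to unindexed saves 2 writes on every new entity, so the saving scales directly with the number of pages the crawler stores.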