Gau*_*rav 1 google-app-engine web-crawler google-cloud-datastore
I am building a web crawler on Google App Engine. To store the crawled information in the Datastore, I use JDO with the following fields. The code is as follows:
import javax.jdo.annotations.Extension;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

// Fields tagged with gae.unindexed="true" are stored without property indexes.
@PersistenceCapable
public class LinkInfo
{
    @PrimaryKey
    @Persistent private String id;
    @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
    @Persistent private int linkNo;
    @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
    @Persistent private String link;
    @Persistent private int version;
    @Persistent private String fetchDate;
    @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
    @Persistent private long fetchTime;
    @Persistent private String nextFetch;
    @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
    @Persistent private String pageCreationDate;
    @Persistent private int retries;
    @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
    @Persistent private int retryInterval;
    @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
    @Persistent private int outLinks;
    @Persistent private float score;
    @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
    @Persistent private String abstractContent;
    @Persistent private String contentType;
    @Persistent private String parent;
    @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
    @Persistent private String title;
    ...
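Of the 16 fields, I made 8 unindexed because I don't need to filter or sort on them. Even so, I am exceeding the Datastore write operations limit.
Any suggestions for reducing "Datastore Write Operations"?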
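From the Google App Engine documentation:
For each new entity put:
"2 writes + 2 writes per indexed property value + 1 write per composite index value".
So for each entity you will have 2 + 2*8 + (however many custom index entries you have).
That is at least 18 writes per entity.
The best way to reduce the number of writes is to reduce the number of indexed properties.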
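To make the arithmetic concrete, here is a minimal sketch of the quoted formula in Java. The class name `WriteOpsEstimate` and the method `newEntityPutWrites` are hypothetical names made up for this illustration. They are not part of any App Engine API, and the sketch only covers the "new entity put" case quoted above.

```java
public class WriteOpsEstimate {

    // Hypothetical helper: applies the per-put cost quoted above,
    // 2 writes + 2 per indexed property value + 1 per composite index value.
    static int newEntityPutWrites(int indexedPropertyValues, int compositeIndexValues) {
        return 2 + 2 * indexedPropertyValues + compositeIndexValues;
    }

    public static void main(String[] args) {
        // 8 indexed properties, no composite indexes: 2 + 2*8 = 18
        System.out.println(newEntityPutWrites(8, 0));
        // Halving the indexed properties to 4: 2 + 2*4 = 10
        System.out.println(newEntityPutWrites(4, 0));
    }
}
```

Under this formula, every property switched to unindexed saves 2 writes per new entity, so it is worth checking whether each remaining indexed field is really used in a filter or sort.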