PyMongo: How to update a collection using aggregation?

Question


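This is a follow-up to this question.

I am using the following code to find all documents in collection C_a whose text contains the word StackOverflow, and to store them in another collection named C_b: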

import pymongo
from pymongo import MongoClient

client = MongoClient('127.0.0.1')  # MongoDB running locally
dbRead = client['C_a']             # database to read from
# build the pipeline; in PyMongo every field name and operator must be a quoted string
pipeline = [{"$match": {"$text": {"$search": "StackOverflow"}}}, {"$out": "C_b"}]
dbRead.C_a.aggregate(pipeline)     # run the pipeline
print(dbRead.C_b.count_documents({}))  # verify the document count of the new collection


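This works well. However, if I run the same snippet for several keywords, the results get overwritten. For example, I want collection C_b to contain all documents that include the keywords StackOverflow, StackExchange, or Programming. To do that, I simply loop the snippet over those keywords. Unfortunately, each iteration overwrites the output of the previous one.

Question: How can I update the output collection instead of overwriting it?

Also: is there a clever way to avoid duplicates, or do I have to check for duplicates afterwards?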

Answer 1

Tar*_*ani 2

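According to the documentation, $out does not support updating an existing collection; it replaces the collection each time: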

https://docs.mongodb.com/manual/reference/operator/aggregation/out/#pipe._S_out
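
On MongoDB 4.2 or later, the $merge stage may be worth considering as an alternative, since it writes into an existing collection rather than replacing it. The following is a minimal sketch under that assumption, not something the original answer proposed. The helper names build_merge_pipeline and collect_keywords are hypothetical, and source is assumed to be a PyMongo collection with a text index (for example dbRead.C_a).

```python
def build_merge_pipeline(keyword, target="C_b"):
    """Build a pipeline that adds text-search matches to `target`.

    Assumes a MongoDB 4.2+ server; `target` is an illustrative default.
    """
    return [
        {"$match": {"$text": {"$search": keyword}}},
        {"$merge": {
            "into": target,
            "on": "_id",                    # match documents by their _id
            "whenMatched": "keepExisting",  # already copied: leave it alone
            "whenNotMatched": "insert",     # new match: add it
        }},
    ]


def collect_keywords(source, keywords, target="C_b"):
    """Run one $merge pipeline per keyword against `source`."""
    for keyword in keywords:
        source.aggregate(build_merge_pipeline(keyword, target))
```

A call might look like collect_keywords(dbRead.C_a, ["StackOverflow", "StackExchange", "Programming"]). Because documents are matched on _id, a document that contains several of the keywords should end up in the target collection only once.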

So you need to do this in two steps. First, write the matches to a temporary collection:

# in PyMongo every field name and operator must be a quoted string
pipeline = [{"$match": {"$text": {"$search": "StackOverflow"}}}, {"$out": "temp"}]
dbRead.C_a.aggregate(pipeline)


Then copy the temporary collection into C_b, using the approach discussed in

/sf/answers/2620354831/

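# PyMongo equivalent of the mongo shell's insert(...toArray())
docs = list(dbRead.temp.find())
if docs:  # insert_many raises on an empty list
    dbRead.C_b.insert_many(docs)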


Before the first run, you need to drop the C_b collection.
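
The two steps above could be combined into a loop over the keywords roughly as follows. This is a sketch rather than the answerer's exact method: the function name collect_via_temp and its parameters are hypothetical, db is assumed to be a PyMongo database handle such as dbRead, and the copy step uses replace_one with upsert=True (swapped in for a plain insert) so that a document matching more than one keyword does not trigger a duplicate-key error.

```python
def collect_via_temp(db, keywords, source="C_a", temp="temp", target="C_b"):
    """Accumulate text-search matches for several keywords in `target`.

    `db` is assumed to behave like a PyMongo Database; the collection
    names are illustrative defaults taken from the question.
    """
    db[target].drop()  # start from an empty output collection
    for keyword in keywords:
        pipeline = [
            {"$match": {"$text": {"$search": keyword}}},
            {"$out": temp},  # $out replaces `temp` on every run
        ]
        db[source].aggregate(pipeline)
        # Copy this keyword's matches; upserting on _id avoids duplicates.
        for doc in db[temp].find():
            db[target].replace_one({"_id": doc["_id"]}, doc, upsert=True)
    db[temp].drop()  # clean up the scratch collection
```

It could be called as collect_via_temp(dbRead, ["StackOverflow", "StackExchange", "Programming"]). Since $text treats space-separated terms in a search string as a logical OR, a single search for "StackOverflow StackExchange Programming" with $out straight to C_b may also be enough when no per-keyword processing is needed.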
