How to handle the "document size exceeds 16 MB" error when inserting a document into a MongoDB collection

Thr*_*y J 5 mongodb python-2.7

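Can anyone suggest how to handle the error raised when a document larger than 16 MB is inserted into a MongoDB collection? I have found solutions such as GridFS, which would solve the problem, but I need a solution that does not use GridFS. Is there a way to make the document smaller or to split it into sub-documents? If so, how can it be done?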

from pymongo import MongoClient

conn = MongoClient("mongodb://sample_mongo:27017")
db_conn = conn["test"]
db_collection = db_conn["sample"]

# the size of record is 23MB

record = {
    "name": "drugs",
    "collection_id": 23,
    "timestamp": 1515065002,
    "tokens": [],  # contains a list of strings
    "tokens_missing": [],  # contains a list of strings
    "token_mapping": {},  # dictionary containing the transformed tokens
}

db_collection.insert(record, check_keys=False)

I get the error DocumentTooLarge: BSON document too large. In MongoDB, the maximum BSON document size is 16 MB.

  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 2501, in insert
    check_keys, manipulate, write_concern)
  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 575, in _insert
    check_keys, manipulate, write_concern, op_id, bypass_doc_val)
  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/collection.py", line 556, in _insert_one
    check_keys=check_keys)
  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/pool.py", line 482, in command
    self._raise_connection_failure(error)
  File "/usr/local/lib/python2.7/dist-packages/pymongo-3.5.1-py2.7-linux-x86_64.egg/pymongo/pool.py", line 610, in _raise_connection_failure
    raise error
DocumentTooLarge: BSON document too large (22451007 bytes) - the connected server supports BSON document sizes up to 16793598 bytes.

Cle*_*ath 2

The maximum BSON document size is 16 MB. To store documents larger than this limit, MongoDB provides the GridFS API.

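GridFS is a specification for storing and retrieving files that exceed the BSON document size limit of 16 MB. GridFS stores a large file by dividing it into parts, or chunks, and storing each chunk as a separate document. The default GridFS chunk size is 255 KB. GridFS uses two collections to store files: one holds the file chunks and the other holds the file metadata.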
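If GridFS itself is not an option, the same chunking idea can be applied by hand to the record in the question. The following is only a minimal sketch under several assumptions: it targets Python 3, the large field is a list of strings (as `tokens` is described), and the helper names (`estimate_size`, `split_list`, `to_chunk_docs`), the `_chunks` suffix, the chunk-document layout, the 8 MB budget, and the 16-byte per-element overhead are all made up for illustration. The size estimate is deliberately rough, not an exact BSON size.

```python
# Assumed budget per chunk document, kept well below the 16 MB BSON limit.
MAX_CHUNK_BYTES = 8 * 1024 * 1024

# Assumed allowance for what BSON adds around each string in an array
# (type byte, index key, length prefix, terminators). An estimate only.
PER_ITEM_OVERHEAD = 16


def estimate_size(token):
    """Rough size of one string element: UTF-8 bytes plus assumed overhead."""
    return len(token.encode("utf-8")) + PER_ITEM_OVERHEAD


def split_list(items, max_bytes=MAX_CHUNK_BYTES):
    """Yield consecutive sublists whose estimated size stays within max_bytes."""
    chunk, chunk_size = [], 0
    for item in items:
        item_size = estimate_size(item)
        # Start a new chunk when adding this item would exceed the budget.
        if chunk and chunk_size + item_size > max_bytes:
            yield chunk
            chunk, chunk_size = [], 0
        chunk.append(item)
        chunk_size += item_size
    if chunk:
        yield chunk


def to_chunk_docs(record, field, max_bytes=MAX_CHUNK_BYTES):
    """Return (parent, chunks): the record without `field`, plus chunk documents."""
    parent = {key: value for key, value in record.items() if key != field}
    chunks = [
        {
            "collection_id": record["collection_id"],  # links chunk to parent
            "field": field,
            "n": n,  # chunk order, used to reassemble the list
            "data": part,
        }
        for n, part in enumerate(split_list(record[field], max_bytes))
    ]
    parent[field + "_chunks"] = len(chunks)
    return parent, chunks
```

With documents built this way, the parent could be written with `insert_one` and the chunks with `insert_many`, and the list rebuilt by reading the chunks back sorted by `n`. A dictionary such as `token_mapping` would first have to be turned into a list (for example of key/value pairs) before it could be split the same way.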