fas*_*uto 8 python mongodb cassandra nosql
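I have to choose between Cassandra and MongoDB (or another NoSQL database; I'm open to suggestions) for a project with a large volume of inserts (1M/day). So I wrote a small test to measure write performance. Here is the code that inserts into Cassandra: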
import time
import os
import random
import string
import pycassa
def get_random_string(string_length):
    return ''.join(random.choice(string.letters) for i in xrange(string_length))
def connect():
    """Connect to a test database"""
    connection = pycassa.connect('test_keyspace', ['localhost:9160'])
    db = pycassa.ColumnFamily(connection,'foo')
    return db
def random_insert(db):
    """Insert a record into the database. The record has the following format
    ID timestamp
    4 random strings
    3 random integers"""
    record = {}
    record['id'] = str(time.time())
    record['str1'] = get_random_string(64)
    record['str2'] = get_random_string(64)
    record['str3'] = get_random_string(64)
    record['str4'] = get_random_string(64)
    record['num1'] = str(random.randint(0, 100))
    record['num2'] = str(random.randint(0, 1000))
    record['num3'] = str(random.randint(0, 10000))
    db.insert(str(time.time()), record)
if __name__ == "__main__":
    db = connect()
    start_time = time.time()
    for i in range(1000000):
        random_insert(db)
    end_time = time.time()
    print "Insert time: %lf " %(end_time - start_time)
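The code for inserting into Mongo is the same, except for the connect function (and the import):
import pymongo  # replaces the pycassa import in the Mongo version
def connect():
    """Connect to a test database"""
    connection = pymongo.Connection('localhost', 27017)
    db = connection.test_insert
    return db.foo2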
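The Cassandra run took about 1046 seconds, while the Mongo run took about 437 seconds. Cassandra is supposed to be much faster than Mongo at inserting data. So, what am I doing wrong?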
jbe*_*lis 12
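There is no equivalent in Cassandra to Mongo's unsafe mode. (We used to have one, but we took it out, because it's just a bad idea.)
The other main problem is that you're doing single-threaded inserts. Cassandra is designed for high concurrency; you need to use a multithreaded test. See the graph at the bottom of http://spyced.blogspot.com/2010/01/cassandra-05.html (the actual numbers are over a year out of date, but the principle still holds).
The Cassandra source distribution includes such a test in contrib/stress.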
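To illustrate what a multithreaded version of the test might look like, here is a minimal sketch. It is not the contrib/stress tool and not the asker's code: `run_concurrent_inserts`, `make_record`, and `FakeStore` are hypothetical names introduced for this example, and `FakeStore` is an in-memory stand-in so the harness can run without a database. The sketch is written for Python 3, whereas the code in the question is Python 2.

```python
import random
import string
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def get_random_string(length):
    """Return a random string of ASCII letters."""
    return ''.join(random.choice(string.ascii_letters) for _ in range(length))


def make_record():
    """Build a record shaped like the one in the question."""
    return {
        'str1': get_random_string(64),
        'str2': get_random_string(64),
        'str3': get_random_string(64),
        'str4': get_random_string(64),
        'num1': str(random.randint(0, 100)),
        'num2': str(random.randint(0, 1000)),
        'num3': str(random.randint(0, 10000)),
    }


def run_concurrent_inserts(insert_fn, total, workers):
    """Call insert_fn(key, record) `total` times, split across `workers`
    threads, and return the elapsed wall-clock time in seconds."""
    def worker(start, stop):
        for i in range(start, stop):
            # Keys are distinct integers, so no two inserts share a key.
            insert_fn(str(i), make_record())

    # Split [0, total) into `workers` contiguous, non-overlapping ranges.
    bounds = [total * w // workers for w in range(workers + 1)]
    start_time = time.time()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker, bounds[w], bounds[w + 1])
                   for w in range(workers)]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return time.time() - start_time


class FakeStore(object):
    """Hypothetical in-memory stand-in for a database client."""
    def __init__(self):
        self._lock = threading.Lock()
        self.rows = {}

    def insert(self, key, record):
        with self._lock:
            self.rows[key] = record


if __name__ == "__main__":
    store = FakeStore()
    elapsed = run_concurrent_inserts(store.insert, total=10000, workers=8)
    print("Inserted %d rows in %f s" % (len(store.rows), elapsed))
```

To benchmark a real database, `store.insert` would be replaced by the client's insert call (for example the `db.insert` used in the question). Whether one client object can be shared across threads depends on the client library, so each worker may need its own connection. Threads help here only because the time is dominated by waiting on the network; the harness does not make the record generation itself any faster.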