Poor Cassandra performance?

fas*_*uto 8 python mongodb cassandra nosql

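I have to choose between Cassandra and MongoDB (or another NoSQL database, I'm open to suggestions) for a project with a lot of inserts (1M/day). So I wrote a small test to measure write performance. Here is the code that inserts into Cassandra: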

import time
import os
import random
import string
import pycassa

def get_random_string(string_length):
    return ''.join(random.choice(string.letters) for i in xrange(string_length))

def connect():
    """Connect to a test database"""
    connection = pycassa.connect('test_keyspace', ['localhost:9160'])
    db = pycassa.ColumnFamily(connection,'foo')
    return db

def random_insert(db):
    """Insert a record into the database. The record has the following format
    ID timestamp
    4 random strings
    3 random integers"""
    record = {}
    record['id'] = str(time.time())
    record['str1'] = get_random_string(64)
    record['str2'] = get_random_string(64)
    record['str3'] = get_random_string(64)
    record['str4'] = get_random_string(64)
    record['num1'] = str(random.randint(0, 100))
    record['num2'] = str(random.randint(0, 1000))
    record['num3'] = str(random.randint(0, 10000))
    db.insert(str(time.time()), record)

if __name__ == "__main__":
    db = connect()
    start_time = time.time()
    for i in range(1000000):
        random_insert(db)
    end_time = time.time()
    print "Insert time: %lf " %(end_time - start_time)

The code that inserts into Mongo is the same, except for the connect function:

import pymongo

def connect():
    """Connect to a test database"""
    connection = pymongo.Connection('localhost', 27017)
    db = connection.test_insert
    return db.foo2

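Inserting into Cassandra takes about 1046 seconds, while inserting into Mongo takes about 437 seconds. Cassandra is supposed to be much faster than Mongo at inserting data. So, what am I doing wrong?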

jbe*_*lis 12

There is no equivalent in Cassandra to Mongo's unsafe mode. (We used to have one, but we took it out because it's just a bad idea.)

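The other main problem is that you are doing single-threaded inserts. Cassandra is designed for high concurrency; you need to use a multithreaded test. See the graph at the bottom of http://spyced.blogspot.com/2010/01/cassandra-05.html (the actual numbers are over a year out of date, but the principle still holds).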

The Cassandra source distribution includes such a test in contrib/stress.
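As a rough illustration of what a multithreaded version of the question's test loop might look like, here is a minimal sketch. It is not the contrib/stress tool. The helper name `run_concurrent_inserts` and the stand-in `fake_insert` are hypothetical, and the thread count of 8 is an arbitrary choice. In a real run, `insert_fn` would wrap the question's `random_insert(db)`; whether a single pycassa connection can be shared safely across threads depends on the pycassa version, so that is an assumption to check before reusing `db`.

```python
import threading
import time


def run_concurrent_inserts(insert_fn, total_inserts, num_threads):
    """Split total_inserts calls to insert_fn across num_threads threads.

    Returns the elapsed wall-clock time in seconds.
    """
    # Give each thread an equal share; the first threads absorb the remainder.
    base, extra = divmod(total_inserts, num_threads)
    counts = [base + (1 if i < extra else 0) for i in range(num_threads)]

    def worker(count):
        for _ in range(count):
            insert_fn()

    threads = [threading.Thread(target=worker, args=(c,)) for c in counts]
    start_time = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start_time


if __name__ == "__main__":
    # Hypothetical stand-in for random_insert(db): it only counts calls,
    # so the sketch runs without a database.
    lock = threading.Lock()
    calls = [0]

    def fake_insert():
        with lock:
            calls[0] += 1

    elapsed = run_concurrent_inserts(fake_insert, 10000, 8)
    print("Inserted %d records in %.3f s" % (calls[0], elapsed))
```

To benchmark a real database, one option is to pass something like `lambda: random_insert(db)` as `insert_fn` and compare timings for several thread counts. Note that the question's row key, `str(time.time())`, can repeat when inserts happen close together, and concurrent writers make that more likely, so a unique key per insert is advisable.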