Poor Cassandra performance on AWS

use*_*615 0 performance amazon-web-services cassandra

One of our DBAs has benchmarked Cassandra against Oracle for INSERT performance (1M records) on AWS EC2, using the same Python code (below), and got the following surprising results:

Oracle 12.2, single node, 64 cores / 256 GB, EC2 EBS storage, 38 sec

Cassandra 5.1.13 (DDAC), single node, 2 cores / 4 GB, EC2 EBS storage, 464 sec

Cassandra 3.11.4, four nodes, 16 cores / 64 GB (each node), EC2 EBS storage, 486 sec

So - what are we doing wrong?
Why is Cassandra so slow?
* Not enough nodes? (But then why is the 4-node cluster slower than the single node?)
* A configuration issue?
* Something else?

Thanks!

Here is the Python code:

import logging
import time
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, BatchStatement
from cassandra.query import SimpleStatement
from cassandra.auth import PlainTextAuthProvider

class PythonCassandraExample:

    def __init__(self):
        self.cluster = None
        self.session = None
        self.keyspace = None
        self.log = None

    def __del__(self):
        self.cluster.shutdown()

    def createsession(self):
        auth_provider = PlainTextAuthProvider(username='cassandra', password='cassandra')
        self.cluster = Cluster(['10.220.151.138'],auth_provider = auth_provider)
        self.session = self.cluster.connect(self.keyspace)

    def getsession(self):
        return self.session

    # How about Adding some log info to see what went wrong
    def setlogger(self):
        log = logging.getLogger()
        log.setLevel('INFO')
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s"))
        log.addHandler(handler)
        self.log = log

    # Create Keyspace based on Given Name
    def createkeyspace(self, keyspace):
        """
        :param keyspace:  The Name of Keyspace to be created
        :return:
        """
        # Before creating a new keyspace, check whether it already exists; if so, drop it and recreate it
        rows = self.session.execute("SELECT keyspace_name FROM system_schema.keyspaces")
        if keyspace in [row[0] for row in rows]:
            self.log.info("dropping existing keyspace...")
            self.session.execute("DROP KEYSPACE " + keyspace)

        self.log.info("creating keyspace...")
        self.session.execute("""
                CREATE KEYSPACE %s
                WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': '2' }
                """ % keyspace)

        self.log.info("setting keyspace...")
        self.session.set_keyspace(keyspace)

    def create_table(self):
        c_sql = """
                CREATE TABLE IF NOT EXISTS employee (emp_id int PRIMARY KEY,
                                              ename varchar,
                                              sal double,
                                              city varchar);
                 """
        self.session.execute(c_sql)
        self.log.info("Employee Table Created !!!")

    # lets do some batch insert
    def insert_data(self):
        i = 1
        while i < 1000000:
          insert_sql = self.session.prepare("INSERT INTO  employee (emp_id, ename , sal,city) VALUES (?,?,?,?)")
          batch = BatchStatement()
          batch.add(insert_sql, (i, 'Danny', 2555, 'De-vito'))
          self.session.execute(batch)
          # self.log.info('Batch Insert Completed for ' + str(i))
          i += 1

    # def select_data(self):
    #    rows = self.session.execute('select count(*) from perftest.employee limit 5;')
    #    for row in rows:
    #        print(row.ename, row.sal)

    def update_data(self):
        pass

    def delete_data(self):
        pass


if __name__ == '__main__':
    example1 = PythonCassandraExample()
    example1.createsession()
    example1.setlogger()
    example1.createkeyspace('perftest')
    example1.create_table()

    # Populate perftest.employee table
    start = time.time()
    example1.insert_data()
    end = time.time()
    print ('Duration: ' + str(end-start) + ' sec.')

    # example1.select_data()

Ale*_*Ott 5

There are several problems here:

  • For the second test you did not allocate enough memory and cores to DDAC, so Cassandra got only a 1 GB heap - by default, Cassandra takes 1/4 of the available memory. The same applies to the third test - the heap gets only 16 GB of RAM; you may need to raise it to a higher value, for example 24 GB or even more.
  • It is not clear how many IOPS you had in each test - EBS throughput depends on the size of the volume and its type.
  • You are executing the commands with the synchronous API - essentially, you insert the next item only after the previous one has been confirmed as inserted. Best throughput is achieved with the asynchronous API (see the sketch after this list);
  • You are preparing the statement on every iteration - this sends the CQL string to the server every time and slows everything down - just move the insert_sql = self.session.prepare(...) line out of the loop;
  • (not directly related) You are writing the data with batch statements - this is an anti-pattern in Cassandra, because the data is sent to only one node, which then has to distribute it to the nodes that actually own it. This explains why the 4-node cluster performs worse than the single-node one.

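To make the last three points concrete, below is a minimal sketch (not the original poster's code) of the same 999,999-row load with the statement prepared once, no single-statement batches, and the driver's concurrent-execution helper keeping many requests in flight. The host, credentials and keyspace are copied from the question; the concurrency value of 100 is an arbitrary starting point.

import time
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.concurrent import execute_concurrent_with_args

auth_provider = PlainTextAuthProvider(username='cassandra', password='cassandra')
cluster = Cluster(['10.220.151.138'], auth_provider=auth_provider)
session = cluster.connect('perftest')

# Prepare once, outside the loop - the server parses the CQL string a single time.
insert_stmt = session.prepare(
    "INSERT INTO employee (emp_id, ename, sal, city) VALUES (?, ?, ?, ?)")

# No BatchStatement: each row is routed by the driver straight to a replica.
# execute_concurrent_with_args keeps up to `concurrency` requests in flight
# instead of waiting for every acknowledgement before sending the next row.
params = ((i, 'Danny', 2555.0, 'De-vito') for i in range(1, 1000000))

start = time.time()
results = execute_concurrent_with_args(session, insert_stmt, params,
                                       concurrency=100,
                                       raise_on_first_error=False,
                                       results_generator=True)
for success, result in results:   # consume the generator; report any failures
    if not success:
        print(result)             # `result` is the exception when success is False
print('Duration: ' + str(time.time() - start) + ' sec.')

cluster.shutdown()

With the statement prepared up front and many requests in flight, the per-row network round trip stops dominating the total time; the exact speed-up will of course depend on the cluster sizing and the EBS volumes.
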
P.S. Real load testing is quite hard - there are dedicated tools for this purpose; you can find more information in this blog post, for example.