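One of our DBAs benchmarked Cassandra against Oracle for INSERT performance (1M records) on AWS EC2, using the same Python code (below), and got the following surprising results:
Oracle 12.2, single node, 64 cores / 256 GB, EC2 EBS storage: 38 seconds
Cassandra 5.1.13 (DDAC), single node, 2 cores / 4 GB, EC2 EBS storage: 464 seconds
Cassandra 3.11.4, four nodes, 16 cores / 64 GB (per node), EC2 EBS storage: 486 seconds
So, what are we doing wrong?
Why is Cassandra performing so slowly?
* Not enough nodes? (Why are 4 nodes slower than a single node?)
* A configuration problem?
* Something else?
Thanks!
Here is the Python code: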
import logging
import time
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, BatchStatement
from cassandra.query import SimpleStatement
from cassandra.auth import PlainTextAuthProvider


class PythonCassandraExample:

    def __init__(self):
        self.cluster = None
        self.session = None
        self.keyspace = None
        self.log = None

    def __del__(self):
        self.cluster.shutdown()

    def createsession(self):
        auth_provider = PlainTextAuthProvider(username='cassandra', password='cassandra')
        self.cluster = Cluster(['10.220.151.138'], auth_provider=auth_provider)
        self.session = self.cluster.connect(self.keyspace)

    def getsession(self):
        return self.session

    # How about Adding some log info to see what went wrong
    def setlogger(self):
        log = logging.getLogger()
        log.setLevel('INFO')
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s"))
        log.addHandler(handler)
        self.log = log

    # Create Keyspace based on Given Name
    def createkeyspace(self, keyspace):
        """
        :param keyspace:  The Name of Keyspace to be created
        :return:
        """
        # Before we create new lets check if exiting keyspace; we will drop that and create new
        rows = self.session.execute("SELECT keyspace_name FROM system_schema.keyspaces")
        if keyspace in [row[0] for row in rows]:
            self.log.info("dropping existing keyspace...")
            self.session.execute("DROP KEYSPACE " + keyspace)

        self.log.info("creating keyspace...")
        self.session.execute("""
                CREATE KEYSPACE %s
                WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': '2' }
                """ % keyspace)

        self.log.info("setting keyspace...")
        self.session.set_keyspace(keyspace)

    def create_table(self):
        c_sql = """
                CREATE TABLE IF NOT EXISTS employee (emp_id int PRIMARY KEY,
                                              ename varchar,
                                              sal double,
                                              city varchar);
                 """
        self.session.execute(c_sql)
        self.log.info("Employee Table Created !!!")

    # lets do some batch insert
    def insert_data(self):
        i = 1
        while i < 1000000:
            insert_sql = self.session.prepare("INSERT INTO employee (emp_id, ename , sal,city) VALUES (?,?,?,?)")
            batch = BatchStatement()
            batch.add(insert_sql, (i, 'Danny', 2555, 'De-vito'))
            self.session.execute(batch)
            # self.log.info('Batch Insert Completed for ' + str(i))
            i += 1

    # def select_data(self):
    #     rows = self.session.execute('select count(*) from perftest.employee limit 5;')
    #     for row in rows:
    #         print(row.ename, row.sal)

    def update_data(self):
        pass

    def delete_data(self):
        pass


if __name__ == '__main__':
    example1 = PythonCassandraExample()
    example1.createsession()
    example1.setlogger()
    example1.createkeyspace('perftest')
    example1.create_table()

    # Populate perftest.employee table
    start = time.time()
    example1.insert_data()
    end = time.time()
    print('Duration: ' + str(end - start) + ' sec.')

    # example1.select_data()
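There are several problems here:
The insert_sql = self.session.prepare(...) call sits inside the loop, so the statement is prepared again for every row; simply move it out of the loop.
P.S. Realistic load testing is quite hard. There are specialized tools for this purpose; for example, you can find more information in this blog post.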
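As a minimal sketch of that fix, the insert routine could look like the following. It is written as a standalone function that takes an already connected session (the question's self.session); the n_rows parameter is a name introduced here for illustration. It also passes the prepared statement straight to execute() instead of wrapping each row in a one-statement BatchStatement. That second change is an assumption on top of the answer, not something the answer states.

```python
def insert_data(session, n_rows=1_000_000):
    """Insert n_rows rows into employee, preparing the statement only once.

    `session` is assumed to be a connected cassandra.cluster.Session with
    the keyspace already set, as in the question's code.
    """
    # Prepare once, outside the loop, so the server does not have to
    # parse and register the same statement again for every row.
    insert_sql = session.prepare(
        "INSERT INTO employee (emp_id, ename, sal, city) VALUES (?, ?, ?, ?)"
    )

    # Bind and execute the prepared statement directly for each row.
    # (Assumption: the single-statement batch from the question is dropped.)
    for emp_id in range(1, n_rows + 1):
        session.execute(insert_sql, (emp_id, "Danny", 2555.0, "De-vito"))

    return n_rows
```

Note that range(1, n_rows + 1) writes exactly n_rows rows, whereas the original while i < 1000000 loop stops at 999,999. The calls are still synchronous, one round trip per row, so this addresses only the repeated prepare and is not a tuned benchmark.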