Bri*_*man 44 postgresql performance psycopg2 jdbc
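I am trying to improve the performance of our SQLAlchemy database queries. We are using psycopg2. In our production system we chose to go with Java because it is faster by at least 50%, if not closer to 100%. So I am hoping someone in the Stack Overflow community has a way to improve my performance.
I think my next step will be to end up patching the psycopg2 library so that it behaves like the JDBC driver. If that is the case and someone has already done this, that would be fine, but I am hoping there is still a settings or refactoring tweak I can make from Python.
I have a simple "SELECT * FROM someLargeDataSetTable" query running. The data set is gigabytes in size. A quick performance chart follows: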
Records            | JDBC  | SQLAlchemy[1] | SQLAlchemy[2] | Psql
--------------------------------------------------------------------
1 (4kB)            | 200ms | 300ms         | 250ms         | 10ms
10 (8kB)           | 200ms | 300ms         | 250ms         | 10ms
100 (88kB)         | 200ms | 300ms         | 250ms         | 10ms
1,000 (600kB)      | 300ms | 300ms         | 370ms         | 100ms
10,000 (6MB)       | 800ms | 830ms         | 730ms         | 850ms
100,000 (50MB)     | 4s    | 5s            | 4.6s          | 8s
1,000,000 (510MB)  | 30s   | 50s           | 50s           | 1m32s
10,000,000 (5.1GB) | 4m44s | 7m55s         | 6m39s         | n/a
--------------------------------------------------------------------
5,000,000 (2.6GB)  | 2m30s | 4m45s         | 3m52s         | 14m22s
--------------------------------------------------------------------
[1] - With the processrow function
[2] - Without the processrow function (direct dump)
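I could add more (our data can be as large as terabytes), but I think the changing slope is evident from the data. JDBC performs significantly better as the data set size increases. Some notes...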
python -u
#!/usr/bin/env python
# testSqlAlchemy.py
import sys
try:
    import cdecimal
    sys.modules["decimal"]=cdecimal
except ImportError,e:
    print >> sys.stderr, "Error: cdecimal didn't load properly."
    raise SystemExit
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

def processrow (row,delimiter="|",null="\N"):
    newrow = []
    for x in row:
        if x is None:
            x = null
        newrow.append(str(x))
    return delimiter.join(newrow)

fetchsize = 10000
connectionString = "postgresql+psycopg2://usr:pass@server:port/db"
eng = create_engine(connectionString, server_side_cursors=True)
session = sessionmaker(bind=eng)()

with open("test.sql","r") as queryFD:
    with open("/dev/null","w") as nullDev:
        query = session.execute(queryFD.read())
        cur = query.cursor
        while cur.statusmessage not in ['FETCH 0','CLOSE CURSOR']:
            for row in query.fetchmany(fetchsize):
                print >> nullDev, processrow(row)
After timing, I also ran cProfile, and this is the dump of the worst offenders:
Fri Mar 4 13:49:45 2011 sqlAlchemy.prof
415757706 function calls (415756424 primitive calls) in 563.923 CPU seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 563.924 563.924 {execfile}
1 25.151 25.151 563.924 563.924 testSqlAlchemy.py:2()
1001 0.050 0.000 329.285 0.329 base.py:2679(fetchmany)
1001 5.503 0.005 314.665 0.314 base.py:2804(_fetchmany_impl)
10000003 4.328 0.000 307.843 0.000 base.py:2795(_fetchone_impl)
10011 0.309 0.000 302.743 0.030 base.py:2790(__buffer_rows)
10011 233.620 0.023 302.425 0.030 {method 'fetchmany' of 'psycopg2._psycopg.cursor' objects}
10000000 145.459 0.000 209.147 0.000 testSqlAlchemy.py:13(processrow)
Fri Mar 4 14:03:06 2011 sqlAlchemy.prof
305460312 function calls (305459030 primitive calls) in 536.368 CPU seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 536.370 536.370 {execfile}
1 29.503 29.503 536.369 536.369 testSqlAlchemy.py:2()
1001 0.066 0.000 333.806 0.333 base.py:2679(fetchmany)
1001 5.444 0.005 318.462 0.318 base.py:2804(_fetchmany_impl)
10000003 4.389 0.000 311.647 0.000 base.py:2795(_fetchone_impl)
10011 0.339 0.000 306.452 0.031 base.py:2790(__buffer_rows)
10011 235.664 0.024 306.102 0.031 {method 'fetchmany' of 'psycopg2._psycopg.cursor' objects}
10000000 32.904 0.000 172.802 0.000 base.py:2246(__repr__)
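For reference, a dump in this format can be produced with the standard-library cProfile and pstats modules. The following is a minimal sketch in Python 3, not the exact command used for the runs above; `work`, `profile_to_text`, and the file name `example.prof` are placeholder names invented for illustration.

```python
import cProfile
import io
import os
import pstats
import tempfile


def work(n=1000):
    # Placeholder workload standing in for the fetch loop.
    return "|".join(str(i) for i in range(n))


def profile_to_text(func, top=10):
    """Profile func(), save the stats to a file, and return the report text."""
    prof_path = os.path.join(tempfile.mkdtemp(), "example.prof")
    profiler = cProfile.Profile()
    profiler.runcall(func)
    profiler.dump_stats(prof_path)

    # Load the saved stats and print the entries with the highest
    # cumulative time, as in the "Ordered by: cumulative time" dumps.
    out = io.StringIO()
    stats = pstats.Stats(prof_path, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_to_text(work))
```

Sorting by cumulative time puts the callers that dominate the run at the top, which is why `fetchmany` and its helpers lead both dumps.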
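Unfortunately, the processrow function needs to stay unless there is a way within SQLAlchemy to specify null = 'userDefinedValueOrString' and delimiter = 'userDefinedValueOrString' for the output. The Java we are currently using already does this, so the comparison (with processrow) needed to be apples to apples. If there is a way to improve the performance of either processrow or SQLAlchemy with pure Python or a settings tweak, I would be very interested.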
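One pure-Python tweak that may be worth measuring is sketched below. It binds the delimiter and null marker once and builds each line with a single list comprehension instead of repeated append calls. Whether this helps is an assumption to be checked against the profile, not a measured result. `make_processrow` is a hypothetical helper name, and the output is intended to match what processrow produces.

```python
def make_processrow(delimiter="|", null="\\N"):
    """Return a row formatter with the delimiter and null marker bound once."""
    join = delimiter.join

    def processrow(row):
        # None becomes the null marker; every other value goes through str().
        return join([null if x is None else str(x) for x in row])

    return processrow


if __name__ == "__main__":
    processrow = make_processrow()
    print(processrow((1, None, "abc", 2.5)))
```

The formatter would be created once before the fetch loop and then called per row, so the default-argument handling and attribute lookup for `delimiter.join` are not repeated for each of the millions of rows.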