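When loading more than 10 million records from a SQL Server database through pyodbc, pandas becomes very slow, mainly in the function pandas.read_sql(query, pyodbc_conn). The following code takes up to 40-45 minutes to load 10-15 million records from the SQL table Table1:
Is there a better, faster way to read a SQL table into a pandas DataFrame?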
import pyodbc
import pandas

server = '<server_ip>'
database = '<db_name>'
username = '<db_user>'
password = '<password>'
port = '1443'
# The SQL Server ODBC drivers take the port as "host,port" in SERVER, not as a PORT keyword.
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ',' + port + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
cursor = conn.cursor()
data = pandas.read_sql("select * from Table1", conn)  # Takes about 40-45 minutes to complete
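A minimal sketch of one option, assuming a pandas version whose read_sql accepts a chunksize argument (when it is set, read_sql returns an iterator of DataFrames instead of one frame). Reading in batches mainly bounds how much memory is held at once and lets per-batch work start early; it does not by itself speed up the transfer from the server, so it is worth timing against the single call. An in-memory SQLite table stands in for Table1 so the sketch runs without SQL Server, and read_sql_in_chunks is a hypothetical helper name.

```python
import sqlite3

import pandas as pd


def read_sql_in_chunks(query, conn, chunksize=100_000):
    """Hypothetical helper: read a query in batches and combine them.

    With chunksize set, pandas.read_sql yields DataFrames of at most
    chunksize rows instead of returning one large DataFrame.
    """
    chunks = []
    for chunk in pd.read_sql(query, conn, chunksize=chunksize):
        # Per-batch work (dtype downcasting, filtering) could go here.
        chunks.append(chunk)
    if not chunks:
        return pd.DataFrame()
    return pd.concat(chunks, ignore_index=True)


# Assumption: an in-memory SQLite table stands in for the SQL Server Table1.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Table1 (id INTEGER, value REAL)")
demo.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [(i, i * 0.5) for i in range(250)])
df = read_sql_in_chunks("SELECT * FROM Table1", demo, chunksize=100)
print(df.shape)  # (250, 2)
```

With chunksize=100 the 250 rows arrive as batches of 100, 100 and 50 before being concatenated.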
Is there a faster way to convert pyodbc.Row objects into a pandas DataFrame? Converting a list of more than 10 million pyodbc.Row objects into a DataFrame takes about 30-40 minutes.
import pyodbc
import pandas

server = '<server_ip>'
database = '<db_name>'
username = '<db_user>'
password = '<password>'
port = '1443'
# The SQL Server ODBC drivers take the port as "host,port" in SERVER, not as a PORT keyword.
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ',' + port + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
cursor = conn.cursor()
# Takes up to 12 minutes
rows = cursor.execute("select top 10000000 * from [LSLTGT].[MBR_DIM]").fetchall()
# Read cursor data into a pandas DataFrame... takes forever!
df = pandas.DataFrame([tuple(t) for t in rows])
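A minimal sketch, assuming a DB-API cursor such as the pyodbc one above: fetch rows in batches with fetchmany, take column names from cursor.description, and build each batch with pandas.DataFrame.from_records. This keeps memory bounded and gives the frame real column names, but the per-row Python conversion still happens, so any speedup is likely modest and should be measured. Drivers that return column-oriented results (turbodbc's fetchallnumpy(), for example) avoid that per-row overhead and are a different approach not shown here. SQLite stands in for SQL Server, and cursor_to_dataframe is a hypothetical helper name.

```python
import sqlite3

import pandas as pd


def cursor_to_dataframe(cursor, batch_size=50_000):
    """Hypothetical helper: build a DataFrame from an executed DB-API cursor.

    Column names come from cursor.description, which both pyodbc and
    sqlite3 populate after execute().
    """
    columns = [col[0] for col in cursor.description]
    frames = []
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        # tuple() turns driver row objects (such as pyodbc.Row) into plain tuples.
        frames.append(pd.DataFrame.from_records([tuple(r) for r in rows],
                                                columns=columns))
    if not frames:
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)


# Assumption: an in-memory SQLite table stands in for [LSLTGT].[MBR_DIM].
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MBR_DIM (mbr_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO MBR_DIM VALUES (?, ?)",
                 [(i, f"member{i}") for i in range(120)])
cursor = conn.execute("SELECT * FROM MBR_DIM")
df = cursor_to_dataframe(cursor, batch_size=50)
print(df.shape)  # (120, 2)
print(list(df.columns))  # ['mbr_id', 'name']
```

Unlike fetchall(), this never holds the full list of row objects and the finished DataFrame in memory at the same time.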
When I try to run:
[root@pex appliance_ui]# curl https://bootstrap.pypa.io./get-pip.py | python
It returns:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1603k 100 1603k 0 0 7006k 0 --:--:-- --:--:-- --:--:-- 13.2M
Traceback (most recent call last):
File "<stdin>", line 20649, in <module>
File "<stdin>", line 197, in main
File "<stdin>", line 82, in bootstrap
File "/tmp/tmpH39pcu/pip.zip/pip/_internal/__init__.py", line 42, in <module>
File "/tmp/tmpH39pcu/pip.zip/pip/_internal/cmdoptions.py", line 16, in <module>
File "/tmp/tmpH39pcu/pip.zip/pip/_internal/index.py", …
I'm working with a huge dataset of more than 20 million records. I'm trying to save all of the data in Feather format for faster access, and to append to it as I carry out my analysis.
Is there a way to append a pandas DataFrame to an existing Feather file?