How do I fetch data from a select statement in batches and append it to a dataframe?

Question

Rus*_*ord 3 python pyodbc python-3.x pandas

I have a file containing a SQL statement, which I run using pyodbc. The SQL statement is just a select statement, like this:

select distinct (columns) from table1

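However, the data I am pulling is 30 million rows.

I can do this for smaller tables and put the information into a dataframe.

Is there any way to batch the select statement so that it pulls only X rows, appends them to the dataframe, and keeps doing so until all 30 million records have been read?

Code so far: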

import os.path
import pandas as pd
import tinys3
import psycopg2
import pyodbc
from datetime import datetime
import uuid
import glob
from os import listdir
from os.path import isfile, join
import time

startTime = datetime.now()

#reading in data for db
server = 'xxxx' 
database = 'xxx' 
username = 'xxx' 
password = 'xxxx' 
driver= '{ODBC Driver 17 for SQL Server}'
cnxn = pyodbc.connect('DRIVER='+driver+';SERVER='+server+';PORT=xxx;DATABASE='+database+';UID='+username+';PWD='+ password)
cursor = cnxn.cursor()
path = "path/to/folder"




for infile in glob.glob( os.path.join(path, '*.sql') ):
    with open(infile, 'r') as myfile:
        sql = myfile.read()  # the with block closes the file automatically
        print(sql)
        cursor.execute(sql)

        row = cursor.fetchall()
        columns = [column[0] for column in cursor.description]
        columns = [element.lower() for element in columns]

        df = pd.DataFrame([tuple(t) for t in row])
        df.columns = columns

Answer 1

Nan*_*ish 7

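You can use the fetchmany function:

cursor.fetchmany([size=cursor.arraysize]) --> list

Returns a list of the remaining rows, containing no more than size rows, used to process results in chunks. The list will be empty when there are no more rows.

The default for cursor.arraysize is 1, which is no different from calling fetchone().

A ProgrammingError exception is raised if no SQL has been executed or if it did not return a result set (e.g. was not a SELECT statement).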
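
To see how arraysize drives the batch size when fetchmany() is called without an argument, here is a minimal sketch. It uses the standard-library sqlite3 module as a stand-in for pyodbc (an assumption made only so it runs without a database server; both follow the Python DB-API), and the table `t` is hypothetical.

```python
import sqlite3  # stand-in for pyodbc; both follow the Python DB-API

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")  # hypothetical demo table
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])

cur.execute("SELECT n FROM t ORDER BY n")
default_batch = cur.fetchmany()  # arraysize defaults to 1: one row comes back
cur.arraysize = 3
next_batch = cur.fetchmany()     # now up to 3 rows per call
last_batch = cur.fetchmany()     # only one row is left
empty_batch = cur.fetchmany()    # result set exhausted: empty list

print(default_batch, next_batch, last_batch, empty_batch)
```

Passing the size explicitly, as in fetchmany(3), has the same effect without changing the cursor's arraysize.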

This will allow you to pull the data in chunks.

Usage example:

while True:
    three_rows = cursor.fetchmany(3)
    # every loop cycle, up to 3 rows are fetched
    if not three_rows:
        break
    print(three_rows)

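Applied to the question, the same loop can collect each batch into a DataFrame. The following is only a sketch under some assumptions: the helper names `iter_chunks` and `query_to_dataframe` and the chunk size are made up for illustration, and sqlite3 stands in for pyodbc so the example runs without a server. With pyodbc you would pass the cursor from `cnxn.cursor()` instead.

```python
import sqlite3  # stand-in for pyodbc so the sketch runs anywhere

import pandas as pd


def iter_chunks(cursor, chunk_size):
    """Yield one DataFrame per fetchmany() batch until no rows are left."""
    columns = [col[0].lower() for col in cursor.description]
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:  # an empty list means the result set is exhausted
            break
        # pyodbc returns Row objects, so convert each one to a plain tuple
        yield pd.DataFrame([tuple(r) for r in rows], columns=columns)


def query_to_dataframe(cursor, sql, chunk_size=50_000):
    """Run sql and assemble the result from fixed-size batches."""
    cursor.execute(sql)
    chunks = list(iter_chunks(cursor, chunk_size))
    if not chunks:  # query returned no rows: keep the column names only
        columns = [col[0].lower() for col in cursor.description]
        return pd.DataFrame(columns=columns)
    # a single concat at the end avoids re-copying the frame on every batch
    return pd.concat(chunks, ignore_index=True)


# Demo on a small in-memory table (hypothetical data)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table1 (ID INTEGER, Name TEXT)")
cur.executemany(
    "INSERT INTO table1 VALUES (?, ?)", [(i, f"row{i}") for i in range(10)]
)
df = query_to_dataframe(
    cur, "SELECT DISTINCT ID, Name FROM table1 ORDER BY ID", chunk_size=3
)
print(df.shape)
```

Note that batching only limits how many rows are fetched at a time. The final DataFrame still has to fit in memory, so for 30 million rows it may be better to process or write out each chunk from `iter_chunks` rather than concatenating them all.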

You can also use the fetchone function to process the data row by row.

fetchone

cursor.fetchone() --> Row or None

Returns the next row, or None when no more data is available.
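
A row-by-row loop built on fetchone() might look like the sketch below. Again sqlite3 is used as a stand-in for pyodbc, and the table `t` and the running total are just illustrative assumptions.

```python
import sqlite3  # stand-in for pyodbc; fetchone() behaves the same way here

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")  # hypothetical demo table
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(4)])

cur.execute("SELECT n FROM t ORDER BY n")
total = 0
while True:
    row = cur.fetchone()
    if row is None:  # None signals that no more rows are available
        break
    total += row[0]  # process one row at a time

print(total)
```

This keeps only one row in Python at a time, but for very large result sets the per-row call overhead usually makes fetchmany() the faster choice.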

Archived:	7 years, 9 months ago
Views:	6652
Last activity:	7 years, 9 months ago