How do I convert a SQL query result to a pandas data structure?

use*_*017 97 python mysql data-structures pandas

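Any help on this problem will be greatly appreciated.

So basically I want to run a query against my SQL database and store the returned data as a Pandas data structure.

I have attached the code for the query.

I am reading the Pandas documentation, but I have trouble identifying the return type of my query.

I tried to print the query result, but it doesn't give any useful information.

Thanks!!!!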

from sqlalchemy import create_engine, text

engine2 = create_engine('mysql://THE DATABASE I AM ACCESSING')
connection2 = engine2.connect()
dataid = 1022
# Triple quotes allow a multi-line SQL string; dataid is passed as a bound
# parameter (:dataid) instead of being interpolated into the SQL text
resoverall = connection2.execute(text("""
  SELECT
      sum(BLABLA) AS BLA,
      sum(BLABLABLA2) AS BLABLABLA2,
      sum(SOME_INT) AS SOME_INT,
      sum(SOME_INT2) AS SOME_INT2,
      100*sum(SOME_INT2)/sum(SOME_INT) AS ctr,
      sum(SOME_INT2)/sum(SOME_INT) AS cpc
   FROM daily_report_cooked
   WHERE campaign_id = :dataid"""), {"dataid": dataid})

So I want to understand what the format/datatype of my variable "resoverall" is, and how to put it into a Pandas data structure.

bea*_*rdc 118

Edit: March 2015

As noted below, pandas now uses SQLAlchemy to both read from (read_sql) and insert into (to_sql) a database. The following should work:

import pandas as pd

df = pd.read_sql(sql, cnxn)
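To make the read_sql / to_sql pair concrete, here is a minimal round-trip sketch. It assumes an in-memory SQLite database standing in for the real server, and the table name `report` and its sample rows are invented for illustration; with MySQL you would pass your own connection or SQLAlchemy engine instead.

```python
import sqlite3

import pandas as pd

# In-memory SQLite connection used as a stand-in for the real database (assumption)
cnxn = sqlite3.connect(":memory:")

# Write a small, made-up DataFrame into a table called "report"
source = pd.DataFrame({"campaign_id": [1022, 1022, 7], "clicks": [5, 15, 1]})
source.to_sql("report", cnxn, index=False)

# Read an aggregate query straight back into a DataFrame
sql = """
    SELECT campaign_id, sum(clicks) AS clicks
    FROM report
    GROUP BY campaign_id
    ORDER BY campaign_id
"""
df = pd.read_sql(sql, cnxn)
cnxn.close()

print(type(df).__name__)
print(df)
```

The column names of the resulting DataFrame come from the aliases in the SELECT list, so no separate step is needed to label them.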

Previous answer: via mikebmassey, from a similar question

import pyodbc
import pandas.io.sql as psql

cnxn = pyodbc.connect(connection_info) 
cursor = cnxn.cursor()
sql = "SELECT * FROM TABLE"

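# Note: frame_query was deprecated and later removed from pandas;
# on current versions use pd.read_sql(sql, cnxn) as shown in the edit above
df = psql.frame_query(sql, cnxn)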
cnxn.close()


Dan*_*kov 100

Here is the shortest code that will do the job:

from pandas import DataFrame
df = DataFrame(resoverall.fetchall())
df.columns = resoverall.keys()

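You can go fancier and parse the types as in Paul's answer.

  • df = DataFrame(cursor.fetchall()) returns ValueError: DataFrame constructor not properly called!, so it appears that a tuple of tuples is not acceptable to the DataFrame constructor. There is also no `.keys()` on the cursor in either dictionary or tuple mode. (4 upvotes)
  • Note that the keys method only works with results obtained using sqlalchemy. Pyodbc uses the description attribute for columns. (4 upvotes)
  • This worked for me with 1,000,000 records fetched from an Oracle database. (3 upvotes)
  • @BowenLiu Yes, you can use it with psycopg2: `df.columns=[ x.name for x in resoverall.description ]` (3 upvotes)
  • Why doesn't "df.columns = resoverall.keys()" work? (2 upvotes)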
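As the comments point out, `.keys()` belongs to SQLAlchemy results; a plain DB-API cursor exposes column names through its `description` attribute instead. The following is a minimal sketch of that variant, assuming the standard-library sqlite3 driver and a made-up table `t`. Other drivers may return row objects that are not plain tuples, in which case converting each row with `tuple()` first may be needed.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a small invented table (assumption)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

cur.execute("SELECT id, name FROM t ORDER BY id")

# cursor.description holds one sequence per result column; item 0 is the column name
columns = [col[0] for col in cur.description]

# sqlite3's fetchall() returns a list of tuples, which DataFrame accepts directly
df = pd.DataFrame(cur.fetchall(), columns=columns)
conn.close()

print(df)
```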

Nat*_*uld 33

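If you are using SQLAlchemy's ORM rather than the expression language, you might find yourself wanting to convert an object of type sqlalchemy.orm.query.Query to a Pandas data frame.

The cleanest approach is to get the generated SQL from the query's statement attribute and then execute it with pandas's read_sql() method. For example, starting with a Query object called query: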

df = pd.read_sql(query.statement, query.session.bind)

  • A more efficient approach is to get the statement from sqlalchemy and let pandas run the query itself with `pandas.read_sql_query`, passing `query.statement` to it. See this answer: http://stackoverflow.com/a/29528804/1273938 (5 upvotes)
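For readers who want to try the ORM route end to end, here is a self-contained sketch. It assumes SQLAlchemy 1.4 or newer with a compatible pandas version, an in-memory SQLite engine, and a hypothetical `User` model; it passes the engine directly to pandas rather than `query.session.bind`.

```python
import pandas as pd
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):
    """Hypothetical model used only for this illustration."""
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)


# In-memory SQLite engine as a stand-in for the real database (assumption)
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([User(name="ann"), User(name="bob")])
    session.commit()

    # Build the ORM query, then hand its SELECT statement to pandas
    query = session.query(User.id, User.name).order_by(User.id)
    df = pd.read_sql(query.statement, engine)

print(df)
```

Because pandas executes the statement itself, no ORM objects are created for the rows, which is the efficiency point raised in the comment above.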

Pau*_*l H 23

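Edit 2014-09-30:

pandas now has a read_sql function. You definitely want to use that instead.

Original answer:

I can't help you with SQLAlchemy. I always use pyodbc, MySQLdb, or psycopg2 as needed. But when doing so, a function as simple as the one below tends to suit my needs: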

import datetime
import decimal

import pyodbc
import numpy as np
import pandas

def __processCursor(cur, dataframe=False, index=None):
    '''
    Processes a database cursor with data on it into either
    a structured numpy array or a pandas dataframe.

    input:
    cur - a pyodbc cursor that has just received data
    dataframe - bool. if false, a numpy record array is returned
                if true, return a pandas dataframe
    index - list of column(s) to use as index in a pandas dataframe
    '''
    # Map each column's Python type (from cur.description) to a numpy dtype
    datatypes = []
    colinfo = cur.description
    for col in colinfo:
        if col[1] == str:
            # Python 3 str is unicode text (was `unicode` on Python 2)
            datatypes.append((col[0], 'U%d' % col[3]))
        elif col[1] == bytes:
            # Byte strings (was `str` on Python 2)
            datatypes.append((col[0], 'S%d' % col[3]))
        elif col[1] in [float, decimal.Decimal]:
            datatypes.append((col[0], 'f4'))
        elif col[1] == datetime.datetime:
            datatypes.append((col[0], 'O'))
        elif col[1] == int:
            datatypes.append((col[0], 'i4'))

    # Collect every row as a plain tuple so numpy can build a structured array
    data = []
    for row in cur:
        data.append(tuple(row))

    array = np.array(data, dtype=datatypes)
    if dataframe:
        output = pandas.DataFrame.from_records(array)

        if index is not None:
            output = output.set_index(index)

    else:
        output = array

    return output

# Example usage; myConnectToDBfunction is the answerer's own helper (not shown)
cnn, cur = myConnectToDBfunction()
cmd = "SELECT * FROM myTable"
cur.execute(cmd)
dataframe = __processCursor(cur, dataframe=True)


Tho*_*gdt 15

MySQL Connector

For those who work with the mysql connector, you can use this code as a start. (Thanks to @Daniel Velkov)

References used:


import pandas as pd
import mysql.connector

# Setup MySQL connection
db = mysql.connector.connect(
    host="<IP>",              # your host, usually localhost
    user="<USER>",            # your username
    password="<PASS>",        # your password
    database="<DATABASE>"     # name of the data base
)   

# You must create a Cursor object. It will let you execute all the queries you need
cur = db.cursor()

# Use all the SQL you like
cur.execute("SELECT * FROM <TABLE>")

# Put it all to a data frame
sql_data = pd.DataFrame(cur.fetchall())
sql_data.columns = cur.column_names

# Close the session
db.close()

# Show the data
print(sql_data.head())


Lin*_*esa 14

1. Using MySQL-connector-python

# pip install mysql-connector-python

import mysql.connector
import pandas as pd

mydb = mysql.connector.connect(
    host = 'host',
    user = 'username',
    passwd = 'pass',
    database = 'db_name'
)
query = 'select * from table_name'
df = pd.read_sql(query, con = mydb)
print(df)

2. Using SQLAlchemy

# pip install pymysql
# pip install sqlalchemy

import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine('mysql+pymysql://username:password@localhost:3306/db_name')

query = '''
select * from table_name
'''
df = pd.read_sql_query(query, engine)
print(df)
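Both variants above run a fixed query. For a query that depends on a value, such as the campaign_id in the question, pandas can forward bound parameters through the `params` argument of read_sql_query. This is a minimal sketch, assuming an in-memory SQLite database and invented sample rows in place of MySQL. SQLite uses `?` placeholders; MySQL drivers typically use `%s`, so check the paramstyle of the driver you use.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with made-up rows (assumption)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_report_cooked "
    "(campaign_id INTEGER, some_int INTEGER, some_int2 INTEGER)"
)
conn.executemany(
    "INSERT INTO daily_report_cooked VALUES (?, ?, ?)",
    [(1022, 200, 10), (1022, 300, 15), (7, 50, 1)],
)

dataid = 1022
query = """
    SELECT sum(some_int) AS some_int,
           sum(some_int2) AS some_int2,
           100.0 * sum(some_int2) / sum(some_int) AS ctr
    FROM daily_report_cooked
    WHERE campaign_id = ?
"""
# The driver substitutes dataid for the placeholder, avoiding string formatting
df = pd.read_sql_query(query, conn, params=(dataid,))
conn.close()

print(df)
```

Passing the value separately keeps the SQL text constant and leaves quoting to the driver, which is safer than building the string with `%`.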


Mur*_*ala 9

This is the code I use. Hope this helps.

import pandas as pd
from sqlalchemy import create_engine

def getData():
  # Parameters
  ServerName = "my_server"
  Database = "my_db"
  UserPwd = "user:pwd"
  Driver = "driver=SQL Server Native Client 11.0"

  # Create the connection
  engine = create_engine('mssql+pyodbc://' + UserPwd + '@' + ServerName + '/' + Database + "?" + Driver)

  sql = "select * from mytable"
  df = pd.read_sql(sql, engine)
  return df

df2 = getData()
print(df2)


Des*_*ngh 6

Here is a short answer to your question:

from __future__ import print_function
import MySQLdb
import numpy as np
import pandas as pd
import xlrd

# Connecting to MySQL Database
connection = MySQLdb.connect(
             host="hostname",
             port=0000,
             user="userID",
             passwd="password",
             db="table_documents",
             charset='utf8'
           )
print(connection)
#getting data from database into a dataframe
sql_for_df = 'select * from tabledata'
df_from_database = pd.read_sql(sql_for_df , connection)


Jan*_*yer 5

Like Nathan, I often want to dump the results of a sqlalchemy or sqlsoup query into a Pandas data frame. My own solution for this is:

query = session.query(tbl.Field1, tbl.Field2)
DataFrame(query.all(), columns=[column['name'] for column in query.column_descriptions])
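The two-line snippet above relies on an existing session and table. A runnable sketch of the same idea follows, assuming SQLAlchemy 1.4 or newer, an in-memory SQLite engine, and a hypothetical `Tbl` model with invented rows. Rows are converted to plain tuples before being handed to pandas, as a precaution across SQLAlchemy versions.

```python
import pandas as pd
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Tbl(Base):
    """Hypothetical table used only for this illustration."""
    __tablename__ = "tbl"
    id = Column(Integer, primary_key=True)
    Field1 = Column(String)
    Field2 = Column(Integer)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Tbl(Field1="x", Field2=1), Tbl(Field1="y", Field2=2)])
    session.commit()

    query = session.query(Tbl.Field1, Tbl.Field2).order_by(Tbl.id)

    # column_descriptions yields one dict per selected column; "name" is its label
    names = [column["name"] for column in query.column_descriptions]

    # Convert each result row to a plain tuple before building the DataFrame
    rows = [tuple(row) for row in query.all()]
    df = pd.DataFrame(rows, columns=names)

print(df)
```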