mea*_*ngs 5 python mysql sqlalchemy pandas
I'm working on a project that combines several data sources keyed on registered users. One query in particular is giving me a lot of trouble:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
from sqlalchemy import create_engine
# of course, the info here is obscured
prod_engine = create_engine('mysql+mysqlconnector://password@host:3306/database',pool_timeout=3600,pool_recycle=3600)
query_users = """
SELECT users.id,
CASE
WHEN ((users.role = '' OR users.role IS NULL) AND users.plan LIKE 'pro%') OR users.role REGEXP '(pro|agent|manager)' THEN 'professional' ELSE 'consumer'
END AS 'modified_role',
users.created_at,
users.logged_in_at AS 'last_login',
COUNT(DISTINCT(folders.id)) AS 'folder_count',
IF(COUNT(DISTINCT(folders.id)) > 1, '2 or more','0 to 1') AS 'folder_group',
MIN(folders.created_at) AS 'first_folder_created',
MAX(folders.created_at) AS 'last_folder_created'
FROM users
LEFT OUTER JOIN folders
ON folders.created_by = users.id
AND folders.discarded = 0
AND folders.created_at >= '2010-11-30 23:59:59'
WHERE users.invalid_email IS NULL
GROUP BY 1"""
users = pd.read_sql_query(query_users, prod_engine)
No matter what I try, I get this error (almost always within three seconds, sometimes instantly):
InterfaceError: (InterfaceError) 2013: Lost connection to MySQL server during query
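I've tried a few things, such as adding the pool_timeout and pool_recycle options to the create_engine function, following the documentation here: http://docs.sqlalchemy.org/en/latest/core/engines.html
I also tried users = pd.read_sql_query(query_folder_users, prod_engine, chunksize=10000), but I get the same error.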
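To take pandas out of the picture, the same batching idea can be sketched at the DB-API level with cursor.fetchmany. This is a stand-in rather than what chunksize does internally: it uses the standard-library sqlite3 module in place of the MySQL server so that it runs anywhere, and iter_batches is a name made up for illustration.

```python
import sqlite3


def iter_batches(connection, query, batch_size=10000):
    """Yield lists of rows, at most batch_size rows per list (hypothetical helper)."""
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:  # an empty list means the result set is exhausted
                break
            yield rows
    finally:
        cursor.close()


# Tiny in-memory table standing in for the real users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO users (id) VALUES (?)", [(i,) for i in range(25)])

batch_sizes = [
    len(batch)
    for batch in iter_batches(conn, "SELECT id FROM users ORDER BY id", batch_size=10)
]
print(batch_sizes)
```

If the connection still drops when rows are pulled this way from the MySQL server, the problem is presumably in the driver or the server settings rather than in pandas.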
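Interestingly, this query works fine whenever I run it in Sequel Pro: it starts returning rows immediately and finishes in about 10 seconds. The output is roughly 550,000 rows.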
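I've found plenty of other posts and threads, but none of them seems to address what I need: https://groups.google.com/forum/#!topic/sqlalchemy/TWL7aWab9ww, "Handle SQLAlchemy disconnect", and a post on blog.fizyk.net.pl about setting pool_recycle for SQLAlchemy's connection to MySQL.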
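Reading the documentation here, http://dev.mysql.com/doc/refman/5.5/en/error-lost-connection.html, I noticed this passage:
Sometimes the "during query" form happens when millions of rows are being sent as part of one or more queries. If you know that this is happening, you should try increasing net_read_timeout from its default of 30 seconds to 60 seconds or longer, sufficient for the data transfer to complete.
It looks like I may need to change this option, but I can't find any mention of it in the SQLAlchemy documentation.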
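What I have in mind is roughly the sketch below: run SET SESSION on each new connection through SQLAlchemy's "connect" event. I haven't confirmed that this cures the error. The helper names apply_session_timeouts and attach_timeouts are made up, the 120-second values are arbitrary, and including net_write_timeout alongside net_read_timeout is my own guess, since the server is the side sending rows here.

```python
def apply_session_timeouts(dbapi_connection, read_timeout=120, write_timeout=120):
    """Run SET SESSION on a raw DB-API connection (hypothetical helper)."""
    cursor = dbapi_connection.cursor()
    try:
        # int() ensures only whole numbers are formatted into the SQL text.
        cursor.execute("SET SESSION net_read_timeout = %d" % int(read_timeout))
        cursor.execute("SET SESSION net_write_timeout = %d" % int(write_timeout))
    finally:
        cursor.close()


def attach_timeouts(engine, read_timeout=120, write_timeout=120):
    """Apply the timeouts to every new connection opened by the engine's pool."""
    # Imported lazily so apply_session_timeouts has no third-party dependency.
    from sqlalchemy import event

    def on_connect(dbapi_connection, connection_record):
        apply_session_timeouts(dbapi_connection, read_timeout, write_timeout)

    event.listen(engine, "connect", on_connect)
```

With this in place I would call attach_timeouts(prod_engine) right after create_engine and before pd.read_sql_query, so the first pooled connection already carries the longer timeouts.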
Has anyone run into this problem? If so, how did you solve it?