Importing a CSV into a database with SQLAlchemy

war*_*nry 8 python sqlite sqlalchemy

I am following an example to load a CSV file into a SQLite database.

Here is my code:

from numpy import genfromtxt
from time import time
from datetime import datetime
from sqlalchemy import Column, Integer, Float, Date, String, VARCHAR
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

def Load_Data(file_name):
    data = genfromtxt(file_name, delimiter=',')# skiprows=1, converters={0: lambda s: str(s)})
    return data.tolist()

Base = declarative_base()

class cdb1(Base):
    #Tell SQLAlchemy what the table name is and if there's any table-specific arguments it should know about
    __tablename__ = 'cdb1'
    __table_args__ = {'sqlite_autoincrement': True}
    #tell SQLAlchemy the name of column and its attributes:
    id = Column(Integer, primary_key=True, nullable=False) 
    name = Column(VARCHAR(40))
    shack = Column(VARCHAR)
    db = Column(Integer)
    payments = Column(Integer)
    status = Column(VARCHAR)


if __name__ == "__main__":
    t = time()
    print 'creating database'

    #Create the database
    engine = create_engine('sqlite:///cdb.db')
    Base.metadata.create_all(engine)

    #Create the session
    session = sessionmaker()
    session.configure(bind=engine)
    s = session()

    try:
        file_name = 'client_db.csv'
        data = Load_Data(file_name)

        for i in data:
            record = cdb1(**{
                'name' : i[0],
                'shack' : i[1],
                'db' : i[2],
                'payments' : i[3],
                'status' : i[4]
            })
            s.add(record) #Add all the records

        s.commit() #Attempt to commit all the records
    except:
        s.rollback() #Rollback the changes on error
        print 'error in reading'
    finally:
        s.close() #Close the connection
    print "Time elapsed: " + str(time() - t) + " s." #0.091s

Here are the first few rows of the CSV file:

Name,Shack,DB,Payments,Status
Loyiso Dwala,I156,13542,37,LightsOnly ON
Attwell Fayo,I157,13077,32,LightsON
David Mbhele,G25,13155,33,LightsON

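The database is created without errors, but only part of the data ends up in the columns: 'payments' and 'db' are populated correctly, while every other column is NULL.

Updated, working code (using a pandas DataFrame):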

import pandas as pd
from sqlalchemy import Column, Integer, VARCHAR, create_engine
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class cdb1(Base):
    # Tell SQLAlchemy the table name and any table-specific arguments
    __tablename__ = 'cdb1'
    __table_args__ = {'sqlite_autoincrement': True}
    # Column names match the CSV header exactly
    id = Column(Integer, primary_key=True, nullable=False)
    Name = Column(VARCHAR(40))
    Shack = Column(VARCHAR)
    DB = Column(Integer)
    Payments = Column(Integer)
    Status = Column(VARCHAR)

engine = create_engine('sqlite:///cdb.db')
Base.metadata.create_all(engine)
file_name = 'client_db.csv'
df = pd.read_csv(file_name)
# Note: if_exists='replace' drops the table created above and lets pandas
# recreate it, with the DataFrame index written to the 'id' column.
df.to_sql(con=engine, index_label='id', name=cdb1.__tablename__, if_exists='replace')

Bra*_*ilo 12

Are you familiar with pandas DataFrames?

They are really simple to use (and to debug):

pandas.read_csv(FILE_NAME)

In [5]: pandas.read_csv('/tmp/csvt.csv')
Out[5]: 
           Name Shack     DB  Payments         Status
0  Loyiso Dwala  I156  13542        37  LightsOnly ON
1  Attwell Fayo  I157  13077        32       LightsON
2  David Mbhele   G25  13155        33       LightsON
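For contrast, here is a minimal sketch of what probably went wrong in the question's code. numpy.genfromtxt defaults to a float dtype, so any field it cannot parse as a number comes back as nan, which would explain the NULL text columns. The in-memory CSV text below is only a stand-in for client_db.csv.

```python
import io
import numpy as np

# The first rows of the CSV from the question, held in memory for the demo.
csv_text = (
    "Name,Shack,DB,Payments,Status\n"
    "Loyiso Dwala,I156,13542,37,LightsOnly ON\n"
    "Attwell Fayo,I157,13077,32,LightsON\n"
)

# Same call as in the question: no dtype given, so every field is read as float.
rows = np.genfromtxt(io.StringIO(csv_text), delimiter=",").tolist()

header, first = rows[0], rows[1]
print(header)  # the header row is parsed as data too
print(first)   # text fields are lost, numeric fields survive
```

Note that the header line is also treated as a data row here, so the original loop would have inserted one extra record.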

To insert the DataFrame's data into a table, simply use pandas.DataFrame.to_sql.

So your main code would end up looking something like this:

engine = create_engine('sqlite:///cdb.db')
Base.metadata.create_all(engine)

file_name = 'client_db.csv'
df = pandas.read_csv(file_name)
df.to_sql(con=engine, index_label='id', name=cdb1.__tablename__, if_exists='replace')

You should read further in the documentation I linked and set the function's parameters to suit your needs (in particular if_exists, index, index_label and dtype).
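As a rough illustration of those parameters, the sketch below writes a small DataFrame to an in-memory SQLite database. The in-memory engine, the sample rows, and the chosen column types are assumptions made for the demo, not part of the original code.

```python
import pandas as pd
from sqlalchemy import create_engine, text
from sqlalchemy.types import Integer, VARCHAR

# In-memory SQLite database, used here only so the demo leaves no file behind.
engine = create_engine("sqlite://")

df = pd.DataFrame({
    "Name": ["Loyiso Dwala", "Attwell Fayo"],
    "Shack": ["I156", "I157"],
    "DB": [13542, 13077],
    "Payments": [37, 32],
    "Status": ["LightsOnly ON", "LightsON"],
})

df.to_sql(
    "cdb1",
    con=engine,
    if_exists="replace",  # drop and recreate the table if it already exists
    index=True,           # write the DataFrame index as a column...
    index_label="id",     # ...named 'id'
    dtype={"Name": VARCHAR(40), "DB": Integer, "Payments": Integer},
)

# Read the rows back to check what was stored.
with engine.connect() as conn:
    stored = conn.execute(
        text("SELECT id, Name, DB FROM cdb1 ORDER BY id")
    ).fetchall()
print(stored)
```

With if_exists='append' instead, rows are added to an existing table rather than replacing it.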

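  • Isn't this solution limited to small CSV files/tables/databases? (2 upvotes)
  • Downvoted: this is extremely inefficient and will take hours for large tables. (2 upvotes)
  • For very large tables, this approach will exhaust your memory and crash. (2 upvotes)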
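One possible way to address the memory concern raised in the comments is to read the CSV in chunks and append each chunk to the table. This is only a sketch: the in-memory engine, the in-memory CSV text, and the CHUNK_ROWS value are assumptions for the demo, and a real import would use a file path and a much larger chunk size.

```python
import io
import pandas as pd
from sqlalchemy import create_engine, text

# In-memory SQLite database for the demo.
engine = create_engine("sqlite://")

# Stand-in for a large file such as 'client_db.csv'.
csv_source = io.StringIO(
    "Name,Shack,DB,Payments,Status\n"
    "Loyiso Dwala,I156,13542,37,LightsOnly ON\n"
    "Attwell Fayo,I157,13077,32,LightsON\n"
    "David Mbhele,G25,13155,33,LightsON\n"
)

CHUNK_ROWS = 2  # hypothetical value, kept tiny so the demo uses two chunks

total = 0
# With chunksize set, read_csv yields DataFrames of at most CHUNK_ROWS rows,
# so only one chunk is held in memory at a time.
for chunk in pd.read_csv(csv_source, chunksize=CHUNK_ROWS):
    chunk.to_sql("cdb1", con=engine, if_exists="append", index=False)
    total += len(chunk)

with engine.connect() as conn:
    stored = conn.execute(text("SELECT COUNT(*) FROM cdb1")).scalar()
print(total, stored)
```

Because index=False is used here, no id column is written; if one is needed, the table can be created beforehand with an autoincrementing primary key and the chunks appended to it.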