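Is there a way to preserve SQLAlchemy attribute names when querying data into a pandas DataFrame?
Here is a simple mapping of my database. For the School table, I renamed the database column 'SchoolDistrict' to the shorter attribute name 'district'. I am several layers removed from the DBA, so changing the names at the source is not feasible.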
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class School(Base):
    __tablename__ = 'DimSchool'
    # Attribute names (left) differ from the database column names (first argument).
    id = Column('SchoolKey', Integer, primary_key=True)
    name = Column('SchoolName', String)
    district = Column('SchoolDistrict', String)

class StudentScore(Base):
    __tablename__ = 'FactStudentScore'
    SchoolKey = Column('SchoolKey', Integer, ForeignKey('DimSchool.SchoolKey'), primary_key=True)
    PointsPossible = Column('PointsPossible', Integer)
    PointsReceived = Column('PointsReceived', Integer)
    school = relationship("School", backref='studentscore')
So when I run a query like this:
query = session.query(StudentScore, School).join(School)
df = pd.read_sql(query.statement, query.session.bind)
the returned DataFrame df has the underlying column name 'SchoolDistrict' rather than my attribute name 'district'.
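One possible approach, sketched here rather than taken from the original post, is to label every column with its mapped attribute name before the statement reaches pandas. The helper `labeled_columns` and its `prefix` parameter are hypothetical names, and the sketch assumes SQLAlchemy 1.4 or later and an in-memory SQLite database for demonstration.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect, select
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class School(Base):
    __tablename__ = 'DimSchool'
    id = Column('SchoolKey', Integer, primary_key=True)
    name = Column('SchoolName', String)
    district = Column('SchoolDistrict', String)

def labeled_columns(model, prefix=''):
    """Return the model's columns, each labeled with its attribute name.

    `prefix` is a hypothetical convenience for joins, e.g. 'school_'.
    """
    return [
        prop.columns[0].label(prefix + prop.key)
        for prop in inspect(model).column_attrs
    ]

# In-memory SQLite database, used only to make the sketch self-contained.
engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

stmt = select(*labeled_columns(School))
with engine.connect() as conn:
    result = conn.execute(stmt)
    columns = list(result.keys())  # attribute names, not 'SchoolKey' etc.

# The same statement could then be handed to pandas, for example:
#   df = pd.read_sql(stmt, engine)
```

Passing a distinct `prefix` per model (for instance `'school_'` and `'score_'`) should also keep the labels unique when two joined tables share a column name.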
Edit: An even more annoying case is when duplicate column names exist across tables. For example:
class Teacher(Base):
    __tablename__ = 'DimTeacher'
    id = Column('TeacherKey', Integer, primary_key=True)
    fname = Column('FirstName', String)
    lname = Column('LastName', String)

class Student(Base):
    __tablename__ = 'DimStudent' …

I know read_csv has mangle_dupe_cols, but how can I do the same thing for the result of a SQL join in SQLAlchemy, after issuing:
pd.DataFrame(result.fetchall(), columns=result.keys())
Calling df.info() on the resulting DataFrame gives me an error because of the duplicate column names.
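A minimal sketch of one workaround is to make the key list unique before building the DataFrame. The helper name `dedupe_columns` is hypothetical, and the `.1`, `.2` suffix scheme is only meant to resemble what read_csv produces, not to reproduce pandas' internal logic.

```python
def dedupe_columns(names):
    """Return a copy of `names` where repeats get a '.1', '.2', ... suffix."""
    seen = set()    # every name already emitted
    counts = {}     # last suffix number tried for each original name
    out = []
    for name in names:
        candidate = name
        n = counts.get(name, 0)
        # Bump the suffix until the candidate does not collide with anything.
        while candidate in seen:
            n += 1
            candidate = f"{name}.{n}"
        counts[name] = n
        seen.add(candidate)
        out.append(candidate)
    return out

# Example with the kind of keys a join over two tables might return.
keys = ['TeacherKey', 'FirstName', 'StudentKey', 'FirstName']
unique_keys = dedupe_columns(keys)

# With a real result object this would be used as:
#   pd.DataFrame(result.fetchall(), columns=dedupe_columns(result.keys()))
```

The first occurrence keeps its original name, so existing code that reads a non-duplicated column is unaffected.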