How to write a Pandas DataFrame to an existing Django model

Gre*_*own 7 python sqlite django pandas

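I am trying to insert data from a Pandas DataFrame into an existing Django model, Agency, which uses a SQLite backend. However, following the answers to "How to write a Pandas Dataframe to Django model" and "Saving a Pandas DataFrame to a Django Model" causes the whole SQLite table to be replaced, which breaks the Django code. Specifically, the Django auto-generated id primary key column is replaced by index, which causes errors when rendering templates (no such column: agency.id).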

Here is the code, and the result of using Pandas to_sql on the SQLite table agency.

models.py:

from django.db import models

class Agency(models.Model):
    name = models.CharField(max_length=128)

myapp/management/commands/populate.py:

import pandas as pd
from django.conf import settings
from django.core.management.base import BaseCommand
from sqlalchemy import create_engine


class Command(BaseCommand):

    def handle(self, *args, **options):

        # Open a SQLAlchemy connection to the Django database
        database_name = settings.DATABASES['default']['NAME']
        database_url = 'sqlite:///{}'.format(database_name)
        engine = create_engine(database_url, echo=False)

        # Insert data (if_exists="replace" drops and recreates the table)
        agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})
        agencies.to_sql("agency", con=engine, if_exists="replace")

Calling 'python manage.py populate' successfully adds the three agencies to the table:

index    name
0        Agency 1
1        Agency 2
2        Agency 3

However, doing so changes the table's DDL from:

CREATE TABLE "agency" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "name" varchar(128) NOT NULL)

to:

CREATE TABLE agency (
  "index" BIGINT, 
  name TEXT
);
CREATE INDEX ix_agency_index ON agency ("index")

How can I add the DataFrame to the Django-managed model and keep the Django ORM intact?

Gre*_*own 5

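To answer my own question, now that I frequently import data into Django with Pandas: the mistake I was making was trying to use Pandas' built-in SQLAlchemy DB ORM, which was modifying the underlying database table definition. In the context above, you can simply use the Django ORM to connect and insert the data: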

import pandas as pd
from django.core.management.base import BaseCommand

from myapp.models import Agency


class Command(BaseCommand):

    def handle(self, *args, **options):

        # Process data with Pandas
        agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})

        # Iterate over the DataFrame rows and create one Agency per row
        for row in agencies.itertuples():
            Agency.objects.create(name=row.name)
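To illustrate why inserting rows (rather than replacing the table) keeps Django's schema intact, here is a minimal sketch that uses only the standard-library sqlite3 module in place of both Pandas and Django. The in-memory database and the `names` list are assumptions for illustration; the CREATE TABLE statement is the Django-generated DDL quoted in the question.

```python
import sqlite3

# In-memory database standing in for the Django-managed SQLite file (assumption).
conn = sqlite3.connect(":memory:")

# The DDL Django generated for the Agency model, copied from the question.
conn.execute(
    'CREATE TABLE "agency" ('
    '"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, '
    '"name" varchar(128) NOT NULL)'
)

# Insert only the "name" column; SQLite assigns the id values itself.
names = ["Agency 1", "Agency 2", "Agency 3"]
conn.executemany('INSERT INTO "agency" ("name") VALUES (?)', [(n,) for n in names])
conn.commit()

# The table still has the columns the Django model expects.
columns = [row[1] for row in conn.execute("PRAGMA table_info(agency)")]
rows = conn.execute('SELECT "id", "name" FROM "agency" ORDER BY "id"').fetchall()
print(columns)
print(rows)
```

The Django ORM loop above has the same effect on the table. If you do want to stay with Pandas, `to_sql(..., if_exists="append", index=False)` should likewise append rows without dropping the table, although it bypasses the ORM, so model validation and signals do not run.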

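Often, though, you will want to import data from an external script rather than from a management command like the one above or from Django's shell. In that case you must first connect to the Django ORM by calling the setup method: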

import os, sys

import django
import pandas as pd

sys.path.append('../..') # add path to project root dir
os.environ["DJANGO_SETTINGS_MODULE"] = "myproject.settings"

# for more sophisticated setups, if you need to change connection settings (e.g. when using django-environ):
#os.environ["DATABASE_URL"] = "postgres://myuser:mypassword@localhost:54324/mydb"

# Connect to Django ORM
django.setup()

# process data
from myapp.models import Agency
Agency.objects.create(name='MyAgency')
  • Here I export my settings module myproject.settings to DJANGO_SETTINGS_MODULE so that django.setup() can pick up the project settings.

  • Depending on where you run the script from, you may need to add the project root to the system path so Django can find the settings module. In this case, I run my script two directories below my project root.

  • You can modify any settings before calling setup. This is useful if your script needs to connect to the DB differently from what is configured in settings, for example when running a script locally against Django/postgres Docker containers.

Note that the above example uses django-environ to specify DB settings.
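The path and settings steps described in the bullets can be wrapped in a small helper. This is only a sketch: `prepare_django_env` and its parameters are hypothetical names, not part of Django, and it resolves the project root relative to the script file rather than the current working directory (unlike the `sys.path.append('../..')` line above).

```python
import os
import sys
from pathlib import Path


def prepare_django_env(script_file, levels_up, settings_module):
    """Put the project root on sys.path and select the settings module.

    Hypothetical helper: call django.setup() yourself after it returns.
    """
    # parents[0] is the script's directory, parents[1] its parent, and so on.
    project_root = Path(script_file).resolve().parents[levels_up]
    if str(project_root) not in sys.path:
        sys.path.insert(0, str(project_root))
    # setdefault keeps a value that was already exported in the shell.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module)
    return project_root
```

For a script that lives two directories below the project root, the call would be `prepare_django_env(__file__, 2, "myproject.settings")`, followed by `django.setup()` and only then the model imports.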


jor*_*ing 5

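For those looking for a more performant and up-to-date solution, I would recommend using manager.bulk_create: instantiate the Django model instances, but do not create (save) them one by one.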

model_instances = [Agency(name=agency.name) for agency in agencies.itertuples()]
Agency.objects.bulk_create(model_instances)

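Note that bulk_create does not send signals or call custom save methods, so any custom save logic or signal hooks on the Agency model will not be triggered. The full list of caveats is in the documentation linked below.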

Docs: https://docs.djangoproject.com/en/3.0/ref/models/querysets/#bulk-create
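For larger DataFrames, the same idea can be written as a small reusable function. This is a sketch under assumptions: `bulk_insert_records` is a hypothetical helper name, the default of 500 is arbitrary, and `records` is expected to be a list of dicts keyed by model field name, such as the result of `agencies.to_dict("records")`. It relies on the `batch_size` argument that bulk_create accepts.

```python
def bulk_insert_records(model, records, batch_size=500):
    """Build unsaved model instances from dicts and insert them in bulk.

    `records` is a list of dicts keyed by model field name, for example
    the result of agencies.to_dict("records").
    """
    # Instantiating the model does not touch the database.
    instances = [model(**record) for record in records]
    # batch_size caps how many objects are sent in each INSERT query.
    return model.objects.bulk_create(instances, batch_size=batch_size)
```

Inside a management command or after `django.setup()`, this could be called as `bulk_insert_records(Agency, agencies.to_dict("records"))`. The caveats about signals and custom save methods apply here as well.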