Gre*_*own 7 python sqlite django pandas
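I am trying to insert data from a Pandas DataFrame into an existing Django model, Agency, which uses a SQLite backend. However, following the answers to "How to write a Pandas Dataframe to Django model" and "Saving a Pandas DataFrame to a Django Model" causes the entire SQLite table to be replaced, which breaks the Django code. Specifically, the Django auto-generated id primary key column is replaced by index, which causes an error when rendering templates (no such column: agency.id).
Here is the code, and the result of using Pandas to_sql on the SQLite table agency.
In models.py: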
from django.db import models

class Agency(models.Model):
    name = models.CharField(max_length=128)
In myapp/management/commands/populate.py:
import pandas as pd
from django.conf import settings
from django.core.management.base import BaseCommand
from sqlalchemy import create_engine

class Command(BaseCommand):
    def handle(self, *args, **options):
        # Build a SQLAlchemy engine pointing at Django's default SQLite database
        database_name = settings.DATABASES['default']['NAME']
        database_url = 'sqlite:///{}'.format(database_name)
        engine = create_engine(database_url, echo=False)
        # Insert data
        agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})
        agencies.to_sql("agency", con=engine, if_exists="replace")
Calling 'python manage.py populate' successfully adds the three agencies to the table:
index name
0 Agency 1
1 Agency 2
2 Agency 3
However, doing so changes the table's DDL from:
CREATE TABLE "agency" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "name" varchar(128) NOT NULL)
to:
CREATE TABLE agency (
"index" BIGINT,
name TEXT
);
CREATE INDEX ix_agency_index ON agency ("index")
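For anyone reproducing this, the DDL can be read back from SQLite's sqlite_master table. The following is a minimal sketch only: it uses an in-memory database as a stand-in for the project's database file, and table_ddl is a hypothetical helper name, not part of Django or Pandas.

```python
import sqlite3

# In-memory database used as a stand-in for the Django project's SQLite file (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "agency" ('
    '"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, '
    '"name" varchar(128) NOT NULL)'
)


def table_ddl(connection, table_name):
    """Return the CREATE TABLE statement SQLite stored for table_name, or None."""
    row = connection.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row[0] if row else None


ddl = table_ddl(conn, "agency")
print(ddl)
```

Running the same query before and after the import shows whether the id column survived.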
How can I add the DataFrame to the model managed by Django and keep the Django ORM intact?
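To answer my own question, since I import data into Django using Pandas quite often nowadays: the mistake I made was trying to use Pandas' built-in SQLAlchemy integration, which modified the underlying database table definition. In the context above, you can simply use the Django ORM to connect and insert the data: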
import pandas as pd
from django.core.management.base import BaseCommand
from myapp.models import Agency

class Command(BaseCommand):
    def handle(self, *args, **options):
        # Process data with Pandas
        agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})
        # Iterate over the DataFrame and create one object per row
        for row in agencies.itertuples():
            Agency.objects.create(name=row.name)
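As an aside, if you would still rather let Pandas do the insert, it may be enough to append to the existing table instead of replacing it. This is a sketch under assumptions, not the approach used in this answer: it assumes the DataFrame columns match the table's columns exactly, and it uses an in-memory SQLite database created with the Django DDL from the question as a stand-in for the real one.

```python
import sqlite3

import pandas as pd

# In-memory stand-in for the Django-managed table (assumption), created
# with the DDL that Django generated in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "agency" ('
    '"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, '
    '"name" varchar(128) NOT NULL)'
)

agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})

# if_exists="append" inserts into the existing table instead of dropping it;
# index=False stops Pandas from writing the DataFrame index as an extra column.
agencies.to_sql("agency", con=conn, if_exists="append", index=False)

# The original columns are still in place and SQLite assigned the ids.
columns = [info[1] for info in conn.execute("PRAGMA table_info(agency)")]
rows = conn.execute("SELECT id, name FROM agency ORDER BY id").fetchall()
print(columns)
print(rows)
```

Like any write that bypasses the ORM, this skips model validation, custom save logic and signals.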
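However, you may often want to import data using an external script rather than a management command, as above, or Django's shell. In this case you must first connect to the Django ORM by calling the setup method: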
import os, sys
import django
import pandas as pd
sys.path.append('../..') # add path to project root dir
os.environ["DJANGO_SETTINGS_MODULE"] = "myproject.settings"
# for more sophisticated setups, if you need to change connection settings (e.g. when using django-environ):
#os.environ["DATABASE_URL"] = "postgres://myuser:mypassword@localhost:54324/mydb"
# Connect to Django ORM
django.setup()
# process data
from myapp.models import Agency
Agency.objects.create(name='MyAgency')
Here I export my settings module myproject.settings to DJANGO_SETTINGS_MODULE so that django.setup() can pick up the project settings.
Depending on where you run the script from, you may need to add the project root to the system path so Django can find the settings module. In this case, I run my script two directories below my project root.
You can modify any settings before calling setup. This is useful if your script needs to connect to the DB differently from what is configured in settings, for example when running a script locally against Django/postgres Docker containers.
Note that the above example uses django-environ to specify the DB settings.
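For those looking for a more performant and up-to-date solution, I would suggest using manager.bulk_create: instantiate the Django model instances, but do not save them individually.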
model_instances = [Agency(name=agency.name) for agency in agencies.itertuples()]
Agency.objects.bulk_create(model_instances)
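Note that bulk_create does not run signals or custom save methods, so if you have custom save logic or signal hooks on the Agency model, they will not be triggered. The full list of caveats is in the documentation.
Docs: https://docs.djangoproject.com/en/3.0/ref/models/querysets/#bulk-create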