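Just load the database directly. Collect the data from the websites in bulk and load it straight into SQLite3. All you need is a simple batch application that uses the Django ORM: it collects data from the sites and writes it to SQLite3 immediately. Don't create CSV. Don't create JSON. Don't create intermediate results. Don't do any extra work.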
Edit.
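from myapp.models import MyModel
import urllib2  # Python 2; on Python 3 use urllib.request instead

def someGenerator(url):
    # open the URL with urllib2
    # parse the data with BeautifulSoup
    # yield one (this, that, the_other) tuple per record found on the page
    yield this, that, the_other

with open("sourceListOfURLs.txt", "r") as source:
    for aLine in source:
        # strip the trailing newline so the line is a usable URL
        for this, that, the_other in someGenerator(aLine.strip()):
            # objects.create() builds the row and saves it in one step
            MyModel.objects.create(field1=this, field2=that, field3=the_other)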
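To make the generator step concrete, here is a minimal, self-contained sketch of the same pipeline. It rests on several assumptions that are not in the answer above: each page holds its records as three-cell HTML table rows, the standard library's `html.parser` stands in for BeautifulSoup, and the `sqlite3` module stands in for the Django ORM so the example runs without a Django project. The names `RowParser`, `some_generator`, `fetch`, and `load` are hypothetical.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.request import urlopen


class RowParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # completed rows, each a list of cell strings
        self._row = None        # cells of the <tr> currently being read
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data


def fetch(url):
    """Download a page and return it as text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def some_generator(url, fetch=fetch):
    """Yield one (this, that, the_other) tuple per three-cell table row."""
    parser = RowParser()
    parser.feed(fetch(url))
    for cells in parser.rows:
        if len(cells) == 3:
            yield tuple(cell.strip() for cell in cells)


def load(urls, connection, fetch=fetch):
    """Insert every scraped row straight into SQLite, with no intermediate file."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS mymodel (field1 TEXT, field2 TEXT, field3 TEXT)"
    )
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines in the URL list
        # executemany consumes the generator row by row
        connection.executemany(
            "INSERT INTO mymodel VALUES (?, ?, ?)", some_generator(url, fetch)
        )
    connection.commit()
```

With a real Django project, the `executemany` call would presumably be replaced by the `MyModel.objects.create(...)` loop from the answer; the generator itself would not need to change. Passing `fetch` as a parameter is only a convenience for trying the parser on a local HTML string without touching the network.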