bba*_*kji 2 python json scrapy
I export my scraped data as JSON. Scrapy's default exporter writes the items as a JSON list of dicts, so the output looks like this:
[{"Product Name":"Product1", "Categories":["Clothing","Top"], "Price":"20.5", "Currency":"USD"},
{"Product Name":"Product2", "Categories":["Clothing","Top"], "Price":"21.5", "Currency":"USD"},
{"Product Name":"Product3", "Categories":["Clothing","Top"], "Price":"22.5", "Currency":"USD"},
{"Product Name":"Product4", "Categories":["Clothing","Top"], "Price":"23.5", "Currency":"USD"}, ...]
But I want to export the data in a specific format like this:
{
"Shop Name":"Shop 1",
"Location":"XXXXXXXXX",
"Contact":"XXXX-XXXXX",
"Products":
[{"Product Name":"Product1", "Categories":["Clothing","Top"], "Price":"20.5", "Currency":"USD"},
{"Product Name":"Product2", "Categories":["Clothing","Top"], "Price":"21.5", "Currency":"USD"},
{"Product Name":"Product3", "Categories":["Clothing","Top"], "Price":"22.5", "Currency":"USD"},
{"Product Name":"Product4", "Categories":["Clothing","Top"], "Price":"23.5", "Currency":"USD"}, ...]
}
Please advise on a solution. Thank you.
This is well documented on the Scrapy website.
from scrapy.exporters import JsonItemExporter

class ItemPipeline(object):
    file = None

    def open_spider(self, spider):
        # JsonItemExporter writes bytes, so the file is opened in binary mode
        self.file = open('item.json', 'wb')
        self.exporter = JsonItemExporter(self.file)
        self.exporter.start_exporting()

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        self.file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item
This will create a JSON file containing the items.
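The exporter above still writes a flat list. To get the wrapper object from the question, one option is to collect the items and write a single document when the spider closes. This is only a minimal sketch and not part of Scrapy's own API: the class name `ShopJsonPipeline`, the output path `shop.json`, and the hard-coded `shop_info` values (copied from the question's placeholders) are all assumptions, and it keeps every item in memory until the end.

```python
import json


class ShopJsonPipeline:
    """Collects items and writes them under a "Products" key on close."""

    # Hypothetical placeholders taken from the question; in a real project
    # these would come from settings or from the scraped page.
    shop_info = {
        "Shop Name": "Shop 1",
        "Location": "XXXXXXXXX",
        "Contact": "XXXX-XXXXX",
    }
    path = "shop.json"  # assumed output file name

    def open_spider(self, spider):
        self.products = []

    def process_item(self, item, spider):
        # dict(item) works for plain dicts and for scrapy.Item objects
        self.products.append(dict(item))
        return item

    def close_spider(self, spider):
        # Shop fields first, then the list of products as the last key
        document = dict(self.shop_info, Products=self.products)
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(document, f, indent=4)
```

Like any pipeline, it would need to be enabled through the `ITEM_PIPELINES` setting, with whatever module path your project uses.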
I was trying to export pretty-printed JSON, and this worked for me.
I created a pipeline that looks like this:
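import json

class JsonPipeline(object):
    def open_spider(self, spider):
        # Text mode, because json.dumps returns str rather than bytes
        self.file = open('your_file_name.json', 'w')
        self.file.write("[")
        self.first_item = True

    def close_spider(self, spider):
        self.file.write("\n]")
        self.file.close()

    def process_item(self, item, spider):
        line = json.dumps(
            dict(item),
            sort_keys=True,
            indent=4,
            separators=(',', ': ')
        )
        # Write the comma before every item except the first, so the
        # closing "]" is never preceded by a trailing comma
        if not self.first_item:
            self.file.write(",")
        self.first_item = False
        self.file.write("\n" + line)
        return item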
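It is similar to the example in the Scrapy documentation https://doc.scrapy.org/en/latest/topics/item-pipeline.html, except that it prints each JSON property indented on its own line.
See the section on pretty printing here: https://docs.python.org/2/library/json.html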
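To see what the pretty-printing arguments do on their own, here is a small standalone sketch using one product from the question as sample data. It only uses the standard `json` module; the variable names are made up for illustration.

```python
import json

# Sample item copied from the question
product = {
    "Product Name": "Product1",
    "Categories": ["Clothing", "Top"],
    "Price": "20.5",
    "Currency": "USD",
}

# Default output: everything on a single line
compact = json.dumps(product)

# Pretty output: keys sorted, each property on its own indented line
pretty = json.dumps(product, sort_keys=True, indent=4)

print(compact)
print(pretty)
```

Both strings parse back to the same dict; only the whitespace and key order differ.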