How can I convert JSON to CSV?

lit*_*ish 161 python csv json

I have a JSON file that I want to convert to a CSV file. How can I do this with Python?

I tried:

import json
import csv

f = open('data.json')
data = json.load(f)
f.close()

f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    csv_file.writerow(item)

f.close()

However, it did not work. I am using Django, and I received an error.

So then I tried the following:

import json
import csv

f = open('data.json')
data = json.load(f)
f.close()

f = open('data.csv')
csv_file = csv.writer(f)
for item in data:
    f.writerow(item)  # ← changed

f.close()

That attempt raised an error as well.

Sample JSON file:

[
    {
        "pk": 22,
        "model": "auth.permission",
        "fields": {
            "codename": "add_logentry",
            "name": "Can add log entry",
            "content_type": 8
        }
    }, {
        "pk": 23,
        "model": "auth.permission",
        "fields": {
            "codename": "change_logentry",
            "name": "Can change log entry",
            "content_type": 8
        }
    }, {
        "pk": 24,
        "model": "auth.permission",
        "fields": {
            "codename": "delete_logentry",
            "name": "Can delete log entry",
            "content_type": 8
        }
    }, {
        "pk": 4,
        "model": "auth.permission",
        "fields": {
            "codename": "add_group",
            "name": "Can add group",
            "content_type": 2
        }
    }, {
        "pk": 10,
        "model": "auth.permission",
        "fields": {
            "codename": "add_message",
            "name": "Can add message",
            "content_type": 4
        }
    }
]

YOU*_*YOU 115

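I am not sure whether this question has already been solved, but let me paste what I did for reference.

First, your JSON has nested objects, so it normally cannot be converted to CSV directly. You need to change it to something like this: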

{
    "pk": 22,
    "model": "auth.permission",
    "codename": "add_logentry",
    "content_type": 8,
    "name": "Can add log entry"
},
......]

Here is my code to generate CSV from it:

import csv
import json

x = """[
    {
        "pk": 22,
        "model": "auth.permission",
        "fields": {
            "codename": "add_logentry",
            "name": "Can add log entry",
            "content_type": 8
        }
    },
    {
        "pk": 23,
        "model": "auth.permission",
        "fields": {
            "codename": "change_logentry",
            "name": "Can change log entry",
            "content_type": 8
        }
    },
    {
        "pk": 24,
        "model": "auth.permission",
        "fields": {
            "codename": "delete_logentry",
            "name": "Can delete log entry",
            "content_type": 8
        }
    }
]"""

rows = json.loads(x)

# Python 3: open in text mode with newline="" (the Python 2 original used "wb+")
with open("test.csv", "w", newline="") as out_file:
    f = csv.writer(out_file)

    # Write the CSV header. If you don't need it, remove this line.
    f.writerow(["pk", "model", "codename", "name", "content_type"])

    for row in rows:
        f.writerow([row["pk"],
                    row["model"],
                    row["fields"]["codename"],
                    row["fields"]["name"],
                    row["fields"]["content_type"]])

You will get this output:

pk,model,codename,name,content_type
22,auth.permission,add_logentry,Can add log entry,8
23,auth.permission,change_logentry,Can change log entry,8
24,auth.permission,delete_logentry,Can delete log entry,8

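  • Below I show a more general way to do it, without having to hard-code it (3 upvotes)
  • Hey, I tried this, but I got a `TypeError: a bytes-like object is required, not 'str'` at `f.writerow(['pk', 'model', 'codename', 'name', 'content_type'])` (3 upvotes)
  • For Python 3, change the line that opens the CSV file to `f = csv.writer(open("test.csv", "w", newline=''))` (3 upvotes)
  • This works, but sorry, could I get something that is not hard-coded? I think it would be better if I could use f.writerow(a), where a is some variable I declared earlier. Thanks in advance (2 upvotes)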
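
For the request above to avoid hard-coded column names, here is a minimal sketch. It assumes every record has the same shape as the sample in the question (top-level `pk` and `model`, plus one nested `fields` object). The nested dict is merged into the row, and `csv.DictWriter` takes its header from the first merged row. The helper names `merge_fields` and `rows_to_csv` are made up for this illustration and are not part of the answer.

```python
import csv
import io
import json

SAMPLE = """[
    {"pk": 22, "model": "auth.permission",
     "fields": {"codename": "add_logentry", "name": "Can add log entry", "content_type": 8}},
    {"pk": 23, "model": "auth.permission",
     "fields": {"codename": "change_logentry", "name": "Can change log entry", "content_type": 8}}
]"""


def merge_fields(item, nested_key="fields"):
    """Return a flat copy of item with the nested dict's keys promoted to the top level."""
    flat = {k: v for k, v in item.items() if k != nested_key}
    flat.update(item.get(nested_key, {}))
    return flat


def rows_to_csv(items, out_file):
    """Write the flattened items to out_file, taking the header from the first row."""
    rows = [merge_fields(item) for item in items]
    if not rows:
        return
    # Assumption: all rows share the keys of the first row; otherwise DictWriter raises ValueError.
    writer = csv.DictWriter(out_file, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
rows_to_csv(json.loads(SAMPLE), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to a real file instead of the in-memory buffer, pass something like `open("test.csv", "w", newline="")` as `out_file`.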

vmg*_*vmg 86

With the pandas library, this is as easy as using two commands!

pandas.read_json()

This converts a JSON string into a pandas object (either a Series or a DataFrame). Then, assuming the result is stored as df:

df.to_csv()

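This can either return a string or write directly to a CSV file.

Given the verbosity of the previous answers, we should all thank pandas for the shortcut.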
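
As a rough illustration of the two calls, assuming pandas is installed and the JSON is already flat (the three-column sample string below is made up for this sketch):

```python
import io

import pandas as pd

# A flat JSON array. Nested objects are not expanded into separate columns by read_json.
flat_json = """[
    {"pk": 22, "model": "auth.permission", "codename": "add_logentry"},
    {"pk": 23, "model": "auth.permission", "codename": "change_logentry"}
]"""

# read_json accepts a path or a file-like object; StringIO wraps the literal string.
df = pd.read_json(io.StringIO(flat_json))

# With no path argument, to_csv returns the CSV text; index=False drops the row index.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file name, as in `df.to_csv("data.csv", index=False)`, writes the file instead of returning a string.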

  • This does not work for nested JSON (9 upvotes)
  • This can also be done as a minimal one-liner: `curl url | python -c 'import sys, pandas as pd; pd.read_json(sys.stdin).to_csv(sys.stdout)'` (2 upvotes)

Ale*_*ail 85

I am assuming that your JSON file will decode into a list of dictionaries. First, we need a function that flattens the JSON objects:

def flattenjson( b, delim ):
    val = {}
    for i in b.keys():
        if isinstance( b[i], dict ):
            get = flattenjson( b[i], delim )
            for j in get.keys():
                val[ i + delim + j ] = get[j]
        else:
            val[i] = b[i]

    return val

The result of running this snippet on your JSON object:

flattenjson( {
    "pk": 22, 
    "model": "auth.permission", 
    "fields": {
      "codename": "add_message", 
      "name": "Can add message", 
      "content_type": 8
    }
  }, "__" )
is:

{
    "pk": 22, 
    "model": "auth.permission",
    "fields__codename": "add_message", 
    "fields__name": "Can add message", 
    "fields__content_type": 8
}

After applying this function to each dict in the input array of JSON objects:

# A list rather than map(), so it can be iterated more than once on Python 3
input = [ flattenjson( x, "__" ) for x in input ]

and finding the relevant column names:

columns = [ x for row in input for x in row.keys() ]
columns = list( set( columns ) )

it's not hard to run this through the csv module:

# Python 3: text mode with newline='' (the Python 2 original used 'wb')
with open( fname, 'w', newline='' ) as out_file:
    csv_w = csv.writer( out_file )
    csv_w.writerow( columns )

    for i_r in input:
        csv_w.writerow( [ i_r.get( x, "" ) for x in columns ] )

I hope this helps!
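
Putting the pieces above together, here is a minimal Python 3 sketch. It is not the answer's exact code: it collects the column names in first-seen order instead of using a set, so the header order is stable, and it writes to an in-memory buffer. The names `flatten`, `collect_columns` and `records`, and the extra nested `extra` object in the sample data, are made up for this illustration.

```python
import csv
import io


def flatten(obj, delim="__"):
    """Flatten nested dicts, joining the key path with delim."""
    flat = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            for sub_key, sub_value in flatten(value, delim).items():
                flat[key + delim + sub_key] = sub_value
        else:
            flat[key] = value
    return flat


def collect_columns(rows):
    """Return every key that occurs in rows, in first-seen order."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return columns


# Hypothetical sample data; the second record has a deeper level of nesting.
records = [
    {"pk": 22, "model": "auth.permission",
     "fields": {"codename": "add_logentry", "content_type": 8}},
    {"pk": 4, "model": "auth.permission",
     "fields": {"codename": "add_group", "extra": {"note": "deep"}}},
]

rows = [flatten(record) for record in records]
columns = collect_columns(rows)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(columns)
for row in rows:
    # Keys a row does not have become empty cells.
    writer.writerow([row.get(column, "") for column in columns])

print(out.getvalue())
```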


Ale*_*lli 35

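JSON can represent a wide variety of data structures: a JS "object" is roughly like a Python dict (with string keys), a JS "array" is roughly like a Python list, and you can nest them as long as the final "leaf" elements are numbers or strings.

CSV can essentially represent only a two-dimensional table, optionally with a first row of "headers", i.e. "column names", which makes the table interpretable as a list of dicts instead of the normal interpretation, a list of lists (again, the "leaf" elements can be numbers or strings).

So, in the general case, you cannot translate an arbitrary JSON structure to CSV. In a few special cases you can (an array of arrays with no further nesting; an array of objects that all have exactly the same keys). Which special case, if any, applies to your problem? The details of the solution depend on which special case you have. Given the astonishing fact that you do not even mention which one applies, I suspect you may not have considered the constraint, that neither usable case in fact applies, and that your problem is impossible to solve. But please do clarify!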
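
The two special cases can be checked mechanically. A minimal sketch, assuming the JSON has already been decoded with `json.loads`; the function name `csv_shape` and its return labels are made up here, and treating booleans and null as leaves is an assumption beyond the "numbers or strings" wording above:

```python
import json


def is_leaf(value):
    """Assumption: numbers, strings, booleans and null all count as CSV-compatible leaves."""
    return value is None or isinstance(value, (str, int, float, bool))


def csv_shape(data):
    """Classify decoded JSON as 'rows', 'records', or None if it is not a 2-D table."""
    if not isinstance(data, list) or not data:
        return None
    # Case 1: array of arrays with no further nesting.
    if all(isinstance(row, list) and all(is_leaf(v) for v in row) for row in data):
        return "rows"
    # Case 2: array of objects that all share exactly the same keys.
    if all(isinstance(row, dict) for row in data):
        keys = set(data[0])
        same_keys = all(set(row) == keys for row in data)
        flat = all(is_leaf(v) for row in data for v in row.values())
        if same_keys and flat:
            return "records"
    return None


print(csv_shape(json.loads('[[1, "a"], [2, "b"]]')))
print(csv_shape(json.loads('[{"pk": 1, "name": "x"}, {"pk": 2, "name": "y"}]')))
print(csv_shape(json.loads('[{"pk": 22, "fields": {"codename": "add_logentry"}}]')))
```

The third call uses a record shaped like the question's data; it is rejected because of the nested `fields` object, so it needs flattening before it fits either case.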


Mik*_*ass 26

A generic solution that translates any JSON list of flat objects to CSV.

Pass the input.json file as the first argument on the command line.

import csv, json, sys

input = open(sys.argv[1])
data = json.load(input)
input.close()

output = csv.writer(sys.stdout)

output.writerow(data[0].keys())  # header row

for row in data:
    output.writerow(row.values())

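  • An important note: this code infers the columns/headers from the fields in the first row. If your JSON data has "jagged" columns, i.e. row1 has 5 columns but row2 has 6, then you need to do a first pass over the data to get the full set of all columns and use that as the headers. (2 upvotes)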
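
The first pass described in this comment can be sketched as follows, assuming a list of flat dicts. `jagged` is made-up sample data, and `restval=""` is the `csv.DictWriter` option that fills the cells a row does not have:

```python
import csv
import io

# Hypothetical "jagged" input: the second row has a column the first one lacks.
jagged = [
    {"pk": 22, "model": "auth.permission"},
    {"pk": 23, "model": "auth.permission", "codename": "change_logentry"},
]

# First pass: collect the union of all keys, keeping first-seen order.
fieldnames = []
for row in jagged:
    for key in row:
        if key not in fieldnames:
            fieldnames.append(key)

# Second pass: DictWriter writes restval for any key a row lacks.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(jagged)

print(out.getvalue())
```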

Dan*_*erz 23

This code should work for you, assuming that your JSON data is in a file called data.json.

import json
import csv

with open("data.json") as file:
    data = json.load(file)

with open("data.csv", "w", newline="") as file:  # newline="" avoids blank rows on Windows
    csv_file = csv.writer(file)
    for item in data:
        fields = list(item['fields'].values())
        csv_file.writerow([item['pk'], item['model']] + fields)


Ret*_*402 16

csv.DictWriter() makes this easy; a detailed implementation could look like this:

import csv
import json

def read_json(filename):
    with open(filename) as inf:
        return json.load(inf)

def write_csv(data, filename):
    # newline='' is what the csv module expects on Python 3
    with open(filename, 'w', newline='') as outf:
        writer = csv.DictWriter(outf, data[0].keys())
        writer.writeheader()
        for row in data:
            writer.writerow(row)

# usage
write_csv(read_json('test.json'), 'output.csv')

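Note that this assumes that all of your JSON objects have the same fields.

Here is a reference that may help you.

  • @purplepsycho I found this answer with a downvote, which it got for being link-only. New users may not know that a link-only answer is not a good answer, and this one has since been corrected. I upvoted; perhaps you could too, to encourage the new user to keep participating in our community? (3 upvotes)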

Tre*_*ney 16

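Using json_normalize from pandas

  • Given the sample data from the OP, in a file named test.json.
  • encoding='utf-8' is used here, but it may not be necessary in other cases.
  • The following code takes advantage of the pathlib library.
    • .open is a method of pathlib.Path objects.
    • It works with non-Windows paths too.
  • Use pandas.DataFrame.to_csv(...) to save the data to a CSV file.

import pandas as pd
# As of pandas 1.0, json_normalize is exposed in the top-level namespace; importing it as pandas.io.json.json_normalize is deprecated.
# from pandas.io.json import json_normalize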
from pathlib import Path
import json

# set path to file
p = Path(r'c:\some_path_to_file\test.json')

# read json
with p.open('r', encoding='utf-8') as f:
    data = json.loads(f.read())

# create dataframe
df = pd.json_normalize(data)

# dataframe view
#  pk            model  fields.codename           fields.name  fields.content_type
#  22  auth.permission     add_logentry     Can add log entry                    8
#  23  auth.permission  change_logentry  Can change log entry                    8
#  24  auth.permission  delete_logentry  Can delete log entry                    8
#   4  auth.permission        add_group         Can add group                    2
#  10  auth.permission      add_message       Can add message                    4

# save to csv
df.to_csv('test.csv', index=False, encoding='utf-8')

CSV output:

pk,model,fields.codename,fields.name,fields.content_type
22,auth.permission,add_logentry,Can add log entry,8
23,auth.permission,change_logentry,Can change log entry,8
24,auth.permission,delete_logentry,Can delete log entry,8
4,auth.permission,add_group,Can add group,2
10,auth.permission,add_message,Can add message,4

Resources for more heavily nested JSON objects:


Ama*_*nda 6

I was having trouble with Dan's proposed solution, but this worked for me:

import json
import csv 

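with open('test.json') as f:
    data = json.load(f)

# Python 3: text mode with newline='' (the Python 2 original used 'wb+')
with open('test.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    for item in data:
        # dict.values() is a view on Python 3, so convert it before concatenating
        writer.writerow([item['pk'], item['model']] + list(item['fields'].values()))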

where "test.json" contained the following:

[
{"pk": 22, "model": "auth.permission", "fields":
  {"codename": "add_logentry", "name": "Can add log entry", "content_type": 8 } },
{"pk": 23, "model": "auth.permission", "fields":
  {"codename": "change_logentry", "name": "Can change log entry", "content_type": 8 } },
{"pk": 24, "model": "auth.permission", "fields":
  {"codename": "delete_logentry", "name": "Can delete log entry", "content_type": 8 } }
]


phr*_*ead 5

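Alec's answer is great, but it does not work when there are multiple levels of nesting. Here is a modified version that supports multiple levels of nesting. It also makes the header names a bit nicer if the nested object already specifies its own key (e.g. Firebase Analytics / BigTable / BigQuery data):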

"""Converts JSON with nested fields into a flattened CSV file.
"""

import sys
import json
import csv
import os

import jsonlines

from orderedset import OrderedSet

# from https://stackoverflow.com/a/28246154/473201
def flattenjson( b, prefix='', delim='/', val=None ):
  if val is None:
    val = {}

  if isinstance( b, dict ):
    for j in b.keys():
      flattenjson(b[j], prefix + delim + j, delim, val)
  elif isinstance( b, list ):
    get = b
    for j in range(len(get)):
      key = str(j)

      # If the nested data contains its own key, use that as the header instead.
      if isinstance( get[j], dict ):
        if 'key' in get[j]:
          key = get[j]['key']

      flattenjson(get[j], prefix + delim + key, delim, val)
  else:
    val[prefix] = b

  return val

def main(argv):
  if len(argv) < 2:
    raise ValueError('Please specify a JSON file to parse')

  print("Loading and Flattening...")
  filename = argv[1]
  allRows = []
  fieldnames = OrderedSet()
  with jsonlines.open(filename) as reader:
    for obj in reader:
      # print('orig:\n')
      # print(obj)
      flattened = flattenjson(obj)
      # print('keys: %s' % flattened.keys())
      # print('flattened:\n')
      # print(flattened)
      fieldnames.update(flattened.keys())
      allRows.append(flattened)

  print("Exporting to CSV...")
  outfilename = filename + '.csv'
  count = 0
  # newline='' lets the csv module control line endings on Python 3
  with open(outfilename, 'w', newline='') as file:
    csvwriter = csv.DictWriter(file, fieldnames=fieldnames)
    csvwriter.writeheader()
    for obj in allRows:
      # print('allRows:\n')
      # print(obj)
      csvwriter.writerow(obj)
      count += 1

  print("Wrote %d rows" % count)



if __name__ == '__main__':
  main(sys.argv)


cow*_*tor 5

This is a modification of @MikeRepass's answer. This version writes the CSV to a file, and works on both Python 2 and Python 3.

import csv,json
input_file="data.json"
output_file="data.csv"
with open(input_file) as f:
    content=json.load(f)
try:
    context=open(output_file,'w',newline='') # Python 3
except TypeError:
    context=open(output_file,'wb') # Python 2
with context as file:
    writer=csv.writer(file)
    writer.writerow(content[0].keys()) # header row
    for row in content:
        writer.writerow(row.values())