spa*_*ine 7 python excel json dictionary pandas
I am trying to flatten a JSON file that looks like this:
{
"teams": [
{
"teamname": "1",
"members": [
{
"firstname": "John",
"lastname": "Doe",
"orgname": "Anon",
"phone": "916-555-1234",
"mobile": "",
"email": "john.doe@wildlife.net"
},
{
"firstname": "Jane",
"lastname": "Doe",
"orgname": "Anon",
"phone": "916-555-4321",
"mobile": "916-555-7890",
"email": "jane.doe@wildlife.net"
}
]
},
{
"teamname": "2",
"members": [
{
"firstname": "Mickey",
"lastname": "Moose",
"orgname": "Moosers",
"phone": "916-555-0000",
"mobile": "916-555-1111",
"email": "mickey.moose@wildlife.net"
},
{
"firstname": "Minny",
"lastname": "Moose",
"orgname": "Moosers",
"phone": "916-555-2222",
"mobile": "",
"email": "minny.moose@wildlife.net"
}
]
}
]
}
I want to export it to an Excel sheet. My current code is this:
from pandas.io.json import json_normalize
import json
import pandas as pd
inputFile = 'E:\\teams.json'
outputFile = 'E:\\teams.xlsx'
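# Read the JSON file; the with-block closes it automatically.
with open(inputFile) as f:
    data = json.load(f)
df = pd.DataFrame(data)
# Flatten only the top-level 'teams' list.
result1 = json_normalize(data, 'teams')
print(result1)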
Resulting output:
                                              members  teamname
0    [{u'firstname': u'John', u'phone': u'916-555-...         1
1  [{u'firstname': u'Mickey', u'phone': u'916-555-...         2
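Each row has the data for 2 members nested inside it. I would like an output table that shows the data for all 4 members together with their associated team name.
Here is one way to do it. It should give you some ideas.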
df = pd.concat(
[
pd.concat([pd.Series(m) for m in t['members']], axis=1) for t in data['teams']
], keys=[t['teamname'] for t in data['teams']]
)
                                     0                         1
1 email          john.doe@wildlife.net     jane.doe@wildlife.net
  firstname                       John                      Jane
  lastname                         Doe                       Doe
  mobile                                            916-555-7890
  orgname                         Anon                      Anon
  phone                   916-555-1234              916-555-4321
2 email      mickey.moose@wildlife.net  minny.moose@wildlife.net
  firstname                     Mickey                     Minny
  lastname                       Moose                     Moose
  mobile                  916-555-1111
  orgname                      Moosers                   Moosers
  phone                   916-555-0000              916-555-2222
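To get a nice table with team name and member as rows and all the attributes as columns:
# Name the outer index level; set_names is used because recent pandas
# refuses to set .name directly on a MultiIndex level.
df.index = df.index.set_names('teamname', level=0)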
df.columns.name = 'member'
df.T.stack(0).swaplevel(0, 1).sort_index()
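To have team name and member as actual columns, just reset the index.
# Name the outer index level; set_names is used because recent pandas
# refuses to set .name directly on a MultiIndex level.
df.index = df.index.set_names('teamname', level=0)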
df.columns.name = 'member'
df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()
import json
import pandas as pd
json_text = """{
"teams": [
{
"teamname": "1",
"members": [
{
"firstname": "John",
"lastname": "Doe",
"orgname": "Anon",
"phone": "916-555-1234",
"mobile": "",
"email": "john.doe@wildlife.net"
},
{
"firstname": "Jane",
"lastname": "Doe",
"orgname": "Anon",
"phone": "916-555-4321",
"mobile": "916-555-7890",
"email": "jane.doe@wildlife.net"
}
]
},
{
"teamname": "2",
"members": [
{
"firstname": "Mickey",
"lastname": "Moose",
"orgname": "Moosers",
"phone": "916-555-0000",
"mobile": "916-555-1111",
"email": "mickey.moose@wildlife.net"
},
{
"firstname": "Minny",
"lastname": "Moose",
"orgname": "Moosers",
"phone": "916-555-2222",
"mobile": "",
"email": "minny.moose@wildlife.net"
}
]
}
]
}"""
data = json.loads(json_text)
df = pd.concat(
[
pd.concat([pd.Series(m) for m in t['members']], axis=1) for t in data['teams']
], keys=[t['teamname'] for t in data['teams']]
)
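# Name the outer index level; set_names is used because recent pandas
# refuses to set .name directly on a MultiIndex level.
df.index = df.index.set_names('teamname', level=0)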
df.columns.name = 'member'
df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()
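For comparison, the same kind of flat table can be built without any reshaping, by creating one plain dict per member and copying the team name into it. This is a different technique from the concat/stack approach above, shown only as a minimal sketch; the sample data is a shortened, hypothetical copy of the question's JSON with two fields per member.

```python
import pandas as pd

# Shortened sample in the same shape as the question's JSON (assumption:
# only two fields per member are kept to keep the sketch small).
data = {
    "teams": [
        {"teamname": "1", "members": [
            {"firstname": "John", "lastname": "Doe"},
            {"firstname": "Jane", "lastname": "Doe"},
        ]},
        {"teamname": "2", "members": [
            {"firstname": "Mickey", "lastname": "Moose"},
            {"firstname": "Minny", "lastname": "Moose"},
        ]},
    ]
}

# One flat record per member, with the owning team's name added to it.
rows = [
    dict(member, teamname=team["teamname"])
    for team in data["teams"]
    for member in team["members"]
]

flat = pd.DataFrame(rows)
print(flat)
```

The nested comprehension walks teams first and members second, so the rows come out in file order with one row per member.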
小智 7
pandas.io.json.json_normalize
json_normalize(data, record_path=['teams', 'members'], meta=[['teams', 'teamname']])
output:
                       email  firstname  lastname        mobile  orgname         phone  teams.teamname
0      john.doe@wildlife.net       John       Doe                   Anon  916-555-1234               1
1      jane.doe@wildlife.net       Jane       Doe  916-555-7890     Anon  916-555-4321               1
2  mickey.moose@wildlife.net     Mickey     Moose  916-555-1111  Moosers  916-555-0000               2
3   minny.moose@wildlife.net      Minny     Moose                Moosers  916-555-2222               2
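Since pandas 1.0 the same function is also available at the top level as pd.json_normalize, and the pandas.io.json import path has been deprecated in its favor. The following is a minimal, self-contained sketch of the call above, assuming a recent pandas; the data is a shortened, hypothetical copy of the question's JSON.

```python
import pandas as pd

# Shortened sample in the same shape as the question's JSON (assumption:
# only two fields per member are kept).
data = {
    "teams": [
        {"teamname": "1", "members": [
            {"firstname": "John", "email": "john.doe@wildlife.net"},
            {"firstname": "Jane", "email": "jane.doe@wildlife.net"},
        ]},
        {"teamname": "2", "members": [
            {"firstname": "Mickey", "email": "mickey.moose@wildlife.net"},
            {"firstname": "Minny", "email": "minny.moose@wildlife.net"},
        ]},
    ]
}

# record_path walks teams -> members; meta pulls teamname from each team.
members = pd.json_normalize(
    data,
    record_path=["teams", "members"],
    meta=[["teams", "teamname"]],
)
print(members)
```

The meta column is named after its path joined with a dot, which is why the team name shows up as teams.teamname.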
Explanation
from pandas.io.json import json_normalize
import pandas as pd
I only recently learned how to use the json_normalize function, so my explanation may not be entirely correct.
Start with what I call "layer 0":
json_normalize(data)
output:
                                               teams
0  [{'teamname': '1', 'members': [{'firstname': '...
There is 1 column and 1 row. Everything is inside the "teams" column.
Use record_path= to look at what I call "layer 1":
json_normalize(data,record_path='teams')
output:
                                             members  teamname
0  [{'firstname': 'John', 'lastname': 'Doe', 'org...         1
1  [{'firstname': 'Mickey', 'lastname': 'Moose', ...         2
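In layer 1 we have flattened "teamname", but there is still more nested inside "members".
Use record_path= to look at layer 2. The notation is not intuitive at first. I now remember it this way: ['layer', 'deeperlayer'] results in layer.deeperlayer.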
json_normalize(data,record_path=['teams','members'])
output:
                       email  firstname  lastname        mobile  orgname         phone
0      john.doe@wildlife.net       John       Doe                   Anon  916-555-1234
1      jane.doe@wildlife.net       Jane       Doe  916-555-7890     Anon  916-555-4321
2  mickey.moose@wildlife.net     Mickey     Moose  916-555-1111  Moosers  916-555-0000
3   minny.moose@wildlife.net      Minny     Moose                Moosers  916-555-2222
Sorry, I don't know how to make tables in an answer.
Finally, we add the layer 1 column using meta=:
json_normalize(data,record_path=['teams','members'],meta=[['teams','teamname']])
output:
                       email  firstname  lastname        mobile  orgname         phone  teams.teamname
0      john.doe@wildlife.net       John       Doe                   Anon  916-555-1234               1
1      jane.doe@wildlife.net       Jane       Doe  916-555-7890     Anon  916-555-4321               1
2  mickey.moose@wildlife.net     Mickey     Moose  916-555-1111  Moosers  916-555-0000               2
3   minny.moose@wildlife.net      Minny     Moose                Moosers  916-555-2222               2
Note that we need a list of lists in meta=[[]] to reference layer 1. If you needed a column from both layer 0 and layer 1, you could do something like this:
json_normalize(data,record_path=['layer1','layer2'],meta=['layer0',['layer0','layer1']])
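To make the generic line above concrete, here is a small sketch that pulls one field from layer 0 and one from layer 1 in the same call. The "league" key is invented for illustration and is not part of the question's JSON; pd.json_normalize is assumed to be available (pandas 1.0 or later).

```python
import pandas as pd

# Hypothetical data: "league" is an invented layer-0 field used only here.
data = {
    "league": "West",
    "teams": [
        {"teamname": "1", "members": [{"firstname": "John"}, {"firstname": "Jane"}]},
        {"teamname": "2", "members": [{"firstname": "Mickey"}]},
    ],
}

table = pd.json_normalize(
    data,
    record_path=["teams", "members"],  # rows come from layer 2
    meta=[
        "league",                      # layer-0 field: a plain string
        ["teams", "teamname"],         # layer-1 field: its path as a list
    ],
)
print(table)
```

Each meta entry is repeated for every member row that sits beneath it, so the single "league" value appears on all rows while "teams.teamname" changes per team.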
The result of json_normalize is a pandas DataFrame.
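Because the result is an ordinary DataFrame, it can be written out with the usual DataFrame writers, which covers the Excel export the question asked for. The helpers below are a hedged sketch: teams_to_frame and export_teams are hypothetical names, pd.json_normalize assumes pandas 1.0 or later, and to_excel needs an Excel writer engine such as openpyxl to be installed.

```python
import pandas as pd


def teams_to_frame(data):
    """Flatten {'teams': [{'teamname': ..., 'members': [...]}]} to one row per member."""
    return pd.json_normalize(
        data,
        record_path=["teams", "members"],
        meta=[["teams", "teamname"]],
    )


def export_teams(data, path):
    """Write the flattened table to an Excel file at `path` (hypothetical helper)."""
    frame = teams_to_frame(data)
    # index=False leaves the 0..n-1 row labels out of the sheet.
    frame.to_excel(path, index=False)
    return frame
```

With the question's variables this would be called as export_teams(data, outputFile), where data is the dict returned by json.load.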