我使用python 3.6并尝试使用下面的代码下载json文件(350 MB)作为pandas数据帧.但是,我收到以下错误:
Run Code Online (Sandbox Code Playgroud)data_json_str = "[" + ",".join(data) + "] "TypeError: sequence item 0: expected str instance, bytes found
我该如何修复错误?
import pandas as pd
# read the entire file into a python array
with open('C:/Users/Alberto/nutrients.json', 'rb') as f:
data = f.readlines()
# remove the trailing "\n" from each line
data = map(lambda x: x.rstrip(), data)
# each element of 'data' is an individual JSON object.
# i want to convert it into an *array* of JSON objects
# which, in …Run Code Online (Sandbox Code Playgroud) 我已经看到很多使用 pandas 在 stackoverflow 中读取 json 的问题,但我仍然无法解决这个简单的问题。
{"session_id":{"0":["X061RFWB06K9V"],"1":["5AZ2X2A9BHH5U"]},"unix_timestamp":{"0":[1442503708],"1":[1441353991]},"cities":{"0":["New York NY, Newark NJ"],"1":["New York NY, Jersey City NJ, Philadelphia PA"]},"user":{"0":[[{"user_id":2024,"joining_date":"2015-03-22","country":"UK"}]],"1":[[{"user_id":2853,"joining_date":"2015-03-28","country":"DE"}]]}}
Run Code Online (Sandbox Code Playgroud)
import numpy as np
import pandas as pd
import json
from pandas.io.json import json_normalize
# attempt1
df = pd.read_json('a.json')
# attempt2
with open('a.json') as fi:
data = json.load(fi)
df = json_normalize(data,record_path='user',meta=['session_id','unix_timestamp','cities'])
Both of them do not give me the required output.
Run Code Online (Sandbox Code Playgroud)
session_id unix_timestamp cities user_id joining_date country
0 X061RFWB06K9V 1442503708 New York NY 2024 2015-03-22 UK
0 X061RFWB06K9V …Run Code Online (Sandbox Code Playgroud)