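I have a multi-gigabyte JSON file. It consists of JSON objects, each no more than a few thousand characters long, but there are no line breaks between the records.
Using Python 3 and the json module, how can I read one JSON object at a time from the file into memory?
The data is in a plain text file. Here is an example of a similar record. The actual records contain many nested dictionaries and lists.
Record in readable format: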
{
    "results": {
        "__metadata": {
            "type": "DataServiceProviderDemo.Address"
        },
        "Street": "NE 228th",
        "City": "Sammamish",
        "State": "WA",
        "ZipCode": "98074",
        "Country": "USA"
    }
}
Actual format. New records start one after another, without any break.
{"results": { "__metadata": {"type": "DataServiceProviderDemo.Address"},"Street": "NE 228th","City": "Sammamish","State": "WA","ZipCode": "98074","Country": "USA" } }{"results": { "__metadata": {"type": "DataServiceProviderDemo.Address"},"Street": "NE 228th","City": "Sammamish","State": "WA","ZipCode": "98074","Country": "USA" } }{"results": { "__metadata": {"type": "DataServiceProviderDemo.Address"},"Street": "NE 228th","City": "Sammamish","State": "WA","ZipCode": "98074","Country": "USA" } }
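For concatenated records like these, one possible approach (a sketch, not something from the original post) is `json.JSONDecoder.raw_decode`, which parses a single JSON value from a string and returns it together with the index where parsing stopped. The helper name `iter_json_objects` and its `chunk_size` parameter are made up for illustration, and the sketch assumes every top-level record is an object or array, as in the example.

```python
import io
import json


def iter_json_objects(fp, chunk_size=65536):
    """Yield JSON values one at a time from a text stream of concatenated JSON.

    Assumes each top-level value is an object or array (hypothetical helper).
    """
    decoder = json.JSONDecoder()
    buffer = ""
    while True:
        chunk = fp.read(chunk_size)
        buffer += chunk
        pos = 0
        while True:
            # Skip any whitespace between records.
            while pos < len(buffer) and buffer[pos].isspace():
                pos += 1
            if pos >= len(buffer):
                break
            try:
                obj, pos = decoder.raw_decode(buffer, pos)
            except json.JSONDecodeError:
                # The record is incomplete: wait for the next chunk.
                break
            yield obj
        # Keep only the unparsed tail in memory.
        buffer = buffer[pos:]
        if not chunk:
            if buffer.strip():
                raise ValueError("trailing data is not a complete JSON value")
            return


# Small in-memory demo; a real file would be opened with open(path, encoding="utf-8").
demo = io.StringIO('{"City": "Sammamish"}{"City": "Redmond"} {"City": "Seattle"}')
for record in iter_json_objects(demo, chunk_size=8):
    print(record["City"])
```

Only the current chunk plus one partial record is held in memory at a time, so the file size should not matter as long as individual records stay small.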
I am getting data from Twitter in JSON format and storing it in a file.
import os
import json as simplejson  # assumed alias: the traceback below shows the standard json module
import tweepy
from tweepy import OAuthHandler

consumer_key = 'Consumer KEY'
consumer_secret = 'Secret'
access_token = 'Token'
access_secret = 'Access Secret'
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)
os.chdir('Path')
file = open('TwData.json','wb')
for status in tweepy.Cursor(api.home_timeline).items(15):
    simplejson.dump(status._json,file,sort_keys = True)
file.close()
But I get the following error:
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Users/abc/anaconda/lib/python3.6/json/__init__.py", line 180, in dump
    fp.write(chunk)
TypeError: a bytes-like object is required, not 'str'
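The traceback shows `fp.write(chunk)` receiving a `str`, while the file was opened with `'wb'` (binary mode), which only accepts bytes. A minimal sketch of the usual fix is to open the file in text mode instead. The `statuses` list below is a hypothetical stand-in for the `status._json` dictionaries so that the example runs without Twitter credentials, and writing a newline after each record is an extra assumption that makes the file easy to read back.

```python
import json
import os
import tempfile

# Hypothetical stand-ins for the status._json dictionaries returned by tweepy.
statuses = [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]

path = os.path.join(tempfile.mkdtemp(), "TwData.json")

# Text mode ('w') instead of binary ('wb'): json.dump writes str chunks.
with open(path, "w", encoding="utf-8") as f:
    for status in statuses:
        json.dump(status, f, sort_keys=True)
        f.write("\n")  # one record per line

# Reading the records back, one JSON document per line.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```

The `with` block also closes the file automatically, which sidesteps the `file.close` call that was missing its parentheses.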
I am trying to load this JSON dictionary from Twitter:
{"created_at":"Thu Jul 10 20:02:00 +0000 2014","id":487325888950710272,"id_str":"487325888950710272","text":"\u5f81\u9678\u300c\u5de6\u8155\u306e\u7fa9\u624b\u306f\u30db\u30ed\u3060\u300d","source":"\u003ca href=\"http:\/\/twittbot.net\/\" rel=\"nofollow\"\u003etwittbot.net\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":1429838018,"id_str":"1429838018","name":"\u3053\u3093\u306a\uff30\uff30\u306f\u5acc\u3060\u3002","screen_name":"iyada_pp","location":"\u516c\u5b89\u5c40\u306e\u3069\u3053\u304b\u3002","url":null,"description":"\u3010\u3053\u3093\u306aPSYCHO-PASS\u306f\u5acc\u3060\u306a\u3011\u3068\u3044\u3046\u304f\u3060\u3089\u306a\u3044\u5984\u60f3bot\u3067\u3059\u3002\u30ad\u30e3\u30e9\u5d29\u58ca\u304c\u6fc0\u3057\u3044\u306e\u3067\u3054\u6ce8\u610f\u304f\u3060\u3055\u3044\u3002","protected":false,"followers_count":99,"friends_count":98,"listed_count":5,"created_at":"Wed May 15 07:52:33 +0000 
2013","favourites_count":0,"utc_offset":null,"time_zone":null,"geo_enabled":false,"verified":false,"statuses_count":12584,"lang":"ja","contributors_enabled":false,"is_translator":false,"is_translation_enabled":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/3661872276\/ab7201283dac5dc1789bb6dfa9b6abe4_normal.jpeg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/3661872276\/ab7201283dac5dc1789bb6dfa9b6abe4_normal.jpeg","profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"symbols":[],"urls":[],"user_mentions":[]},"favorited":false,"retweeted":false,"filter_level":"medium","lang":"ja"}
I called json.load() on that dict, but I got the error message below:
NameError: name 'false' is not defined
What is happening?
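A `NameError` for `false` typically appears when JSON text is evaluated as Python source, for example pasted in as a dict literal, because JSON's `false`, `true` and `null` are not Python names. A small sketch, assuming the tweet is available as a string (the `raw` value below is a shortened, hypothetical excerpt of the record above):

```python
import json

# The JSON text is kept as a *string*, not pasted as a Python dict literal.
raw = '{"id": 487325888950710272, "truncated": false, "geo": null, "lang": "ja"}'

# json.loads parses a str; json.load expects an open file object instead.
tweet = json.loads(raw)

print(tweet["truncated"])  # JSON false is converted to Python False
print(tweet["geo"])        # JSON null is converted to Python None
```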
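How can I write a function in Python that takes a string containing multiple dictionaries, one per line, and converts it so that json.loads can parse the entire string in a single call?
For example, if the input is (one dictionary per line):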
Input = """{"a":[1,2,3], "b":[4,5]}
{"z":[-1,-2], "x":-3}"""
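This does not parse with json.loads(Input). I need to write a function that modifies it so that it parses correctly. I was thinking that if the function could change it into something like the following, json would be able to parse it, but I am not sure how to implement that: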
Input2 = """{ "Dict1" : {"a":[1,2,3], "b":[4,5]},
"Dict2" : {"z":[-1,-2], "x":-3} }"""
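I am working on a process that may end up trying to serialize a very large JSON array to a file. Loading the entire array into memory and then dumping it to the file will therefore not work. I need to stream individual items to the file to avoid running out of memory.
Surprisingly, I could not find any example of doing this. The snippet below is what I cobbled together. Is there a better way?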
import json

first_item = True
with open('big_json_array.json', 'w') as out:
    out.write('[')
    # some_very_big_iterator is a placeholder for the real data source
    for item in some_very_big_iterator:
        if first_item:
            out.write(json.dumps(item))
            first_item = False
        else:
            out.write("," + json.dumps(item))
    out.write("]")
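The same idea can be packaged as a small reusable function. This is only a sketch: the name `write_json_array` is hypothetical, and it assumes each individual item fits comfortably in memory even though the whole array does not.

```python
import io
import json


def write_json_array(items, out):
    """Stream an iterable of JSON-serializable items to `out` as one JSON array."""
    out.write("[")
    for index, item in enumerate(items):
        if index:
            out.write(",")  # separator before every item except the first
        out.write(json.dumps(item))
    out.write("]")


# Demo with a generator, so the items are never all in memory at once.
buffer = io.StringIO()
write_json_array(({"n": i} for i in range(3)), buffer)
result = json.loads(buffer.getvalue())
```

Using `enumerate` replaces the `first_item` flag, and passing the output stream in makes the function usable with a real file opened in text mode as well as with `io.StringIO`.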
I want to store several variables in one JSON file.
I know I can dump multiple variables like this:
import json
with open('data.json', 'w') as fp:
    json.dump(p_id,fp, sort_keys = True, indent = 4)
    json.dump(word_list, fp, sort_keys = True, indent = 4)
.
.
.
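But the variables are stored without their names, and trying to load them back raises an error. How do I store and retrieve the variables I want properly?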
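Several consecutive `json.dump` calls produce a file with multiple top-level values, which `json.load` cannot read back as a single document. One common pattern, sketched here with hypothetical values for `p_id` and `word_list`, is to put the variables into a single dictionary keyed by name and dump that once.

```python
import json
import os
import tempfile

# Hypothetical values standing in for the real variables.
p_id = 42
word_list = ["alpha", "beta"]

path = os.path.join(tempfile.mkdtemp(), "data.json")

# One dictionary, one dump: the keys preserve the variable names.
with open(path, "w") as fp:
    json.dump({"p_id": p_id, "word_list": word_list}, fp, sort_keys=True, indent=4)

# Loading gives the dictionary back, so each value can be fetched by name.
with open(path) as fp:
    data = json.load(fp)

restored_p_id = data["p_id"]
restored_word_list = data["word_list"]
```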
I have received a text file generated by a JSON dump in Python, which looks like this:
[0.1,0.1,0.2,0.3]
[0.1,0.3,0.4,0.3]
[0.1,0.1,0.3,0.3]
[0.3,0.1,0.5,0.3]
.
.
.
[0.1,0.1,0.3,0.3]
[0.3,0.4,0.6,0.3]
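and so on, for quite a lot of lines (more than 10,000,000).
I would like to find the fastest and most efficient way to read the file and actually convert the lines into lists.
I have a program with a for loop that runs a particular operation on the lists: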
for x in range(filelength):
    for y in list(each line from the file):
        use the numbers from each list to perform certain operations
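I was thinking of parsing all the brackets out of the text file and splitting each line's values on commas into an empty list (which could be slow and time-consuming), but I thought Python might have a feature that converts a list represented as a string into an actual Python list easily and quickly.
Any ideas or suggestions would be much appreciated.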
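Since each line is itself a valid JSON array, one option is to call `json.loads` on each line while iterating over the file, so that only one row is in memory at a time. This is a sketch: the generator name `iter_rows` is made up, the in-memory `sample` stands in for the real file, and no claim is made here about which approach is fastest for 10,000,000 lines. `ast.literal_eval` is another standard-library way to turn such a string into a list.

```python
import io
import json


def iter_rows(fp):
    """Yield one Python list per non-empty line of a one-JSON-array-per-line file."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)


# In-memory stand-in for the real file; use open(path) for actual data.
sample = io.StringIO("[0.1,0.1,0.2,0.3]\n[0.1,0.3,0.4,0.3]\n")

rows = []
for row in iter_rows(sample):
    # Each row is an ordinary list of floats, ready for further processing.
    rows.append(row)
```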