我想从Python中的文件/流中读取多个JSON对象,一次一个.不幸的是json.load(),.read()直到文件结束; 似乎没有任何方法可以使用它来读取单个对象或懒惰地迭代对象.
有没有办法做到这一点?使用标准库是理想的,但如果有第三方库,我会使用它.
目前我将每个对象放在一个单独的行上并使用json.loads(f.readline()),但我真的不想这样做.
import my_json as json
import sys
for o in json.iterload(sys.stdin):
print("Working on a", type(o))
Run Code Online (Sandbox Code Playgroud)
{"foo": ["bar", "baz"]} 1 2 [] 4 5 6
Run Code Online (Sandbox Code Playgroud)
$ python3.2 example.py < in.txt
Working on a dict
Working on a int
Working on a int
Working on a list
Working on a int
Working on a int
Working on a int
Run Code Online (Sandbox Code Playgroud)
Tho*_*s K 38
JSON通常不适合这种增量使用; 没有标准的方法来序列化多个对象,以便可以一次一个地轻松加载它们,而无需解析整个对象.
您正在使用的每行解决方案也可以在其他地方看到.Scrapy将其称为"JSON线":
你可以用Python来做更多的事情:
for jsonline in f:
yield json.loads(jsonline) # or do the processing in this loop
Run Code Online (Sandbox Code Playgroud)
我认为这是最好的方式 - 它不依赖于任何第三方库,而且很容易理解发生了什么.我也在自己的一些代码中使用过它.
Kru*_*lur 29
也许有点晚了,但我有这个确切的问题(好吧,或多或少).我对这些问题的标准解决方案通常只是对一些众所周知的根对象进行正则表达式分割,但在我的情况下,这是不可能的.一般来说,唯一可行的方法是实现一个合适的标记化器.
在没有找到足够通用且性能相当好的解决方案后,我自己完成了这个,编写splitstream模块.它是一个预标记器,可以理解JSON和XML,并将连续的流分成多个块进行解析(尽管它实际解析了).为了获得某种性能,它被编写为C模块.
例:
from splitstream import splitfile
for jsonstr in splitfile(sys.stdin, format="json")):
yield json.loads(jsonstr)
Run Code Online (Sandbox Code Playgroud)
Jer*_*man 26
当然你可以做到这一点.你只需要raw_decode直接接受.此实现将整个文件加载到内存中并对该字符串进行操作(同样json.load如此); 如果您有大文件,您可以将其修改为只在必要时从文件中读取而没有太大困难.
import json
from json.decoder import WHITESPACE
def iterload(string_or_fp, cls=json.JSONDecoder, **kwargs):
if isinstance(string_or_fp, file):
string = string_or_fp.read()
else:
string = str(string_or_fp)
decoder = cls(**kwargs)
idx = WHITESPACE.match(string, 0).end()
while idx < len(string):
obj, end = decoder.raw_decode(string, idx)
yield obj
idx = WHITESPACE.match(string, end).end()
Run Code Online (Sandbox Code Playgroud)
用法:就像你要求的那样,它是一个发电机.
Ben*_*ict 25
这实际上是一个非常令人讨厌的问题,因为你必须在行中进行流式处理,但是跨多个行匹配大括号的模式匹配,而且模式匹配json.这是一种json-preparse,然后是json解析.与其他格式相比,Json易于解析,因此并不总是需要使用解析库,但是,我们应该如何解决这些冲突问题?
发电机来救援!
对于像这样的问题,生成器的美妙之处在于,您可以将它们叠加在一起,逐渐消除问题的难度,同时保持懒惰.我还考虑使用机制将值传递给生成器(send())但幸运的是发现我不需要使用它.
要解决第一个问题,你需要某种streamingfinditer,作为re.finditer的流媒体版本.我在下面的尝试根据需要拉入行(取消注释调试语句以查看),同时仍然返回匹配.我实际上然后稍微修改它以产生不匹配的行以及匹配(在产生的元组的第一部分中标记为0或1).
import re
def streamingfinditer(pat,stream):
for s in stream:
# print "Read next line: " + s
while 1:
m = re.search(pat,s)
if not m:
yield (0,s)
break
yield (1,m.group())
s = re.split(pat,s,1)[1]
Run Code Online (Sandbox Code Playgroud)
这样,就可以匹配直到大括号,每次都考虑大括号是否平衡,然后根据需要返回简单或复合对象.
braces='{}[]'
whitespaceesc=' \t'
bracesesc='\\'+'\\'.join(braces)
balancemap=dict(zip(braces,[1,-1,1,-1]))
bracespat='['+bracesesc+']'
nobracespat='[^'+bracesesc+']*'
untilbracespat=nobracespat+bracespat
def simpleorcompoundobjects(stream):
obj = ""
unbalanced = 0
for (c,m) in streamingfinditer(re.compile(untilbracespat),stream):
if (c == 0): # remainder of line returned, nothing interesting
if (unbalanced == 0):
yield (0,m)
else:
obj += m
if (c == 1): # match returned
if (unbalanced == 0):
yield (0,m[:-1])
obj += m[-1]
else:
obj += m
unbalanced += balancemap[m[-1]]
if (unbalanced == 0):
yield (1,obj)
obj=""
Run Code Online (Sandbox Code Playgroud)
这将返回元组,如下所示:
(0,"String of simple non-braced objects easy to parse")
(1,"{ 'Compound' : 'objects' }")
Run Code Online (Sandbox Code Playgroud)
基本上这是令人讨厌的部分.我们现在只需要按照我们认为合适的方式进行最终的解析.例如,我们可以使用Jeremy Roman的iterload函数(谢谢!)来解析单行:
def streamingiterload(stream):
for c,o in simpleorcompoundobjects(stream):
for x in iterload(o):
yield x
Run Code Online (Sandbox Code Playgroud)
测试一下:
of = open("test.json","w")
of.write("""[ "hello" ] { "goodbye" : 1 } 1 2 {
} 2
9 78
4 5 { "animals" : [ "dog" , "lots of mice" ,
"cat" ] }
""")
of.close()
// open & stream the json
f = open("test.json","r")
for o in streamingiterload(f.readlines()):
print o
f.close()
Run Code Online (Sandbox Code Playgroud)
我得到了这些结果(如果你打开那个调试行,你会看到它根据需要拉入行):
[u'hello']
{u'goodbye': 1}
1
2
{}
2
9
78
4
5
{u'animals': [u'dog', u'lots of mice', u'cat']}
Run Code Online (Sandbox Code Playgroud)
这不适用于所有情况.由于json库的实现,如果没有自己重新实现解析器,就不可能完全正确地工作.
Nic*_*son 18
这是一个更简单,更简单的解决方案.秘诀是尝试,失败并使用异常中的信息进行正确解析.唯一的限制是文件必须是可搜索的.
def stream_read_json(fn):
import json
start_pos = 0
with open(fn, 'r') as f:
while True:
try:
obj = json.load(f)
yield obj
return
except json.JSONDecodeError as e:
f.seek(start_pos)
json_str = f.read(e.pos)
obj = json.loads(json_str)
start_pos += e.pos
yield obj
Run Code Online (Sandbox Code Playgroud)
编辑:只是注意到这只适用于Python> = 3.5.对于早期,失败返回ValueError,您必须从字符串中解析出位置,例如
def stream_read_json(fn):
import json
import re
start_pos = 0
with open(fn, 'r') as f:
while True:
try:
obj = json.load(f)
yield obj
return
except ValueError as e:
f.seek(start_pos)
end_pos = int(re.match('Extra data: line \d+ column \d+ .*\(char (\d+).*\)',
e.args[0]).groups()[0])
json_str = f.read(end_pos)
obj = json.loads(json_str)
start_pos += end_pos
yield obj
Run Code Online (Sandbox Code Playgroud)
我相信这样做的更好方法是使用状态机。以下是我通过将以下链接上的NodeJS代码转换为Python 3得出的示例代码(使用的非本地关键字仅在Python 3中可用,该代码在Python 2上不起作用)
编辑1:更新并使其代码与Python 2兼容
编辑2:更新并添加了仅Python3版本
https://gist.github.com/creationix/5992451
# A streaming byte oriented JSON parser. Feed it a single byte at a time and
# it will emit complete objects as it comes across them. Whitespace within and
# between objects is ignored. This means it can parse newline delimited JSON.
import math
def json_machine(emit, next_func=None):
def _value(byte_data):
if not byte_data:
return
if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
return _value # Ignore whitespace
if byte_data == 0x22: # "
return string_machine(on_value)
if byte_data == 0x2d or (0x30 <= byte_data < 0x40): # - or 0-9
return number_machine(byte_data, on_number)
if byte_data == 0x7b: #:
return object_machine(on_value)
if byte_data == 0x5b: # [
return array_machine(on_value)
if byte_data == 0x74: # t
return constant_machine(TRUE, True, on_value)
if byte_data == 0x66: # f
return constant_machine(FALSE, False, on_value)
if byte_data == 0x6e: # n
return constant_machine(NULL, None, on_value)
if next_func == _value:
raise Exception("Unexpected 0x" + str(byte_data))
return next_func(byte_data)
def on_value(value):
emit(value)
return next_func
def on_number(number, byte):
emit(number)
return _value(byte)
next_func = next_func or _value
return _value
TRUE = [0x72, 0x75, 0x65]
FALSE = [0x61, 0x6c, 0x73, 0x65]
NULL = [0x75, 0x6c, 0x6c]
def constant_machine(bytes_data, value, emit):
i = 0
length = len(bytes_data)
def _constant(byte_data):
nonlocal i
if byte_data != bytes_data[i]:
i += 1
raise Exception("Unexpected 0x" + str(byte_data))
i += 1
if i < length:
return _constant
return emit(value)
return _constant
def string_machine(emit):
string = ""
def _string(byte_data):
nonlocal string
if byte_data == 0x22: # "
return emit(string)
if byte_data == 0x5c: # \
return _escaped_string
if byte_data & 0x80: # UTF-8 handling
return utf8_machine(byte_data, on_char_code)
if byte_data < 0x20: # ASCII control character
raise Exception("Unexpected control character: 0x" + str(byte_data))
string += chr(byte_data)
return _string
def _escaped_string(byte_data):
nonlocal string
if byte_data == 0x22 or byte_data == 0x5c or byte_data == 0x2f: # " \ /
string += chr(byte_data)
return _string
if byte_data == 0x62: # b
string += "\b"
return _string
if byte_data == 0x66: # f
string += "\f"
return _string
if byte_data == 0x6e: # n
string += "\n"
return _string
if byte_data == 0x72: # r
string += "\r"
return _string
if byte_data == 0x74: # t
string += "\t"
return _string
if byte_data == 0x75: # u
return hex_machine(on_char_code)
def on_char_code(char_code):
nonlocal string
string += chr(char_code)
return _string
return _string
# Nestable state machine for UTF-8 Decoding.
def utf8_machine(byte_data, emit):
left = 0
num = 0
def _utf8(byte_data):
nonlocal num, left
if (byte_data & 0xc0) != 0x80:
raise Exception("Invalid byte in UTF-8 character: 0x" + byte_data.toString(16))
left = left - 1
num |= (byte_data & 0x3f) << (left * 6)
if left:
return _utf8
return emit(num)
if 0xc0 <= byte_data < 0xe0: # 2-byte UTF-8 Character
left = 1
num = (byte_data & 0x1f) << 6
return _utf8
if 0xe0 <= byte_data < 0xf0: # 3-byte UTF-8 Character
left = 2
num = (byte_data & 0xf) << 12
return _utf8
if 0xf0 <= byte_data < 0xf8: # 4-byte UTF-8 Character
left = 3
num = (byte_data & 0x07) << 18
return _utf8
raise Exception("Invalid byte in UTF-8 string: 0x" + str(byte_data))
# Nestable state machine for hex escaped characters
def hex_machine(emit):
left = 4
num = 0
def _hex(byte_data):
nonlocal num, left
if 0x30 <= byte_data < 0x40:
i = byte_data - 0x30
elif 0x61 <= byte_data <= 0x66:
i = byte_data - 0x57
elif 0x41 <= byte_data <= 0x46:
i = byte_data - 0x37
else:
raise Exception("Expected hex char in string hex escape")
left -= 1
num |= i << (left * 4)
if left:
return _hex
return emit(num)
return _hex
def number_machine(byte_data, emit):
sign = 1
number = 0
decimal = 0
esign = 1
exponent = 0
def _mid(byte_data):
if byte_data == 0x2e: # .
return _decimal
return _later(byte_data)
def _number(byte_data):
nonlocal number
if 0x30 <= byte_data < 0x40:
number = number * 10 + (byte_data - 0x30)
return _number
return _mid(byte_data)
def _start(byte_data):
if byte_data == 0x30:
return _mid
if 0x30 < byte_data < 0x40:
return _number(byte_data)
raise Exception("Invalid number: 0x" + str(byte_data))
if byte_data == 0x2d: # -
sign = -1
return _start
def _decimal(byte_data):
nonlocal decimal
if 0x30 <= byte_data < 0x40:
decimal = (decimal + byte_data - 0x30) / 10
return _decimal
return _later(byte_data)
def _later(byte_data):
if byte_data == 0x45 or byte_data == 0x65: # E e
return _esign
return _done(byte_data)
def _esign(byte_data):
nonlocal esign
if byte_data == 0x2b: # +
return _exponent
if byte_data == 0x2d: # -
esign = -1
return _exponent
return _exponent(byte_data)
def _exponent(byte_data):
nonlocal exponent
if 0x30 <= byte_data < 0x40:
exponent = exponent * 10 + (byte_data - 0x30)
return _exponent
return _done(byte_data)
def _done(byte_data):
value = sign * (number + decimal)
if exponent:
value *= math.pow(10, esign * exponent)
return emit(value, byte_data)
return _start(byte_data)
def array_machine(emit):
array_data = []
def _array(byte_data):
if byte_data == 0x5d: # ]
return emit(array_data)
return json_machine(on_value, _comma)(byte_data)
def on_value(value):
array_data.append(value)
def _comma(byte_data):
if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
return _comma # Ignore whitespace
if byte_data == 0x2c: # ,
return json_machine(on_value, _comma)
if byte_data == 0x5d: # ]
return emit(array_data)
raise Exception("Unexpected byte: 0x" + str(byte_data) + " in array body")
return _array
def object_machine(emit):
object_data = {}
key = None
def _object(byte_data):
if byte_data == 0x7d: #
return emit(object_data)
return _key(byte_data)
def _key(byte_data):
if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
return _object # Ignore whitespace
if byte_data == 0x22:
return string_machine(on_key)
raise Exception("Unexpected byte: 0x" + str(byte_data))
def on_key(result):
nonlocal key
key = result
return _colon
def _colon(byte_data):
if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
return _colon # Ignore whitespace
if byte_data == 0x3a: # :
return json_machine(on_value, _comma)
raise Exception("Unexpected byte: 0x" + str(byte_data))
def on_value(value):
object_data[key] = value
def _comma(byte_data):
if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
return _comma # Ignore whitespace
if byte_data == 0x2c: # ,
return _key
if byte_data == 0x7d: #
return emit(object_data)
raise Exception("Unexpected byte: 0x" + str(byte_data))
return _object
Run Code Online (Sandbox Code Playgroud)
# A streaming byte oriented JSON parser. Feed it a single byte at a time and
# it will emit complete objects as it comes across them. Whitespace within and
# between objects is ignored. This means it can parse newline delimited JSON.
import math
def json_machine(emit, next_func=None):
def _value(byte_data):
if not byte_data:
return
if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
return _value # Ignore whitespace
if byte_data == 0x22: # "
return string_machine(on_value)
if byte_data == 0x2d or (0x30 <= byte_data < 0x40): # - or 0-9
return number_machine(byte_data, on_number)
if byte_data == 0x7b: #:
return object_machine(on_value)
if byte_data == 0x5b: # [
return array_machine(on_value)
if byte_data == 0x74: # t
return constant_machine(TRUE, True, on_value)
if byte_data == 0x66: # f
return constant_machine(FALSE, False, on_value)
if byte_data == 0x6e: # n
return constant_machine(NULL, None, on_value)
if next_func == _value:
raise Exception("Unexpected 0x" + str(byte_data))
return next_func(byte_data)
def on_value(value):
emit(value)
return next_func
def on_number(number, byte):
emit(number)
return _value(byte)
next_func = next_func or _value
return _value
TRUE = [0x72, 0x75, 0x65]
FALSE = [0x61, 0x6c, 0x73, 0x65]
NULL = [0x75, 0x6c, 0x6c]
def constant_machine(bytes_data, value, emit):
local_data = {"i": 0, "length": len(bytes_data)}
def _constant(byte_data):
# nonlocal i, length
if byte_data != bytes_data[local_data["i"]]:
local_data["i"] += 1
raise Exception("Unexpected 0x" + byte_data.toString(16))
local_data["i"] += 1
if local_data["i"] < local_data["length"]:
return _constant
return emit(value)
return _constant
def string_machine(emit):
local_data = {"string": ""}
def _string(byte_data):
# nonlocal string
if byte_data == 0x22: # "
return emit(local_data["string"])
if byte_data == 0x5c: # \
return _escaped_string
if byte_data & 0x80: # UTF-8 handling
return utf8_machine(byte_data, on_char_code)
if byte_data < 0x20: # ASCII control character
raise Exception("Unexpected control character: 0x" + byte_data.toString(16))
local_data["string"] += chr(byte_data)
return _string
def _escaped_string(byte_data):
# nonlocal string
if byte_data == 0x22 or byte_data == 0x5c or byte_data == 0x2f: # " \ /
local_data["string"] += chr(byte_data)
return _string
if byte_data == 0x62: # b
local_data["string"] += "\b"
return _string
if byte_data == 0x66: # f
local_data["string"] += "\f"
return _string
if byte_data == 0x6e: # n
local_data["string"] += "\n"
return _string
if byte_data == 0x72: # r
local_data["string"] += "\r"
return _string
if byte_data == 0x74: # t
local_data["string"] += "\t"
return _string
if byte_data == 0x75: # u
return hex_machine(on_char_code)
def on_char_code(char_code):
# nonlocal string
local_data["string"] += chr(char_code)
return _string
return _string
# Nestable state machine for UTF-8 Decoding.
def utf8_machine(byte_data, emit):
local_data = {"left": 0, "num": 0}
def _utf8(byte_data):
# nonlocal num, left
if (byte_data & 0xc0) != 0x80:
raise Exception("Invalid byte in UTF-8 character: 0x" + byte_data.toString(16))
local_data["left"] -= 1
local_data["num"] |= (byte_data & 0x3f) << (local_data["left"] * 6)
if local_data["left"]:
return _utf8
return emit(local_data["num"])
if 0xc0 <= byte_data < 0xe0: # 2-byte UTF-8 Character
local_data["left"] = 1
local_data["num"] = (byte_data & 0x1f) << 6
return _utf8
if 0xe0 <= byte_data < 0xf0: # 3-byte UTF-8 Character
local_data["left"] = 2
local_data["num"] = (byte_data & 0xf) << 12
return _utf8
if 0xf0 <= byte_data < 0xf8: # 4-byte UTF-8 Character
local_data["left"] = 3
local_data["num"] = (byte_data & 0x07) << 18
return _utf8
raise Exception("Invalid byte in UTF-8 string: 0x" + str(byte_data))
# Nestable state machine for hex escaped characters
def hex_machine(emit):
local_data = {"left": 4, "num": 0}
def _hex(byte_data):
# nonlocal num, left
i = 0 # Parse the hex byte
if 0x30 <= byte_data < 0x40:
i = byte_data - 0x30
elif 0x61 <= byte_data <= 0x66:
i = byte_data - 0x57
elif 0x41 <= byte_data <= 0x46:
i = byte_data - 0x37
else:
raise Exception("Expected hex char in string hex escape")
local_data["left"] -= 1
local_data["num"] |= i << (local_data["left"] * 4)
if local_data["left"]:
return _hex
return emit(local_data["num"])
return _hex
def number_machine(byte_data, emit):
local_data = {"sign": 1, "number": 0, "decimal": 0, "esign": 1, "exponent": 0}
def _mid(byte_data):
if byte_data == 0x2e: # .
return _decimal
return _later(byte_data)
def _number(byte_data):
# nonlocal number
if 0x30 <= byte_data < 0x40:
local_data["number"] = local_data["number"] * 10 + (byte_data - 0x30)
return _number
return _mid(byte_data)
def _start(byte_data):
if byte_data == 0x30:
return _mid
if 0x30 < byte_data < 0x40:
return _number(byte_data)
raise Exception("Invalid number: 0x" + byte_data.toString(16))
if byte_data == 0x2d: # -
local_data["sign"] = -1
return _start
def _decimal(byte_data):
# nonlocal decimal
if 0x30 <= byte_data < 0x40:
local_data["decimal"] = (local_data["decimal"] + byte_data - 0x30) / 10
return _decimal
return _later(byte_data)
def _later(byte_data):
if byte_data == 0x45 or byte_data == 0x65: # E e
return _esign
return _done(byte_data)
def _esign(byte_data):
# nonlocal esign
if byte_data == 0x2b: # +
return _exponent
if byte_data == 0x2d: # -
local_data["esign"] = -1
return _exponent
return _exponent(byte_data)
def _exponent(byte_data):
# nonlocal exponent
if 0x30 <= byte_data < 0x40:
local_data["exponent"] = local_data["exponent"] * 10 + (byte_data - 0x30)
return _exponent
return _done(byte_data)
def _done(byte_data):
value = local_data["sign"] * (local_data["number"] + local_data["decimal"])
if local_data["exponent"]:
value *= math.pow(10, local_data["esign"] * local_data["exponent"])
return emit(value, byte_data)
return _start(byte_data)
def array_machine(emit):
local_data = {"array_data": []}
def _array(byte_data):
if byte_data == 0x5d: # ]
return emit(local_data["array_data"])
return json_machine(on_value, _comma)(byte_data)
def on_value(value):
# nonlocal array_data
local_data["array_data"].append(value)
def _comma(byte_data):
if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
return _comma # Ignore whitespace
if byte_data == 0x2c: # ,
return json_machine(on_value, _comma)
if byte_data == 0x5d: # ]
return emit(local_data["array_data"])
raise Exception("Unexpected byte: 0x" + str(byte_data) + " in array body")
return _array
def object_machine(emit):
local_data = {"object_data": {}, "key": ""}
def _object(byte_data):
# nonlocal object_data, key
if byte_data == 0x7d: #
return emit(local_data["object_data"])
return _key(byte_data)
def _key(byte_data):
if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
return _object # Ignore whitespace
if byte_data == 0x22:
return string_machine(on_key)
raise Exception("Unexpected byte: 0x" + byte_data.toString(16))
def on_key(result):
# nonlocal object_data, key
local_data["key"] = result
return _colon
def _colon(byte_data):
# nonlocal object_data, key
if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
return _colon # Ignore whitespace
if byte_data == 0x3a: # :
return json_machine(on_value, _comma)
raise Exception("Unexpected byte: 0x" + str(byte_data))
def on_value(value):
# nonlocal object_data, key
local_data["object_data"][local_data["key"]] = value
def _comma(byte_data):
# nonlocal object_data
if byte_data == 0x09 or byte_data == 0x0a or byte_data == 0x0d or byte_data == 0x20:
return _comma # Ignore whitespace
if byte_data == 0x2c: # ,
return _key
if byte_data == 0x7d: #
return emit(local_data["object_data"])
raise Exception("Unexpected byte: 0x" + str(byte_data))
return _object
Run Code Online (Sandbox Code Playgroud)
if __name__ == "__main__":
test_json = """[1,2,"3"] {"name":
"tarun"} 1 2
3 [{"name":"a",
"data": [1,
null,2]}]
"""
def found_json(data):
print(data)
state = json_machine(found_json)
for char in test_json:
state = state(ord(char))
Run Code Online (Sandbox Code Playgroud)
相同的输出是
[1, 2, '3']
{'name': 'tarun'}
1
2
3
[{'name': 'a', 'data': [1, None, 2]}]
Run Code Online (Sandbox Code Playgroud)
我想提供一个解决方案.关键思想是"尝试"解码:如果失败,给它更多的提要,否则使用偏移信息准备下一次解码.
然而,当前的json模块不能容忍要解码的字符串头部中的SPACE,所以我必须将它们剥离.
import sys
import json
def iterload(file):
buffer = ""
dec = json.JSONDecoder()
for line in file:
buffer = buffer.strip(" \n\r\t") + line.strip(" \n\r\t")
while(True):
try:
r = dec.raw_decode(buffer)
except:
break
yield r[0]
buffer = buffer[r[1]:].strip(" \n\r\t")
for o in iterload(sys.stdin):
print("Working on a", type(o), o)
Run Code Online (Sandbox Code Playgroud)
=========================我已经测试了几个txt文件,它工作正常.(in1.txt)
{"foo": ["bar", "baz"]
}
1 2 [
] 4
{"foo1": ["bar1", {"foo2":{"A":1, "B":3}, "DDD":4}]
}
5 6
Run Code Online (Sandbox Code Playgroud)
(in2.txt)
{"foo"
: ["bar",
"baz"]
}
1 2 [
] 4 5 6
Run Code Online (Sandbox Code Playgroud)
(in.txt,你的初始)
{"foo": ["bar", "baz"]} 1 2 [] 4 5 6
Run Code Online (Sandbox Code Playgroud)
(Benedict测试用例的输出)
python test.py < in.txt
('Working on a', <type 'list'>, [u'hello'])
('Working on a', <type 'dict'>, {u'goodbye': 1})
('Working on a', <type 'int'>, 1)
('Working on a', <type 'int'>, 2)
('Working on a', <type 'dict'>, {})
('Working on a', <type 'int'>, 2)
('Working on a', <type 'int'>, 9)
('Working on a', <type 'int'>, 78)
('Working on a', <type 'int'>, 4)
('Working on a', <type 'int'>, 5)
('Working on a', <type 'dict'>, {u'animals': [u'dog', u'lots of mice', u'cat']})
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
56004 次 |
| 最近记录: |