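I am scraping some JSONP dictionaries from AWS (from JavaScript files). After parsing the raw data down to just the JSON-like part, in some cases I get valid JSON and can load it successfully in Python (json_data = json.loads(json_like_data)). However, some of Amazon's JSONP does not include quotes around its keys (see below).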
...
{type:"storageCurrentGen",sizes:
[{size:"i2.xlarge",vCPU:"4",ECU:"14",memoryGiB:"30.5",storageGB:"1 x 800 SSD",valueColumns:[{name:"linux",prices:{USD:"0.938"}}]},
{size:"i2.2xlarge",vCPU:"8",ECU:"27",memoryGiB:"61",storageGB:"2 x 800 SSD",valueColumns:[{name:"linux",prices:{USD:"1.876"}}]},
{size:"i2.4xlarge",vCPU:"16",ECU:"53",memoryGiB:"122",storageGB:"4 x 800 SSD",valueColumns:[{name:"linux",prices:{USD:"3.751"}}]},
...
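For JSONP this still works, because it is valid JavaScript syntax. However, json.loads(json_str) in Python chokes on it, because it is not valid JSON.
There is another Python module, YAML, that can handle unquoted keys, but it requires a space after the colon (:).
I think I have two options:
1. Add quotes around the keys, i.e. the characters between ({ or ,) and the colon (:), then parse with json.loads(...).
2. Add a space after each colon (:), then parse with yaml.load(...).
My guess is that option 2 is better than option 1. However, I am looking for suggestions for a better solution.
Has anyone come across malformed JSON like this and used Python to parse it?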
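The failure described in the question can be reproduced with the standard library alone. The snippet below is a minimal sketch; the two sample strings are shortened, hypothetical stand-ins for the AWS payload, not the real data.

```python
import json

# Hypothetical, shortened stand-ins for the AWS payload.
bare_keys = '{size:"i2.xlarge",vCPU:"4"}'        # JavaScript object literal
quoted_keys = '{"size":"i2.xlarge","vCPU":"4"}'  # strict JSON

try:
    json.loads(bare_keys)
    bare_keys_parsed = True
except json.JSONDecodeError as exc:
    # Strict JSON requires double-quoted property names.
    bare_keys_parsed = False
    print("rejected:", exc.msg)

# The same data with quoted keys loads without complaint.
record = json.loads(quoted_keys)
print(record["vCPU"])
```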
Mar*_*ers 19
You can install and use the hjson library; it parses JavaScript-style objects whose keys lack quotes:
>>> import hjson
>>> hjson.loads('{javascript_style:"Look ma, no quotes!"}')
OrderedDict([('javascript_style', 'Look ma, no quotes!')])
The demjson library is another option:
import demjson
result = demjson.decode(jsonp_payload)
demjson only refuses to parse such input when you set the strict=True flag:
>>> import demjson
>>> demjson.decode('{javascript_style:"Look ma, no quotes!"}')
{u'javascript_style': u'Look ma, no quotes!'}
>>> demjson.decode('{javascript_style:"Look ma, no quotes!"}', strict=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mjpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/demjson.py", line 5701, in decode
    return_stats=(return_stats or write_stats) )
  File "/Users/mjpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/site-packages/demjson.py", line 4917, in decode
    raise errors[0]
demjson.JSONDecodeError: ('JSON does not allow identifiers to be used as strings', u'javascript_style')
Alternatively, you could try to use a regular expression to turn the input into valid JSON; however, this can lead to false positives. The pattern would match a { or , followed by a JavaScript identifier (a letter, followed by more letters or digits), directly followed by a : colon. If your quoted values contain any such pattern, you will get invalid JSON.
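The pattern itself is not reproduced here, so the following is only a minimal sketch of one regex that fits the description above, not necessarily the exact expression the answer used. KEY_PATTERN, quote_keys, and both sample strings are hypothetical names and data chosen for illustration.

```python
import json
import re

# Assumed pattern: an identifier that directly follows '{' or ','
# and is directly followed by ':' is treated as an unquoted key.
KEY_PATTERN = re.compile(r'(?<=[{,])([a-zA-Z][a-zA-Z0-9]*)(?=:)')


def quote_keys(js_object_text):
    """Wrap bare JavaScript object keys in double quotes."""
    return KEY_PATTERN.sub(r'"\1"', js_object_text)


# Shortened stand-in for the AWS payload.
sample = ('{size:"i2.xlarge",vCPU:"4",'
          'valueColumns:[{name:"linux",prices:{USD:"0.938"}}]}')
parsed = json.loads(quote_keys(sample))
print(parsed["valueColumns"][0]["prices"]["USD"])

# False positive: the quoted value itself contains ",b:", which the
# pattern mistakes for a key, so the patched text is no longer valid JSON.
tricky = '{note:"a,b:c"}'
try:
    json.loads(quote_keys(tricky))
    false_positive_rejected = False
except json.JSONDecodeError:
    false_positive_rejected = True
```

The second sample shows why this approach is fragile: a regex cannot tell whether a match sits inside a string value, so a real parser such as hjson or demjson is the safer choice when the values are not under your control.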
You can also do this with a simple regex (in this particular case):
ll = '{type:"storageCurrentGen",sizes:\n[{size:"i2.xlarge",vCPU:"4",ECU:"14",memoryGiB:"30.5",storageGB:"1 x 800 SSD",valueColumns:[{name:"linux",prices:{USD:"0.938"}}]},\n{size:"i2.2xlarge",vCPU:"8",ECU:"27",memoryGiB:"61",storageGB:"2 x 800 SSD",valueColumns:[{name:"linux",prices:{USD:"1.876"}}]},\n{size:"i2.4xlarge",vCPU:"16",ECU:"53",memoryGiB:"122",storageGB:"4 x 800 SSD",valueColumns:[{name:"linux",prices:{USD:"3.751"}}]},'
import re
# Quote every bare word that sits between {, , or : and }, , or :
ll_patched = re.sub(r'([{,:])(\w+)([},:])', r'\1"\2"\3', ll)
>>> ll_patched
'{"type":"storageCurrentGen","sizes":\n[{"size":"i2.xlarge","vCPU":"4","ECU":"14","memoryGiB":"30.5","storageGB":"1 x 800 SSD","valueColumns":[{"name":"linux","prices":{"USD":"0.938"}}]},\n{"size":"i2.2xlarge","vCPU":"8","ECU":"27","memoryGiB":"61","storageGB":"2 x 800 SSD","valueColumns":[{"name":"linux","prices":{"USD":"1.876"}}]},\n{"size":"i2.4xlarge","vCPU":"16","ECU":"53","memoryGiB":"122","storageGB":"4 x 800 SSD","valueColumns":[{"name":"linux","prices":{"USD":"3.751"}}]},'