"Expected String or Unicode" when reading JSON with Pandas

Bal*_*r82 6 python json openstreetmap pandas overpass-api

I am trying to read a valid JSON string returned by the OpenStreetMap API.

I am using the following code:

import pandas as pd
import requests

# Bottom left
minLat = 50.9549
minLon = 13.55232

# Top right
maxLat = 51.1390
maxLon = 13.89873

osmrequest = {'data': '[out:json][timeout:25];(node["highway"="bus_stop"](%s,%s,%s,%s););out body;>;out skel qt;' % (minLat, minLon, maxLat, maxLon)}
osmurl = 'http://overpass-api.de/api/interpreter'
osm = requests.get(osmurl, params=osmrequest)

osmdata = osm.json()

osmdataframe = pd.read_json(osmdata)

It raises the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-66-304b7fbfb645> in <module>()
----> 1 osmdataframe = pd.read_json(osmdata)

/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit)
    196         obj = FrameParser(json, orient, dtype, convert_axes, convert_dates,
    197                           keep_default_dates, numpy, precise_float,
--> 198                           date_unit).parse()
    199 
    200     if typ == 'series' or obj is None:

/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in parse(self)
    264 
    265         else:
--> 266             self._parse_no_numpy()
    267 
    268         if self.obj is None:

/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in _parse_no_numpy(self)
    481         if orient == "columns":
    482             self.obj = DataFrame(
--> 483                 loads(json, precise_float=self.precise_float), dtype=None)
    484         elif orient == "split":
    485             decoded = dict((str(k), v)

TypeError: Expected String or Unicode

How can I change the request or the Pandas read_json call to avoid the error? And, by the way, what is the problem?

unu*_*tbu 13

If you write the JSON string to a file,

import io

content = osm.text  # the response body as a string; a requests Response has no .read()
with io.open('/tmp/out', 'w', encoding='utf-8') as f:
    f.write(content)

you will see something like this:

{
  "version": 0.6,
  "generator": "Overpass API",
  "osm3s": {
    "timestamp_osm_base": "2014-07-20T07:52:02Z",
    "copyright": "The data included in this document is from www.openstreetmap.org. The data is made available under ODbL."
  },
  "elements": [

{
  "type": "node",
  "id": 536694,
  "lat": 50.9849256,
  "lon": 13.6821776,
  "tags": {
    "highway": "bus_stop",
    "name": "Niederhäslich Bergmannsweg"
  }
},
...]}

If the JSON string is converted to a Python object, it becomes a dict whose elements key holds a list of dicts. The vast majority of the data is inside this list of dicts.

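This JSON string cannot be converted directly into a Pandas object. What would the index be, and what would the columns be? Surely you don't want [u'elements', u'version', u'osm3s', u'generator'] to be the columns, since almost all of the information is in the elements list.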

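But if you want the DataFrame to contain only the data in the elements list of dicts, then you have to say so explicitly, because Pandas cannot make that assumption for you.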
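As for the TypeError itself: osm.json() already returns a parsed Python dict, whereas pd.read_json expects a JSON string (or a path or file-like object), so the parser is handed the wrong type. A minimal sketch of "saying so explicitly" follows. The sample dict is a hypothetical, trimmed-down stand-in for the real response, built from the single element shown above so that no network call is needed.

```python
import pandas as pd

# Hypothetical stand-in for what osm.json() returns: an already-parsed dict,
# not a JSON string. Only the one element quoted in this answer is included.
sample = {
    "version": 0.6,
    "generator": "Overpass API",
    "elements": [
        {
            "type": "node",
            "id": 536694,
            "lat": 50.9849256,
            "lon": 13.6821776,
            "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"},
        },
    ],
}

# Pick the list of records yourself and hand it to the DataFrame constructor,
# which accepts a list of dicts directly.
elements_df = pd.DataFrame(sample["elements"])

print(sorted(elements_df.columns))
print(type(elements_df.loc[0, "tags"]))  # each cell in "tags" is still a dict
```

Note that the tags column still holds whole dicts at this point.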

A further complication is that each dict in elements is itself a nested dict. Consider the first dict in elements:

{
  "type": "node",
  "id": 536694,
  "lat": 50.9849256,
  "lon": 13.6821776,
  "tags": {
    "highway": "bus_stop",
    "name": "Niederhäslich Bergmannsweg"
  }
}

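Should ['lat', 'lon', 'type', 'id', 'tags'] be the columns? That seems plausible, except that the tags column would end up being a column of dicts, which is usually not very useful. It would probably be better if the keys in the tags dict were made into columns. We can do that, but we have to code it ourselves, since Pandas has no way of knowing that this is what we want.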


import pandas as pd
import requests
# Bottom left
minLat = 50.9549
minLon = 13.55232

# Top right
maxLat = 51.1390
maxLon = 13.89873

osmrequest = {'data': '[out:json][timeout:25];(node["highway"="bus_stop"](%s,%s,%s,%s););out body;>;out skel qt;' % (minLat, minLon, maxLat, maxLon)}
osmurl = 'http://overpass-api.de/api/interpreter'
osm = requests.get(osmurl, params=osmrequest)

osmdata = osm.json()
osmdata = osmdata['elements']
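for dct in osmdata:
    # Promote each tag to a top-level key; pop() removes 'tags' and tolerates elements without it.
    # .items() works on both Python 2 and 3 (.iteritems() is Python 2 only).
    for key, val in dct.pop('tags', {}).items():
        dct[key] = val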

osmdataframe = pd.DataFrame(osmdata)
print(osmdataframe[['lat', 'lon', 'name']].head())

which yields

         lat        lon                        name
0  50.984926  13.682178  Niederhäslich Bergmannsweg
1  51.123623  13.782789                Sagarder Weg
2  51.065752  13.895734     Weißig, Einkaufszentrum
3  51.007140  13.698498          Stuttgarter Straße
4  51.010199  13.701411          Heilbronner Straße
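As an alternative to the hand-written loop, newer pandas versions can do the flattening for you. This is only a sketch, not what the answer above used: it assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize, and it uses a hypothetical one-element list in place of the real osmdata['elements']. Nested keys come back as dotted column names such as tags.name, so the prefix is stripped afterwards to match the column names used above.

```python
import pandas as pd

# Hypothetical stand-in for osmdata['elements'], using the element quoted above.
elements = [
    {
        "type": "node",
        "id": 536694,
        "lat": 50.9849256,
        "lon": 13.6821776,
        "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"},
    },
]

# Flatten nested dicts; keys inside "tags" become "tags.highway", "tags.name".
flat = pd.json_normalize(elements)

# Drop the "tags." prefix so the columns match the manual approach.
prefix = "tags."
flat = flat.rename(
    columns=lambda c: c[len(prefix):] if c.startswith(prefix) else c
)

print(flat[["lat", "lon", "name"]])
```

One caveat, which applies to the manual loop as well: if a tag happens to be named like a top-level key (for example type or id), stripping the prefix makes the names collide, so keeping the tags. prefix may be the safer choice for real data.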