V_s*_*qrt 8 python plot pandas bokeh geopandas
我需要在地理图上绘制一些数据。具体来说,我想强调数据来自的国家和州。我的数据集是
Year Country State/City
0 2009 BGR Sofia
1 2018 BHS New Providence
2 2002 BLZ NaN
3 2000 CAN California
4 2002 CAN Ontario
... ... ... ...
250 2001 USA Ohio
251 1998 USA New York
252 1995 USA Virginia
253 2011 USA NaN
254 2019 USA New York
Run Code Online (Sandbox Code Playgroud)
为了创建地理图,我一直在使用geopandas如下:
import geopandas as gpd
shapefile = 'path/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp'
gdf = gpd.read_file(shapefile)[['ADMIN', 'ADM0_A3', 'geometry']]
gdf.columns = ['country', 'country_code', 'geometry']
Run Code Online (Sandbox Code Playgroud)
然后我合并了两个数据集:
merged = gdf.merge(df, left_on = 'country_code', right_on = 'Country')
Run Code Online (Sandbox Code Playgroud)
并将数据转换为json:
import json
merged_json = json.loads(merged.to_json())
#Convert to String like object.
json_data = json.dumps(merged_json)
Run Code Online (Sandbox Code Playgroud)
最后,我尝试按如下方式创建图表:
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer
geosource = GeoJSONDataSource(geojson = json_data)
#Define a sequential multi-hue color palette.
palette = brewer['YlGnBu'][8]
palette = palette[::-1]
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 40)
tick_labels = {'0': '0%', '5': '5%', '10':'10%', '15':'15%', '20':'20%', '25':'25%', '30':'30%','35':'35%', '40': '>40%'}
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
border_line_color=None,location = (0,0), orientation = 'horizontal', major_label_overrides = tick_labels)
p = figure(title = 'Creation year across countries', plot_height = 600 , plot_width = 950, toolbar_location = None)
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.patches('xs','ys', source = geosource,fill_color = {'field' :'per_cent_year', 'transform' : color_mapper},
line_color = 'black', line_width = 0.25, fill_alpha = 1)
p.add_layout(color_bar, 'below')
output_notebook()
#Display figure.
show(p)
Run Code Online (Sandbox Code Playgroud)
当我运行它时,它说 BokehJS 1.0.2 successfully loaded。但它不显示任何内容。我的预期输出将是一张地图,其中颜色基于一个国家的出现次数(例如,美国 = 5 将更暗),另一张基于州/城市(纽约将更暗)。
上面的代码有什么问题吗?
(如果需要,很乐意分享更多数据/信息)
从您发布的代码中,我看不出绘图有任何问题,因此我认为问题可能出在数据聚合或合并中的某个地方。
这是一个解决方案,首先生成应与您的数据类似的数据,然后计算一个国家/地区在数据中出现的次数作为数据集大小的比例,因为这是所需的指标。我们将重点仅使用几个国家/地区作为示例:
from random import choices
import pandas as pd
import numpy as np
def generate_data():
k = 100
countries_of_interest = ['USA','ARG','BRA','GBR','ESP','RUS']
countries = choices(countries_of_interest, k=k)
start_yr = 2010
end_yr = 2021
return pd.DataFrame({'Country':countries,
'Year':np.random.randint(start_yr, end_yr, k)},
index=range(len(countries)))
def aggregate_data(df):
data = df.groupby('Country').agg('count')*100.0/len(df)
data = data.reset_index().rename(columns={'Year':'proportion_of_dataset'})
return data
df = generate_data()
# Country Year
# 0 USA 2017
# 1 GBR 2014
# 2 USA 2013
# 3 BRA 2016
# 4 BRA 2018
# .. ... ...
# 95 ESP 2014
# 96 USA 2015
# 97 RUS 2019
# 98 RUS 2012
# 99 RUS 2011
#
# [100 rows x 2 columns]
data = aggregate_data(df)
# Country proportion_of_dataset
# 0 ARG 20.0
# 1 BRA 17.0
# 2 ESP 14.0
# 3 GBR 14.0
# 4 RUS 19.0
# 5 USA 16.0
Run Code Online (Sandbox Code Playgroud)
现在使用 geopandas 加载国家边界形状文件,并重命名列:
import geopandas as gpd
shapefile = 'path_to_shapfile_folder/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp'
gdf = gpd.read_file(shapefile)[['ADMIN', 'ADM0_A3', 'geometry']]
gdf.columns = ['country', 'country_code', 'geometry']
gdf.head()
# country country_code \
# 0 Fiji FJI
# 1 United Republic of Tanzania TZA
# 2 Western Sahara SAH
# 3 Canada CAN
# 4 United States of America USA
#
# geometry
# 0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
# 1 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
# 2 POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
# 3 MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
# 4 MULTIPOLYGON (((-122.84000 49.00000, -120.0000...
Run Code Online (Sandbox Code Playgroud)
现在我们要将国家/地区多边形数据框与聚合数据合并。注意:我们想要进行左连接(在完整的国家/地区多边形数据帧上),以便包含所有国家/地区,甚至是我们没有数据的国家/地区。另请注意,我们通过用零填充 NaN 来添加这些国家/地区的缺失值:
merged = gdf.merge(data, left_on = 'country_code', right_on = 'Country', how='left')
merged['proportion_of_dataset'] = merged['proportion_of_dataset'].fillna(0)
Run Code Online (Sandbox Code Playgroud)
使用您的代码创建 geojson:
import json
merged_json = json.loads(merged.to_json())
json_data = json.dumps(merged_json)
Run Code Online (Sandbox Code Playgroud)
最后,我们将绘图代码放入一个函数中,并传入 geojson、要绘图的列和绘图标题作为参数:
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer
def plot_map(json_data,plot_col,title):
geosource = GeoJSONDataSource(geojson = json_data)
#Define a sequential multi-hue color palette.
palette = brewer['YlGnBu'][8]
palette = palette[::-1]
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 40)
tick_labels = {'0': '0%', '5': '5%', '10':'10%', '15':'15%', '20':'20%', '25':'25%', '30':'30%','35':'35%', '40': '>40%'}
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
border_line_color=None,location = (0,0), orientation = 'horizontal', major_label_overrides = tick_labels)
p = figure(title = title, plot_height = 600 , plot_width = 950, toolbar_location = None)
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.patches('xs','ys', source = geosource,fill_color = {'field' :plot_col, 'transform' : color_mapper},
line_color = 'black', line_width = 0.25, fill_alpha = 1)
p.add_layout(color_bar, 'below')
output_notebook()
#Display figure.
show(p)
Run Code Online (Sandbox Code Playgroud)
现在我们要做的就是调用绘图函数,传入所需的参数:
plot_map(json_data,'proportion_of_dataset','Dataset countries of origin')
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
1344 次 |
| 最近记录: |