I want to create a choropleth map of India in R.
As a first step, I downloaded the shapefile
from https://github.com/datameet/maps/tree/master/States
and read it into R:
shape <- rgdal::readOGR(dsn="/Data/Admin2.shp")
states <- fortify(shape, region = "ST_NM")
Next, I have a dataset, states_data, of states and their populations:
structure(list(Name = c("JAMMU & KASHMIR", "HIMACHAL PRADESH",
"UTTARAKHAND", "RAJASTHAN", "UTTAR PRADESH", "BIHAR", "SIKKIM",
"ARUNACHAL PRADESH", "NAGALAND", "MANIPUR", "MIZORAM", "TRIPURA",
"MEGHALAYA", "ASSAM", "WEST BENGAL", "JHARKHAND", "ODISHA", "CHHATTISGARH",
"MADHYA PRADESH", "GUJARAT", "DAMAN & DIU", "DADRA & NAGAR HAVELI",
"MAHARASHTRA", "ANDHRA PRADESH", "KARNATAKA", "GOA", "LAKSHADWEEP",
"KERALA", "TAMIL NADU", "ANDAMAN & NICOBAR ISLANDS"), TOT_P = c(1493299,
392126, 291903, 9238534, 1134273, 1336573, 206360, 951821, 1710973,
1167422, 1036115, 1166813, 2555861, 3884371, 5296953, 8645042,
9590756, 7822902, 15316784, 8917174, 15363, 178564, 10510213,
5918073, 4248987, 149275, 61120, 484839, 794697, 28530)), row.names = c(NA,
-30L), class = c("tbl_df", "tbl", "data.frame"))
I merge the two datasets on the state name:
final_data <- merge(states,states_data, by.y="Name", by.x="id")
Finally, I plot with ggplot:
ggplot()+
geom_polygon(data=final_data,
aes(x= long, y=lat, group=id, fill=TOT_P), color='black',size=0.25)+
coord_map()
I get the plot below.
Can someone tell me what went wrong? Any help is appreciated!
Thanks!
The state name strings in the two datasets are not identical.
If you look at the unique values, you can see that the shapefile uses title case:
> unique(states$id)
[1] "Andaman & Nicobar Island" "Andhra Pradesh" "Arunanchal Pradesh" "Assam"
[5] "Bihar" "Chandigarh" "Chhattisgarh" "Dadara & Nagar Havelli"
[9] "Daman & Diu" "Goa" "Gujarat" "Haryana"
[13] "Himachal Pradesh" "Jammu & Kashmir" "Jharkhand" "Karnataka"
[17] "Kerala" "Lakshadweep" "Madhya Pradesh" "Maharashtra"
[21] "Manipur" "Meghalaya" "Mizoram" "Nagaland"
[25] "NCT of Delhi" "Odisha" "Puducherry" "Punjab"
[29] "Rajasthan" "Sikkim" "Tamil Nadu" "Telangana"
[33] "Tripura" "Uttar Pradesh" "Uttarakhand" "West Bengal"
whereas your population data frame uses all caps:
> unique(states_data$Name)
[1] "JAMMU & KASHMIR" "HIMACHAL PRADESH" "UTTARAKHAND" "RAJASTHAN"
[5] "UTTAR PRADESH" "BIHAR" "SIKKIM" "ARUNACHAL PRADESH"
[9] "NAGALAND" "MANIPUR" "MIZORAM" "TRIPURA"
[13] "MEGHALAYA" "ASSAM" "WEST BENGAL" "JHARKHAND"
[17] "ODISHA" "CHHATTISGARH" "MADHYA PRADESH" "GUJARAT"
[21] "DAMAN & DIU" "DADRA & NAGAR HAVELI" "MAHARASHTRA" "ANDHRA PRADESH"
[25] "KARNATAKA" "GOA" "LAKSHADWEEP" "KERALA"
[29] "TAMIL NADU" "ANDAMAN & NICOBAR ISLANDS"
That is why your merged dataset final_data is empty.
One possible fix is to convert the names in both datasets to lower case before merging:
states$id <- stringr::str_to_lower(states$id)
states_data$Name <- stringr::str_to_lower(states_data$Name)
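To see the effect in isolation, here is a minimal sketch with two toy rows (values copied from the question, object names made up for illustration). merge() does an inner join by default and compares strings case-sensitively, so keys that differ only in case yield zero rows:

```r
# Toy stand-ins for the fortified shapefile ids and the population table
shape_ids  <- data.frame(id = c("Bihar", "Goa"))
population <- data.frame(Name = c("BIHAR", "GOA"), TOT_P = c(1336573, 149275))

# Inner join on case-sensitive keys: nothing matches
mismatched <- merge(shape_ids, population, by.x = "id", by.y = "Name")
nrow(mismatched)  # 0

# After lower-casing both key columns the rows line up
shape_ids$id    <- tolower(shape_ids$id)
population$Name <- tolower(population$Name)
matched <- merge(shape_ids, population, by.x = "id", by.y = "Name")
nrow(matched)     # 2
```

Base R's tolower() is used here only to keep the sketch dependency-free; stringr::str_to_lower() behaves the same way for these names.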
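However, a few rows still fail to match, either because of typos or different spellings, or simply because the data are missing. You can list them with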
setdiff(unique(states$id), unique(states_data$Name))
and adjust the spelling where possible.
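One way to adjust the spellings is a small lookup vector. The sketch below is only an illustration: the three mappings are read off the unique() output above, and fix_spelling() is a hypothetical helper, not part of any package.

```r
# Lower-cased shapefile names that are spelled differently in states_data
spelling_fixes <- c(
  "andaman & nicobar island" = "andaman & nicobar islands",
  "arunanchal pradesh"       = "arunachal pradesh",
  "dadara & nagar havelli"   = "dadra & nagar haveli"
)

# Hypothetical helper: replace names found in the lookup, keep all others
fix_spelling <- function(x, fixes) {
  hit <- x %in% names(fixes)
  x[hit] <- unname(fixes[x[hit]])
  x
}

ids <- c("arunanchal pradesh", "assam", "dadara & nagar havelli")
fixed <- fix_spelling(ids, spelling_fixes)
fixed
```

Applied as states$id <- fix_spelling(states$id, spelling_fixes) after lower-casing, this should leave only the states that have no row in states_data at all (Chandigarh, Haryana, NCT of Delhi, Puducherry, Punjab and Telangana in the output above).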
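Finally, in my quick test the fortified polygons did not plot well; this may be entirely specific to my combination of rgeos/rgdal/ggplot2. That said, if you plan to work with spatial data more extensively, I would recommend the sf package. It makes working with spatial data extremely convenient (see its comprehensive documentation) and lets you plot in ggplot2 simply with geom_sf().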
library(tidyverse)
library(sf)
# read shape and convert state names to lower case
states <- st_read("./Data/Admin2.shp") %>%
  mutate(Name = str_to_lower(ST_NM))
# merge spatial data with population data, also convert state names to lower case in the latter
states_population <- states %>%
  left_join(states_data %>% mutate(Name = str_to_lower(Name)), by = "Name")
# grey states are the result of unmatched states outlined above
ggplot(states_population, aes(fill = TOT_P)) +
  geom_sf() +
  scale_fill_viridis_c() +
  ggthemes::theme_map()
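If you stay with the fortify() route, one possible cause of badly drawn polygons (a guess, not a diagnosis of the plot in the question) is that merge() sorts rows by the join key, which can scramble the vertex sequence stored in the order column. The toy sketch below uses a hand-made data frame in place of real fortify() output to show restoring that sequence after merging:

```r
# Toy stand-in for fortify() output: two squares, vertices numbered by `order`
poly <- data.frame(
  long  = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat   = c(0, 0, 1, 1, 0, 0, 1, 1),
  order = 1:8,
  group = rep(c("b.1", "a.1"), each = 4),
  id    = rep(c("b", "a"), each = 4)
)
pop <- data.frame(Name = c("a", "b"), TOT_P = c(10, 20))

# merge() sorts on the key by default, so rows no longer follow `order`
merged <- merge(poly, pop, by.x = "id", by.y = "Name")

# Restore the vertex sequence before handing the data to geom_polygon()
restored <- merged[order(merged$order), ]
```

With real fortified data it is also usual to map the group aesthetic to the group column rather than id, so that each polygon piece of a state (islands, for example) is drawn separately.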