maps, ggplot2: fill by state is missing certain areas on the map

Ram*_*les 5 maps plot r ggplot2

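I am working with maps and ggplot2 to visualize the number of certain crimes in each state for different years. The data set I am working with was produced by the FBI and can be downloaded from their site or from here (if you don't want to download the data set I don't blame you, but it is far too big to copy and paste into this question, and including only a fraction of it wouldn't help, as there wouldn't be enough information to recreate the graph).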

The problem is easier to see than to describe.

[Image: map of robbery by state]

As you can see, California is missing a large chunk, and so are a few other states. Here is the code that produced this plot:

# load libraries
library(maps)
library(ggplot2)

# load data
fbi <- read.csv("http://www.hofroe.net/stat579/crimes-2012.csv")
fbi <- subset(fbi, state != "United States")
states <- map_data("state")

# merge data sets by region
fbi$region <- tolower(fbi$state)
fbimap <- merge(fbi, states, by="region")

# plot robbery numbers by state for year 2012
fbimap12 <- subset(fbimap, Year == 2012)
qplot(long, lat, geom="polygon", data=fbimap12,
  facets=~Year, fill=Robbery, group=group)

This is what the states data looks like:

    long      lat     group order  region subregion
1 -87.46201 30.38968     1     1 alabama      <NA>
2 -87.48493 30.37249     1     2 alabama      <NA>
3 -87.52503 30.37249     1     3 alabama      <NA>
4 -87.53076 30.33239     1     4 alabama      <NA>
5 -87.57087 30.32665     1     5 alabama      <NA>
6 -87.58806 30.32665     1     6 alabama      <NA>

This is what the fbi data looks like:

    Year Population Violent Property Murder Forcible.Rape Robbery
1 1960    3266740    6097    33823    406           281     898
2 1961    3302000    5564    32541    427           252     630
3 1962    3358000    5283    35829    316           218     754
4 1963    3347000    6115    38521    340           192     828
5 1964    3407000    7260    46290    316           397     992
6 1965    3462000    6916    48215    395           367     992
   Aggravated.Assault Burglary Larceny.Theft Vehicle.Theft abbr   state region
1               4512    11626         19344          2853   AL Alabama  alabama
2               4255    11205         18801          2535   AL Alabama  alabama
3               3995    11722         21306          2801   AL Alabama  alabama
4               4755    12614         22874          3033   AL Alabama  alabama
5               5555    15898         26713          3679   AL Alabama  alabama
6               5162    16398         28115          3702   AL Alabama  alabama

I then merged the two sets by region. The subset I am trying to plot is

      region Year Robbery      long      lat group
8283 alabama 2012    5020 -87.46201 30.38968     1
8284 alabama 2012    5020 -87.48493 30.37249     1
8285 alabama 2012    5020 -87.95475 30.24644     1
8286 alabama 2012    5020 -88.00632 30.24071     1
8287 alabama 2012    5020 -88.01778 30.25217     1
8288 alabama 2012    5020 -87.52503 30.37249     1
       ...            ...    ...      ...

Any ideas on how I can create this plot without those ugly missing spots?

jaz*_*rro 8

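I played with your code. One thing I can tell is that something happened when you used merge. I drew the states map using geom_path and confirmed that there were a couple of weird lines which do not exist in the original map data. I then investigated further by comparing merge and inner_join. Both do the same job here, but I found one difference. When I used merge, the row order changed: the values in the order column were no longer in sequence. This was not the case with inner_join. You will see a bit of the California data below. Your approach was right, but merge somehow did not work in your favour. I am not sure why the function changed the order, though.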
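A possible explanation, offered as a hedged note: the documentation for merge only says that rows are sorted on the key columns by default (and are in an unspecified order with sort = FALSE), so the original row order within a state is not guaranteed to survive. Since geom_polygon connects vertices in row order, one way around this is to re-sort on the order column after merging. The sketch below uses a made-up square and a made-up region name ("toyland") rather than the FBI data, and scrambles the rows by hand to stand in for whatever merge may do.

```r
# Toy outline: a unit square whose vertices are listed in drawing order.
# All names and numbers here are invented for illustration.
square <- data.frame(
  region = "toyland",
  long   = c(0, 1, 1, 0),
  lat    = c(0, 0, 1, 1),
  group  = 1L,
  order  = 1:4
)
crime <- data.frame(region = "toyland", Robbery = 42)

merged <- merge(crime, square, by = "region")

# Simulate the problem: the rows come back in a scrambled order.
scrambled <- merged[c(3, 1, 4, 2), ]

# Restore the drawing order before passing the data to geom_polygon().
restored <- scrambled[order(scrambled$order), ]
rownames(restored) <- NULL
```

Applied to the question's code, the equivalent step would presumably be `fbimap <- fbimap[order(fbimap$order), ]` right after the merge, since the map data carries its own order column.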

library(dplyr)

### Call US map polygon
states <- map_data("state")

### Get crime data
fbi <- read.csv("http://www.hofroe.net/stat579/crimes-2012.csv")
fbi <- subset(fbi, state != "United States")
fbi$state <- tolower(fbi$state)


### Check if both files have identical state names: The answer is NO
### states$region does not have Alaska, Hawaii, and Washington D.C.
### fbi$state does not have District of Columbia.

setdiff(fbi$state, states$region)
#[1] "alaska"           "hawaii"           "washington d. c."

setdiff(states$region, fbi$state)
#[1] "district of columbia"

### Select data for 2012 and choose two columns (i.e., state and Robbery)
fbi2 <- fbi %>%
        filter(Year == 2012) %>%
        select(state, Robbery)  
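The setdiff output above suggests that the two data sets spell one name differently. If you want that region to be coloured too, the names could be reconciled before joining. This is only a sketch on toy vectors, and it assumes that "washington d. c." in the crime data and "district of columbia" in the map data refer to the same place.

```r
# Toy stand-ins for fbi$state and states$region,
# using the names reported by setdiff() above.
fbi_state  <- c("alabama", "alaska", "hawaii", "washington d. c.")
map_region <- c("alabama", "district of columbia")

# Assumption: both spellings mean the same place.
fbi_state[fbi_state == "washington d. c."] <- "district of columbia"

# Names that still have no partner after recoding.
left_in_fbi <- setdiff(fbi_state, map_region)  # alaska and hawaii are not in the "state" map
left_in_map <- setdiff(map_region, fbi_state)  # nothing left over
```

Alaska and Hawaii remain unmatched because the "state" map does not contain them, so a join simply drops them.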

Now I created two data frames, one with merge and one with inner_join.

### Create two data frames with merge and inner_join
ana <- merge(fbi2, states, by.x = "state", by.y = "region")
bob <- inner_join(fbi2, states, by = c("state" ="region"))

ana %>%
    filter(state == "california") %>%
    slice(1:5)

#        state Robbery      long      lat group order subregion
#1  california   56521 -119.8685 38.90956     4   676      <NA>
#2  california   56521 -119.5706 38.69757     4   677      <NA>
#3  california   56521 -119.3299 38.53141     4   678      <NA>
#4  california   56521 -120.0060 42.00927     4   667      <NA>
#5  california   56521 -120.0060 41.20139     4   668      <NA>

bob %>%
    filter(state == "california") %>%
    slice(1:5)

#        state Robbery      long      lat group order subregion
#1  california   56521 -120.0060 42.00927     4   667      <NA>
#2  california   56521 -120.0060 41.20139     4   668      <NA>
#3  california   56521 -120.0060 39.70024     4   669      <NA>
#4  california   56521 -119.9946 39.44241     4   670      <NA>
#5  california   56521 -120.0060 39.31636     4   671      <NA>

ggplot(data = bob, aes(x = long, y = lat, fill = Robbery, group = group)) +
geom_polygon()

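As an alternative sketch that avoids joining altogether, the value for each map row can be looked up with base R's match. This keeps the map data frame exactly as it is (same rows, same order) and leaves NA where a region has no data. The data frames below are toy stand-ins with invented names and values, not the real map or FBI data.

```r
# Toy map data: two regions, vertices already in drawing order.
toy_map <- data.frame(
  region = c("a", "a", "a", "b", "b", "b"),
  long   = c(0, 1, 0, 2, 3, 2),
  lat    = c(0, 0, 1, 0, 0, 1),
  group  = c(1, 1, 1, 2, 2, 2),
  order  = 1:6
)

# Toy values: region "b" has no data on purpose.
toy_values <- data.frame(state = "a", Robbery = 10)

# Look up the value for every map row; rows of toy_map are
# neither reordered nor dropped.
toy_map$Robbery <- toy_values$Robbery[match(toy_map$region, toy_values$state)]
```

Plotting toy_map with `aes(x = long, y = lat, group = group, fill = Robbery)` and `geom_polygon()` should then draw every region, with regions lacking data shown in the scale's NA colour.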