Add a column labeling US states by US Census region

sim*_*sim 1 r

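I have data containing email addresses and US states, and I want to create a column that labels each state with the US Census region it belongs to. In SQL I did this with a CASE statement, but what is the best way to do it in R?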

Sample data:

df <- data.frame(emails=c("xyz@gmail.com","abc@hotmail.com","bba@gmai.com","so121@gamil.com","ad@yahoo.com"), states=c("NV","CA","UT","AZ","IA"))

The sample data looks like this:

emails           states
xyz@gmail.com    NV
abc@hotmail.com  CA
bba@gmai.com     UT
so121@gamil.com  AZ
ad@yahoo.com     IA

I would like the result to be:

emails           states  regions
xyz@gmail.com    NV      West
abc@hotmail.com  CA      West
bba@gmai.com     UT      West
so121@gamil.com  AZ      West
ad@yahoo.com     IA      Midwest

www*_*www 5

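As usual, the hardest part is collecting the data in the first place, but I happen to have this data on file from the US Census. So, after running the "State/region data" section at the end of this answer, run the following lines of code: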

df <- data.frame(emails=c("xyz@gmail.com","abc@hotmail.com","bba@gmai.com",
                          "so121@gamil.com","ad@yahoo.com"),
                 states=c("NV","CA","UT","AZ","IA"))

df$regions <- sapply(df$states, 
                 function(x) names(region.list)[grep(x,region.list)])

#Then write to desktop, for example, with:
write.csv(df,"~/Desktop/nameHere.csv",row.names=FALSE)

Output:

           emails states regions
1   xyz@gmail.com     NV    West
2 abc@hotmail.com     CA    West
3    bba@gmai.com     UT    West
4 so121@gamil.com     AZ    West
5    ad@yahoo.com     IA Midwest
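One caveat with the grep() call: it performs regular-expression matching rather than whole-string comparison, and a value that matches nothing yields a zero-length result, so sapply() would then return a list instead of a character vector. A minimal sketch of an exact-match variant follows. It assumes only abbreviations are needed, and `region.abbr` and `lookup` are hypothetical names introduced for this illustration, not part of the answer above.

```r
# Abridged copy of the region data: abbreviations only.
# `region.abbr` is a hypothetical name used just for this sketch.
region.abbr <- list(
  Northeast = c("CT","ME","MA","NH","RI","VT","NJ","NY","PA"),
  Midwest   = c("IN","IL","MI","OH","WI","IA","KS","MN","MO","NE",
                "ND","SD"),
  South     = c("DE","DC","FL","GA","MD","NC","SC","VA","WV","AL",
                "KY","MS","TN","AR","LA","OK","TX"),
  West      = c("AZ","CO","ID","NM","MT","UT","NV","WY","AK","CA",
                "HI","OR","WA"))

# stack() flattens the named list into a two-column data frame:
# `values` holds the state, `ind` holds the region name (a factor).
lookup <- stack(region.abbr)

df <- data.frame(emails = c("xyz@gmail.com","abc@hotmail.com","bba@gmai.com",
                            "so121@gamil.com","ad@yahoo.com"),
                 states = c("NV","CA","UT","AZ","IA"),
                 stringsAsFactors = FALSE)

# match() compares whole strings; a state that is not in the table
# becomes NA rather than a zero-length result.
df$regions <- as.character(lookup$ind[match(df$states, lookup$values)])
```

Because the lookup is a plain table, the same idea also works with merge() if the region should be joined onto a larger data frame.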

State/region data:

NE.name <- c("Connecticut","Maine","Massachusetts","New Hampshire",
             "Rhode Island","Vermont","New Jersey","New York",
             "Pennsylvania")
NE.abrv <- c("CT","ME","MA","NH","RI","VT","NJ","NY","PA")
NE.ref <- c(NE.name,NE.abrv)

MW.name <- c("Indiana","Illinois","Michigan","Ohio","Wisconsin",
             "Iowa","Kansas","Minnesota","Missouri","Nebraska",
             "North Dakota","South Dakota")
MW.abrv <- c("IN","IL","MI","OH","WI","IA","KS","MN","MO","NE",
             "ND","SD")
MW.ref <- c(MW.name,MW.abrv)

S.name <- c("Delaware","District of Columbia","Florida","Georgia",
            "Maryland","North Carolina","South Carolina","Virginia",
            "West Virginia","Alabama","Kentucky","Mississippi",
            "Tennessee","Arkansas","Louisiana","Oklahoma","Texas")
S.abrv <- c("DE","DC","FL","GA","MD","NC","SC","VA","WV","AL",
            "KY","MS","TN","AR","LA","OK","TX")
S.ref <- c(S.name,S.abrv)

W.name <- c("Arizona","Colorado","Idaho","New Mexico","Montana",
            "Utah","Nevada","Wyoming","Alaska","California",
            "Hawaii","Oregon","Washington")
W.abrv <- c("AZ","CO","ID","NM","MT","UT","NV","WY","AK","CA",
            "HI","OR","WA")
W.ref <- c(W.name,W.abrv)

region.list <- list(
  Northeast=NE.ref,
  Midwest=MW.ref,
  South=S.ref,
  West=W.ref)
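If the 50 states are enough, the region data may not need to be typed in at all. The sketch below assumes R's built-in `state.abb` and `state.region` vectors (package datasets), which do not include the District of Columbia and which label the Midwest as "North Central". `region.lookup` is a hypothetical name used for illustration.

```r
# state.abb and state.region ship with base R (package "datasets").
# They cover the 50 states only, so "DC" comes back as NA here.
region.lookup <- setNames(as.character(state.region), state.abb)

# The built-in data uses the older label "North Central" for the Midwest.
region.lookup[region.lookup == "North Central"] <- "Midwest"

df <- data.frame(emails = c("xyz@gmail.com","abc@hotmail.com","bba@gmai.com",
                            "so121@gamil.com","ad@yahoo.com"),
                 states = c("NV","CA","UT","AZ","IA"),
                 stringsAsFactors = FALSE)

# Index the named vector by state abbreviation; as.character() guards
# against `states` being a factor, which would index by integer code.
df$regions <- unname(region.lookup[as.character(df$states)])
```

Indexing a named vector this way is close in spirit to the SQL CASE statement mentioned in the question: each abbreviation maps to exactly one label, and anything unrecognized falls through to NA.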