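I have data containing email addresses and US states, and I want to create a column that flags the US Census region each state belongs to. In SQL I did this with a CASE statement, but what is the best way to do it in R?
Sample data: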
df <- data.frame(emails=c("xyz@gmail.com","abc@hotmail.com","bba@gmai.com","so121@gamil.com","ad@yahoo.com"), states=c("NV","CA","UT","AZ","IA"))
The sample data looks like this:
emails states
xyz@gmail.com NV
abc@hotmail.com CA
bba@gmai.com UT
so121@gamil.com AZ
ad@yahoo.com IA
I would like the result to be:
emails states regions
xyz@gmail.com NV West
abc@hotmail.com CA West
bba@gmai.com UT West
so121@gamil.com AZ West
ad@yahoo.com IA Midwest
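As usual, the hardest part is gathering the data in the first place, but I happened to have it on file from the US Census. So, after running the "State/region data" section below, run the following lines of code: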
df <- data.frame(emails=c("xyz@gmail.com","abc@hotmail.com","bba@gmai.com",
"so121@gamil.com","ad@yahoo.com"),
states=c("NV","CA","UT","AZ","IA"))
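# Exact match with %in%; grep() would partial-match, e.g. "Kansas" inside "Arkansas"
df$regions <- sapply(as.character(df$states),
function(x) names(region.list)[sapply(region.list, function(r) x %in% r)],
USE.NAMES=FALSE)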
#Then write to desktop, for example, with:
write.csv(df,"~/Desktop/nameHere.csv",row.names=FALSE)
Output:
emails states regions
1 xyz@gmail.com NV West
2 abc@hotmail.com CA West
3 bba@gmai.com UT West
4 so121@gamil.com AZ West
5 ad@yahoo.com IA Midwest
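If typing the lookup table by hand is not appealing, base R's built-in `state.abb` and `state.region` vectors can serve as the reference data. The following is a minimal sketch rather than the method used above, and it rests on two caveats: `state.region` labels the Midwest as "North Central", so the sketch renames it, and the built-in data covers only the 50 states, so "DC" would come back as NA.

```r
# Same sample data as in the question
df <- data.frame(
  emails = c("xyz@gmail.com", "abc@hotmail.com", "bba@gmai.com",
             "so121@gamil.com", "ad@yahoo.com"),
  states = c("NV", "CA", "UT", "AZ", "IA"),
  stringsAsFactors = FALSE
)

# state.abb and state.region are parallel built-in vectors (datasets package);
# match() finds each abbreviation's position, which then indexes the regions
df$regions <- as.character(state.region[match(df$states, state.abb)])

# The built-in factor uses "North Central" for what the Census now calls "Midwest"
df$regions[df$regions %in% "North Central"] <- "Midwest"

print(df)
```

Because `match()` is vectorised, this does the whole column in one call and returns NA for any abbreviation it does not recognise.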
State/region data:
NE.name <- c("Connecticut","Maine","Massachusetts","New Hampshire",
"Rhode Island","Vermont","New Jersey","New York",
"Pennsylvania")
NE.abrv <- c("CT","ME","MA","NH","RI","VT","NJ","NY","PA")
NE.ref <- c(NE.name,NE.abrv)
MW.name <- c("Indiana","Illinois","Michigan","Ohio","Wisconsin",
"Iowa","Kansas","Minnesota","Missouri","Nebraska",
"North Dakota","South Dakota")
MW.abrv <- c("IN","IL","MI","OH","WI","IA","KS","MN","MO","NE",
"ND","SD")
MW.ref <- c(MW.name,MW.abrv)
S.name <- c("Delaware","District of Columbia","Florida","Georgia",
"Maryland","North Carolina","South Carolina","Virginia",
"West Virginia","Alabama","Kentucky","Mississippi",
"Tennessee","Arkansas","Louisiana","Oklahoma","Texas")
S.abrv <- c("DE","DC","FL","GA","MD","NC","SC","VA","WV","AL",
"KY","MS","TN","AR","LA","OK","TX")
S.ref <- c(S.name,S.abrv)
W.name <- c("Arizona","Colorado","Idaho","New Mexico","Montana",
"Utah","Nevada","Wyoming","Alaska","California",
"Hawaii","Oregon","Washington")
W.abrv <- c("AZ","CO","ID","NM","MT","UT","NV","WY","AK","CA",
"HI","OR","WA")
W.ref <- c(W.name,W.abrv)
region.list <- list(
Northeast=NE.ref,
Midwest=MW.ref,
South=S.ref,
West=W.ref)
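A list shaped like `region.list` can also be flattened into a named vector, which gives a vectorised lookup closer in spirit to a SQL CASE expression. The sketch below is only an illustration: to stay self-contained it uses a reduced stand-in for `region.list` (abbreviations only, two to four states per region), and the name `lookup` is hypothetical.

```r
# Reduced stand-in for region.list, for illustration only
region.list <- list(
  Northeast = c("NY", "PA"),
  Midwest   = c("IA", "KS"),
  South     = c("TX", "AR"),
  West      = c("NV", "CA", "UT", "AZ")
)

# Flatten to a named character vector: names are states, values are regions
lookup <- setNames(
  rep(names(region.list), lengths(region.list)),
  unlist(region.list, use.names = FALSE)
)

df <- data.frame(states = c("NV", "CA", "UT", "AZ", "IA"),
                 stringsAsFactors = FALSE)

# Index by name; states missing from the lookup come back as NA
df$regions <- unname(lookup[df$states])

print(df)
```

The same two lines that build `lookup` should work unchanged on the full `region.list`, since names are matched exactly rather than by pattern.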