I have a data frame named "final_proj_data" with the following structure:
ID    County                   Population  Year
<dbl> <chr>                    <dbl>       <dbl>
1003  Baldwin County, Alabama  169162      2006
1015  Calhoun County, Alabama  112903      2006
1043  Cullman County, Alabama  80187       2006
1049  DeKalb County, Alabama   68014       2006
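I am trying to split the "County" column into two separate columns, "County" and "State", and remove the comma.
I have tried several variations of the separate() function, but I keep getting this error:
Error:
`var` must evaluate to a single number or a column name, not a character vector
Among other things, I have tried: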
final_proj_data %>%
separate(final_proj_data$County, c("State", "County"), sep = ",", remove = TRUE)
final_proj_data %>%
separate(data = final_proj_data, col = County,
into = c("State", "County"), sep = ",")
I'm not sure what I'm doing wrong, or why "col =" keeps throwing this error. Any help would be appreciated!
Using dplyr and base R:
library(dplyr)
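final_proj_data %>%
  # State is taken from the original County value first (the part after ", "),
  # then County is overwritten with everything before the first comma.
  mutate(State = unlist(lapply(strsplit(County, ", "), function(x) x[2])),
         County = gsub(",.*", "", County))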
    ID         County Population Year   State
1 1003 Baldwin County     169162 2006 Alabama
2 1015 Calhoun County     112903 2006 Alabama
3 1043 Cullman County      80187 2006 Alabama
4 1049  DeKalb County      68014 2006 Alabama
Original:
With dplyr and tidyr (I just saw that @Ronak Shah made the same suggestion in a comment above):
library(dplyr)
library(tidyr)
final_proj_data %>%
  separate(County, c("County", "State"), sep = ",")
    ID         County   State Population Year
1 1003 Baldwin County Alabama     169162 2006
2 1015 Calhoun County Alabama     112903 2006
3 1043 Cullman County Alabama      80187 2006
4 1049  DeKalb County Alabama      68014 2006
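For anyone who wants to try this without the original data, here is a minimal, self-contained sketch. It rebuilds the four sample rows from the question by hand (that reconstruction and the name `result` are my own assumptions, not part of the asker's code) and splits on ", " rather than "," so that State does not start with the space that follows the comma. It also shows the likely cause of the error in the question: the pipe already supplies the data frame, so `col` should be the bare column name and not `final_proj_data$County`, which evaluates to a character vector.

```r
library(dplyr)
library(tidyr)

# Hand-built copy of the sample rows from the question (assumed, for illustration).
final_proj_data <- data.frame(
  ID = c(1003, 1015, 1043, 1049),
  County = c("Baldwin County, Alabama", "Calhoun County, Alabama",
             "Cullman County, Alabama", "DeKalb County, Alabama"),
  Population = c(169162, 112903, 80187, 68014),
  Year = c(2006, 2006, 2006, 2006),
  stringsAsFactors = FALSE
)

# The pipe passes final_proj_data as the first argument, so only the bare
# column name is given for `col`. The names in `into` follow the order of the
# pieces in the string: county first, state second.
result <- final_proj_data %>%
  separate(County, into = c("County", "State"), sep = ", ")

result
```

Note that `into = c("State", "County")`, as in the question, would put the county name in the State column, because the pieces are assigned in the order they appear in the string.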