Manuel · 15 · tags: error-handling, r, dplyr
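I'm new to R and haven't been able to find a solution to my problem. I really hope you can help me.
My data frame has more columns and observations, but it looks like this: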
dt <- data.frame(hid = c(1, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4),
syear = c(2000, 2001, 2003, 2003, 2003, 2000, 2000, 2001, 2001, 2002, 2002),
employlvl = c("Full-time", "Part-time", "Part-time", "Unemployed", "Unemployed",
"Full-time", "Full-time", "Full-time", "Unemployed", "Part-time",
"Full-time"),
relhead = c("Head", "Head", "Head", "Partner", "other", "Head",
"Partner", "Head", "Partner", "Head", "Partner"))
| hid | syear | employlvl | relhead |
|-----|-------|-------------|-----------------------|
| 1 | 2000 | Full-time | Head |
| 2 | 2001 | Part-time | Head |
| 2 | 2003 | Part-time | Head |
| 2 | 2003 | Unemployed | Partner |
| 2 | 2003 | Unemployed | other |
| 4 | 2000 | Full-time | Head |
| 4 | 2000 | Full-time | Partner |
| 4 | 2001 | Full-time | Head |
| 4 | 2001 | Unemployed | Partner |
| 4 | 2002 | Part-time | Head |
| 4 | 2002 | Full-time | Partner |
I want to create another column indicating the partner's employment level, so that I get the following output:
| hid | syear | employlvl | relhead | Partner |
|-----|-------|-------------|-----------------------|-------------------|
| 1 | 2000 | Full-time | Head | NA |
| 2 | 2001 | Part-time | Head | NA |
| 2 | 2003 | Part-time | Head | Unemployed |
| 2 | 2003 | Unemployed | Partner | NA |
| 2 | 2003 | Unemployed | other | NA |
| 4 | 2000 | Full-time | Head | Full-time |
| 4 | 2000 | Full-time | Partner | NA |
| 4 | 2001 | Full-time | Head | Unemployed |
| 4 | 2001 | Unemployed | Partner | NA |
| 4 | 2002 | Part-time | Head | Full-time |
| 4 | 2002 | Full-time | Partner | NA |
Currently I am using the following code (thanks again to user ycw):
library(dplyr)
library(tidyr)

dt2 <- dt %>%
  group_by(hid, syear) %>%
  filter(n() > 1) %>%                    # keep household-years with more than one row
  filter(relhead != "Child") %>%
  spread(relhead, employlvl) %>%         # one new column per distinct relhead value
  mutate(Relation = "Head") %>%          # join key: attach the result to the head's row
  rename(`Employment Partner` = Partner) %>%
  select(-Head)

dt3 <- dt %>%
  left_join(dt2, by = c("hid", "syear", "relhead" = "Relation"))
This code works very well on this small dataset, but as soon as I run it on my full data I get the following error:
Error: Data source must be a dictionary
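Thanks a lot for your help.
As mentioned in other answers, this is caused by non-unique (duplicate) column names. I was able to reproduce the error by modifying your example (the third element of `relhead`):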
dt <- data.frame(
hid = c(1, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4),
syear = c(2000, 2001, 2003, 2003, 2003, 2000, 2000, 2001, 2001, 2002, 2002),
employlvl = c("Full-time", "Part-time", "Part-time", "Unemployed", "Unemployed",
"Full-time", "Full-time", "Full-time", "Unemployed", "Part-time",
"Full-time"),
relhead = c("Head", "Head", "Employment Partner", "Partner", "other", "Head",
"Partner", "Head", "Partner", "Head", "Partner")
)
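In this case, `spread` creates the first `"Employment Partner"` column and `rename` creates a second column with the same name. You should check whether any of `"Employment Partner"` or `"Relation"` (and possibly `"hid"` or `"syear"`) appear in `dt$relhead`: the first produces this error, while a `"Relation"` column created by `spread` would be silently overwritten by `mutate(Relation = ...)`.
Minimal reproducible example: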
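That check can be done in base R by intersecting the distinct `relhead` values with the names the pipeline creates or relies on. This is a sketch: `find_name_clashes` is a hypothetical helper, and its `reserved` list is an assumption taken from the columns used in the question's code, so extend it if your real pipeline adds others. The data is a shortened copy of the modified example above.

```r
# Hypothetical helper: return relhead values that would become column names
# colliding with names the pipeline creates or uses (assumed list, adjust it).
find_name_clashes <- function(df,
                              reserved = c("Employment Partner", "Relation",
                                           "hid", "syear")) {
  # as.character() also handles relhead stored as a factor
  intersect(unique(as.character(df$relhead)), reserved)
}

# Shortened version of the modified example that triggers the error
dt <- data.frame(
  hid = c(1, 2, 2, 2, 2),
  syear = c(2000, 2001, 2003, 2003, 2003),
  relhead = c("Head", "Head", "Employment Partner", "Partner", "other"),
  stringsAsFactors = FALSE
)

find_name_clashes(dt)
```

A non-empty result points at the `relhead` values to recode (or drop) before reshaping.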
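library(dplyr)
library(tidyr)

# spread() creates columns a1, a2, a3; renaming a3 to a1 leaves two columns named a1
tibble(g = c("a1", "a2", "a3"), i = 1) %>%
  spread(g, i) %>%
  rename(a1 = a3) %>%
  select(-a1)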
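If you want to keep the reshaping approach, one way to rule out such collisions is to prefix every generated column. The sketch below swaps `spread` for `tidyr::pivot_wider` (tidyr 1.0 or later) with `names_prefix`; the `emp_` prefix is an arbitrary choice, and it assumes each `hid`/`syear`/`relhead` combination is unique and that at least one `"Partner"` row exists (otherwise `emp_Partner` is not created). It runs on the modified data that reproduces the error.

```r
library(dplyr)
library(tidyr)

# Modified example data that breaks the spread()/rename() version
dt <- data.frame(
  hid = c(1, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4),
  syear = c(2000, 2001, 2003, 2003, 2003, 2000, 2000, 2001, 2001, 2002, 2002),
  employlvl = c("Full-time", "Part-time", "Part-time", "Unemployed", "Unemployed",
                "Full-time", "Full-time", "Full-time", "Unemployed", "Part-time",
                "Full-time"),
  relhead = c("Head", "Head", "Employment Partner", "Partner", "other", "Head",
              "Partner", "Head", "Partner", "Head", "Partner"),
  stringsAsFactors = FALSE
)

dt2 <- dt %>%
  group_by(hid, syear) %>%
  filter(n() > 1) %>%
  filter(relhead != "Child") %>%
  ungroup() %>%
  # Prefixed names (emp_Head, emp_Partner, ...) cannot clash with "Employment Partner"
  pivot_wider(names_from = relhead, values_from = employlvl,
              names_prefix = "emp_") %>%
  mutate(Relation = "Head") %>%
  # Keep only the key columns and the partner's level
  select(hid, syear, Relation, `Employment Partner` = emp_Partner)

dt3 <- dt %>%
  left_join(dt2, by = c("hid", "syear", "relhead" = "Relation"))

dt3
```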
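If you don't need the wide layout at all, you can avoid the problem entirely by never turning `relhead` values into column names. This sketch (an alternative technique, not the code from the question) looks up the partner's `employlvl` with a join and keeps it only on the `"Head"` row. It assumes each household-year has at most one `"Partner"` row; with more, the join would duplicate rows. The helper column `partner_lvl` is a hypothetical name.

```r
library(dplyr)

# Original example data from the question
dt <- data.frame(
  hid = c(1, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4),
  syear = c(2000, 2001, 2003, 2003, 2003, 2000, 2000, 2001, 2001, 2002, 2002),
  employlvl = c("Full-time", "Part-time", "Part-time", "Unemployed", "Unemployed",
                "Full-time", "Full-time", "Full-time", "Unemployed", "Part-time",
                "Full-time"),
  relhead = c("Head", "Head", "Head", "Partner", "other", "Head",
              "Partner", "Head", "Partner", "Head", "Partner"),
  stringsAsFactors = FALSE
)

# One row per household-year holding the partner's employment level
partners <- dt %>%
  filter(relhead == "Partner") %>%
  select(hid, syear, partner_lvl = employlvl)

dt3 <- dt %>%
  left_join(partners, by = c("hid", "syear")) %>%
  # Keep the partner's level only on the head's row, NA everywhere else
  mutate(Partner = if_else(relhead == "Head", partner_lvl, NA_character_)) %>%
  select(-partner_lvl)

dt3
```

This reproduces the `Partner` column of the desired output, and the only new column name (`Partner`) is fixed in the code rather than taken from the data.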