Asked by jal*_*pic. Tags: r, reshape2, dplyr, tidyr
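I have untidy data in a data frame that looks like this.
The `team` column holds the names of some football teams. The variables name1 to name3 list the different names that are used to refer to the team in the first column.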
team name1 name2 name3
1 Loughborough Loughborough
2 Luton Town Luton Town Luton
3 Macclesfield Macclesfield
4 Maidstone United Maidstone United
5 Manchester City Manchester City Man City
6 Manchester United Manchester United Newton Heath Man United
7 Mansfield Town Mansfield Town Mansfield
8 Merthyr Town Merthyr Town
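My goal is to reshape the data into two columns using the team/name1, team/name2 and team/name3 pairs. I only want to keep the pairs where name1, name2 or name3 actually contains a name.
To do this, I am trying tidyr's gather():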
temp <- dat %>% gather(key, value, 2:4)
temp$key<-NULL
temp
This gives the following output:
team value
1 Loughborough Loughborough
2 Luton Town Luton Town
3 Macclesfield Macclesfield
4 Maidstone United Maidstone United
5 Manchester City Manchester City
6 Manchester United Manchester United
7 Mansfield Town Mansfield Town
8 Merthyr Town Merthyr Town
9 Loughborough
10 Luton Town Luton
11 Macclesfield
12 Maidstone United
13 Manchester City Man City
14 Manchester United Newton Heath
15 Mansfield Town Mansfield
16 Merthyr Town
17 Loughborough
18 Luton Town
19 Macclesfield
20 Maidstone United
21 Manchester City
22 Manchester United Man United
23 Mansfield Town
24 Merthyr Town
I tried to remove the incomplete cases (e.g. rows 20, 21, 23 and 24, but not 22) using:
temp[complete.cases(temp),]
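This doesn't work, because the observations that look empty actually contain the empty string "" rather than NA. I guess this is how gather() returns missing data? I also tried converting temp$value to a factor, but that didn't work either.
I'd love to hear how to get rid of these incomplete cases.
Sample data: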
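A minimal base-R sketch of why complete.cases() keeps those rows: it only flags NA, and an empty string counts as a present value. (In the sample data below, "" is already one of the factor levels of name2 and name3, so gather() is carrying those blanks through rather than creating them.) The two-row data frame `x` is a hypothetical stand-in for `temp`.

```r
# Hypothetical two-row stand-in for `temp`; the second row holds "" like rows 20-24.
x <- data.frame(team  = c("Luton Town", "Merthyr Town"),
                value = c("Luton", ""),
                stringsAsFactors = FALSE)

# complete.cases() only looks for NA, so both rows count as complete.
complete.cases(x)
# [1] TRUE TRUE

# Recode the empty strings to NA first; complete.cases() then drops that row.
x$value[x$value == ""] <- NA
x[complete.cases(x), ]
```

Recoding "" to NA is one option; filtering on `value != ""` directly, as in the answer, avoids the extra step.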
dat<-structure(list(team = structure(1:8, .Label = c("Loughborough",
"Luton Town", "Macclesfield", "Maidstone United", "Manchester City",
"Manchester United", "Mansfield Town", "Merthyr Town"), class = "factor"),
name1 = structure(1:8, .Label = c("Loughborough", "Luton Town",
"Macclesfield", "Maidstone United", "Manchester City", "Manchester United",
"Mansfield Town", "Merthyr Town"), class = "factor"), name2 = structure(c(1L,
2L, 1L, 1L, 3L, 5L, 4L, 1L), .Label = c("", "Luton", "Man City",
"Mansfield", "Newton Heath"), class = "factor"), name3 = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L), .Label = c("", "Man United"), class = "factor")), .Names = c("team",
"name1", "name2", "name3"), row.names = c(NA, -8L), class = "data.frame")
You can also add filter() (to remove the blanks) and select() (to drop the key column) from the dplyr package and do everything in one pipeline:
temp <- dat %>%
gather(key, value, 2:4) %>%
filter(value != "") %>%
select(-key)
# team value
# 1 Loughborough Loughborough
# 2 Luton Town Luton Town
# 3 Macclesfield Macclesfield
# 4 Maidstone United Maidstone United
# 5 Manchester City Manchester City
# 6 Manchester United Manchester United
# 7 Mansfield Town Mansfield Town
# 8 Merthyr Town Merthyr Town
# 9 Luton Town Luton
# 10 Manchester City Man City
# 11 Manchester United Newton Heath
# 12 Mansfield Town Mansfield
# 13 Manchester United Man United
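Since tidyr 1.0.0, gather() has been superseded by pivot_longer(). A minimal sketch of the same pipeline with pivot_longer() follows; it assumes a cut-down, character-column copy of the sample data (only three teams and two name columns) to stay self-contained. pivot_longer() names the key column `name` by default, so that is the column dropped here.

```r
library(dplyr)
library(tidyr)

# Hypothetical trimmed copy of `dat`, using character columns instead of factors.
dat <- data.frame(
  team  = c("Luton Town", "Manchester City", "Merthyr Town"),
  name1 = c("Luton Town", "Manchester City", "Merthyr Town"),
  name2 = c("Luton", "Man City", ""),
  stringsAsFactors = FALSE
)

temp <- dat %>%
  pivot_longer(name1:name2, values_to = "value") %>%  # one row per team/name pair
  filter(value != "") %>%                             # drop the blank names
  select(-name)                                       # drop the key column
temp
```

Unlike gather(), which stacks the columns one after another, pivot_longer() keeps all the names for a given team next to each other, so the row order of the result differs.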