I have a very large dataset, and a sample of it looks like this:
| Id | Name | Start_Date | End_Date |
|----|---------|------------|------------|
| 10 | Mark | 4/2/1999 | 7/5/2018 |
| 10 | | 1/1/2000 | 9/24/2018 |
| 25 | | 5/3/1968 | 6/3/2000 |
| 25 | | 6/6/2009 | 4/23/2010 |
| 25 | Anthony | 2/20/2010 | 7/21/2016 |
| 25 | | 9/12/2014 | 11/26/2019 |
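I need to fill in the blank entries of the `Name` column with the name that belongs to the same `Id`, so that the output table looks like this: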
| Id | Name | Start_Date | End_Date |
|----|---------|------------|------------|
| 10 | Mark | 4/2/1999 | 7/5/2018 |
| 10 | Mark | 1/1/2000 | 9/24/2018 |
| 25 | Anthony | 5/3/1968 | 6/3/2000 |
| 25 | Anthony | 6/6/2009 | 4/23/2010 |
| 25 | Anthony | 2/20/2010 | 7/21/2016 |
| 25 | Anthony | 9/12/2014 | 11/26/2019 |
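How can I get the output shown above? I went through the substitution and parsing functions, but I could not see how they apply to this problem.
My dataset is: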
```r
# stringsAsFactors = FALSE keeps every column as character (the default since R 4.0)
df <- data.frame(Id = c("10", "10", "25", "25", "25", "25"),
                 Name = c("Mark", "", "", "", "Anthony", ""),
                 Start_Date = c("4/2/1999", "1/1/2000", "5/3/1968", "6/6/2009", "2/20/2010", "9/12/2014"),
                 End_Date = c("7/5/2018", "9/24/2018", "6/3/2000", "4/23/2010", "7/21/2016", "11/26/2019"),
                 stringsAsFactors = FALSE)
```
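We can change the blanks (`""`) to `NA` and then, within each `Id`, use `fill` to replace each `NA` with the previous non-`NA` value (`.direction = "down"`). A second pass with `.direction = "up"` handles the `NA`s that come before the first name in a group (the first two rows of `Id` 25).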
```r
library(dplyr)
library(tidyr)

df1 %>%
  mutate(Name = na_if(Name, "")) %>%   # turn blank strings into NA
  group_by(Id) %>%                     # fill only within the same Id
  fill(Name, .direction = "down") %>%  # copy the previous non-NA name downward
  fill(Name, .direction = "up")        # fill NAs that precede the first name in the group
# # A tibble: 6 x 4
# # Groups:   Id [2]
#   Id    Name    Start_Date End_Date  
#   <chr> <chr>   <chr>      <chr>     
# 1 10    Mark    4/2/1999   7/5/2018  
# 2 10    Mark    1/1/2000   9/24/2018 
# 3 25    Anthony 5/3/1968   6/3/2000  
# 4 25    Anthony 6/6/2009   4/23/2010 
# 5 25    Anthony 2/20/2010  7/21/2016 
# 6 25    Anthony 9/12/2014  11/26/2019
```
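In the devel version of `tidyr` (`0.8.3.9000`), this can also be done in a single statement, because `fill` gains a `.direction = "downup"` option (available in released versions from `tidyr` 1.0.0 onward):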
```r
df1 %>%
  mutate(Name = na_if(Name, "")) %>%
  group_by(Id) %>%
  fill(Name, .direction = "downup")    # fill down, then up, in one call
```
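If you would rather not depend on `tidyr`, the same down-then-up fill can be sketched in base R with `ave()`, which applies a function within each `Id` and puts the results back in the original row order. This is a minimal sketch, assuming all columns are character; `fill_downup` is a hypothetical helper written for this illustration, not a `tidyr` function.

```r
# Sample data from the question (all columns as character)
df1 <- data.frame(
  Id = c("10", "10", "25", "25", "25", "25"),
  Name = c("Mark", "", "", "", "Anthony", ""),
  Start_Date = c("4/2/1999", "1/1/2000", "5/3/1968", "6/6/2009", "2/20/2010", "9/12/2014"),
  End_Date = c("7/5/2018", "9/24/2018", "6/3/2000", "4/23/2010", "7/21/2016", "11/26/2019"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill NAs downward with the last non-NA value, then fill any
# leading NAs upward with the first non-NA value (the idea behind .direction = "downup").
fill_downup <- function(x) {
  ok <- !is.na(x)
  if (!any(ok)) return(x)                            # nothing to fill from
  last_idx <- cummax(ifelse(ok, seq_along(x), 0L))   # latest non-NA position so far (0 = none yet)
  last_idx[last_idx == 0L] <- which(ok)[1]           # leading NAs take the first non-NA value
  x[last_idx]
}

df1$Name[df1$Name == ""] <- NA                          # blanks -> NA, like na_if(Name, "")
df1$Name <- ave(df1$Name, df1$Id, FUN = fill_downup)    # apply the fill within each Id
print(df1)
```

Unlike taking the first name per group, this keeps each name in force only until the next one, so a group that contains two different names is not collapsed to a single value.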
Another option is to group by `Id` and `mutate` `Name` to the `first` non-blank element of the group:
```r
df1 %>%
  group_by(Id) %>%
  mutate(Name = first(Name[Name != ""]))   # first non-blank name in each Id
# # A tibble: 6 x 4
# # Groups:   Id [2]
#   Id    Name    Start_Date End_Date  
#   <chr> <chr>   <chr>      <chr>     
# 1 10    Mark    4/2/1999   7/5/2018  
# 2 10    Mark    1/1/2000   9/24/2018 
# 3 25    Anthony 5/3/1968   6/3/2000  
# 4 25    Anthony 6/6/2009   4/23/2010 
# 5 25    Anthony 2/20/2010  7/21/2016 
# 6 25    Anthony 9/12/2014  11/26/2019
```
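The "first non-blank name per `Id`" idea can also be sketched in base R without `dplyr`, again using `ave()` to work group by group. This is a minimal sketch, assuming all columns are character; `first_non_blank` is a hypothetical helper introduced only for this illustration, and a group with no non-blank name ends up as `NA`.

```r
# Sample data from the question (all columns as character)
df1 <- data.frame(
  Id = c("10", "10", "25", "25", "25", "25"),
  Name = c("Mark", "", "", "", "Anthony", ""),
  Start_Date = c("4/2/1999", "1/1/2000", "5/3/1968", "6/6/2009", "2/20/2010", "9/12/2014"),
  End_Date = c("7/5/2018", "9/24/2018", "6/3/2000", "4/23/2010", "7/21/2016", "11/26/2019"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first element that is neither NA nor "", or NA if there is none.
first_non_blank <- function(x) {
  non_blank <- x[!is.na(x) & x != ""]
  if (length(non_blank) == 0) NA_character_ else non_blank[1]
}

# ave() splits Name by Id, replaces every value in a group with that group's
# first non-blank name, and returns the result in the original row order.
df1$Name <- ave(df1$Name, df1$Id, FUN = function(x) rep(first_non_blank(x), length(x)))
print(df1)
```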
Data used in the answers above:
```r
df1 <- structure(list(Id = c("10", "10", "25", "25", "25", "25"), Name = c("Mark",
"", "", "", "Anthony", ""), Start_Date = c("4/2/1999", "1/1/2000",
"5/3/1968", "6/6/2009", "2/20/2010", "9/12/2014"), End_Date = c("7/5/2018",
"9/24/2018", "6/3/2000", "4/23/2010", "7/21/2016", "11/26/2019"
)), class = "data.frame", row.names = c(NA, -6L))
```