dyl*_*njm 1 r dplyr tidyr purrr tibble
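Suppose I have a vector of file paths that has been split on "/" and placed in a data frame. The paths have different lengths, but in the end I want all of the basenames lined up in the same column. Below is an example of what I mean, followed by the desired output.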
library(tidyverse)
dat <- tibble(
V1 = rep("run1", 5),
V2 = rep("ox", 5),
V3 = c("performance.csv", "analysis", "analysis", "performance.csv", "analysis"),
V4 = c("", "rod1", "rod2", "rod3", "performance.csv"),
V5 = c("", "performance.csv", "performance.csv", "performance.csv", "")
)
dat
#> # A tibble: 5 x 5
#> V1 V2 V3 V4 V5
#> <chr> <chr> <chr> <chr> <chr>
#> 1 run1 ox performance.csv "" ""
#> 2 run1 ox analysis rod1 performance.csv
#> 3 run1 ox analysis rod2 performance.csv
#> 4 run1 ox performance.csv rod3 performance.csv
#> 5 run1 ox analysis performance.csv ""
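For context, a data frame shaped like dat could arise from raw paths roughly as sketched below. This is only an illustration: the paths vector is made up to mirror two rows of the example, and split_paths is a hypothetical name, neither is from the original post.

```r
library(tibble)

# Made-up paths mirroring two rows of the example (an assumption, not original data).
paths <- c(
  "run1/ox/performance.csv",
  "run1/ox/analysis/rod1/performance.csv"
)

# Split each path on a literal "/".
parts <- strsplit(paths, "/", fixed = TRUE)

# Pad shorter paths on the right with "" so every row has the same width.
width <- max(lengths(parts))
padded <- lapply(parts, function(p) c(p, rep("", width - length(p))))

# Stack the rows into a character matrix and name the columns V1, V2, ...
mat <- do.call(rbind, padded)
colnames(mat) <- paste0("V", seq_len(width))

split_paths <- as_tibble(mat)
split_paths
```

Padding on the right is what leaves the basename in a different column for each path length, which is the misalignment described above.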
output <- tibble(
V1 = rep("run1", 5),
V2 = rep("ox", 5),
V3 = c("", "analysis", "analysis", "", "analysis"),
V4 = c("", "rod1", "rod2", "rod3", ""),
V5 = c("performance.csv", "performance.csv", "performance.csv", "performance.csv", "performance.csv")
)
output
#> # A tibble: 5 x 5
#> V1 V2 V3 V4 V5
#> <chr> <chr> <chr> <chr> <chr>
#> 1 run1 ox "" "" performance.csv
#> 2 run1 ox analysis rod1 performance.csv
#> 3 run1 ox analysis rod2 performance.csv
#> 4 run1 ox "" rod3 performance.csv
#> 5 run1 ox analysis "" performance.csv
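My idea was to resort to a for loop in which I check whether a column contains the basename and, if it does, replace that entry with "" and move the basename to the last column. I'm having trouble working out this logic, and I know there must be a better way that takes advantage of the tidyverse.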
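Create a function rearrange that rearranges a row: it puts the basename in the last position and, if the basename was not already at the end, blanks out its original position. We assume that any entry containing a dot is a basename. Then apply rearrange to each row.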
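rearrange <- function(x) {
  # index of the first entry containing a literal dot, taken to be the basename
  i <- grep(".", x, fixed = TRUE)[1]
  # no basename in this row: leave it unchanged
  if (is.na(i)) return(x)
  # copy the basename into the last position
  x[length(x)] <- x[i]
  # blank the original position unless it was already the last one
  if (i < length(x)) x[i] <- ""
  x
}
# apply() returns one column per row of dat, so transpose before converting back
as_tibble(t(apply(dat, 1, rearrange)))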
giving:
# A tibble: 5 x 5
V1 V2 V3 V4 V5
<chr> <chr> <chr> <chr> <chr>
1 run1 ox "" "" performance.csv
2 run1 ox analysis rod1 performance.csv
3 run1 ox analysis rod2 performance.csv
4 run1 ox "" rod3 performance.csv
5 run1 ox analysis "" performance.csv
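Since the question asked for something that leans on the tidyverse, here is a rough sketch of the same row-wise idea using purrr::pmap() and dplyr::bind_rows() in place of apply(). It is not part of the original answer: move_basename_last is a hypothetical helper name, and it keeps the same assumption that any entry containing a dot is the basename.

```r
library(dplyr)
library(purrr)
library(tibble)

# The example data from the question, repeated so the snippet runs on its own.
dat <- tibble(
  V1 = rep("run1", 5),
  V2 = rep("ox", 5),
  V3 = c("performance.csv", "analysis", "analysis", "performance.csv", "analysis"),
  V4 = c("", "rod1", "rod2", "rod3", "performance.csv"),
  V5 = c("", "performance.csv", "performance.csv", "performance.csv", "")
)

# Hypothetical helper: receives one row as named arguments and returns a
# one-row tibble with the basename moved to the last column.
move_basename_last <- function(...) {
  x <- c(...)                                 # named character vector for this row
  i <- which(grepl(".", x, fixed = TRUE))[1]  # first entry containing a literal dot
  if (!is.na(i)) {
    base <- x[[i]]
    x[i] <- ""                                # blank the original position
    x[length(x)] <- base                      # then write the basename last
  }
  as_tibble_row(x)
}

# pmap() calls the helper once per row; bind_rows() stacks the one-row tibbles.
result <- bind_rows(pmap(dat, move_basename_last))
result
```

Blanking first and writing the basename afterwards means no special case is needed when the basename is already in the last column. For wide data the apply() version above is likely to be faster, since this sketch builds one tibble per row.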