I am working with data like this:
> df
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 1 0 0 0 1 1 0 0 1 1
2 1 1 1 0 0 0 0 1 0 1
3 1 1 0 0 1 0 0 1 0 1
4 1 0 0 0 0 0 0 1 1 1
5 0 0 0 1 0 0 1 1 1 1
6 0 0 1 1 0 0 1 0 1 0
The output of dput(df) is:
df <- structure(list(V1 = c(1, 1, 1, 1, 0, 0), V2 = c(0, 1, 1, 0, 0,
0), V3 = c(0, 1, 0, 0, 0, 1), V4 = c(0, 0, 0, 0, 1, 1), V5 = c(1,
0, 1, 0, 0, 0), V6 = c(1, 0, 0, 0, 0, 0), V7 = c(0, 0, 0, 0,
1, 1), V8 = c(0, 1, 1, 1, 1, 0), V9 = c(1, 0, 0, 1, 1, 1), V10 = c(1,
1, 1, 1, 1, 0)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"), spec = structure(list(cols = list(V1 = structure(list(), class = c("collector_double",
"collector")), V2 = structure(list(), class = c("collector_double",
"collector")), V3 = structure(list(), class = c("collector_double",
"collector")), V4 = structure(list(), class = c("collector_double",
"collector")), V5 = structure(list(), class = c("collector_double",
"collector")), V6 = structure(list(), class = c("collector_double",
"collector")), V7 = structure(list(), class = c("collector_double",
"collector")), V8 = structure(list(), class = c("collector_double",
"collector")), V9 = structure(list(), class = c("collector_double",
"collector")), V10 = structure(list(), class = c("collector_double",
"collector")), Sequence = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
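I need to recode V1:V10 so that every 0 lying between 1s is replaced by the number of 0s in its run. The 1s should be set to NA, and so should the 0s at the beginning and end of a row, because they are not enclosed by 1s on both sides.
So, for example, row 1 should be converted from 1 0 0 0 1 1 0 0 1 1 to NA 3 3 3 NA NA 2 2 NA NA. Row 6 goes from 0 0 1 1 0 0 1 0 1 0 to NA NA NA NA 2 2 NA 1 NA NA.
Is there a way to do this in a loop? Or might it be possible to unite V1:V10 in a single cell, match specific patterns, transform them, and then split the cell again?
I have to admit that this is far beyond my abilities. But I was assigned this task and I would appreciate any advice!
Thanks!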
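One dplyr and tidyr option could be:
library(dplyr)
library(tidyr)
library(tibble)  # for rowid_to_column()
df %>%
  rowid_to_column() %>%
  pivot_longer(-rowid) %>%
  group_by(rowid) %>%
  # NA for 1s, leading 0s and trailing 0s; otherwise the length of the run of 0s
  mutate(value = if_else(value != 0 | cumsum(value) == 0 | rev(cumsum(rev(value))) == 0,
                         NA_integer_,
                         with(rle(value), rep(lengths * (values == 0), lengths)))) %>%
  pivot_wider(names_from = "name",
              values_from = "value")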
rowid V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
<int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 NA 3 3 3 NA NA 2 2 NA NA
2 2 NA NA NA 4 4 4 4 NA 1 NA
3 3 NA NA 2 2 NA 2 2 NA 1 NA
4 4 NA 6 6 6 6 6 6 NA NA NA
5 5 NA NA NA NA 2 2 NA NA NA NA
6 6 NA NA NA NA 2 2 NA 1 NA NA
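To see why the condition inside if_else works, it may help to take it apart on a single row. The sketch below uses only base R and the first row of df typed in by hand; the names x, leading, trailing and run_len are chosen here for illustration and are not part of the answer.

```r
# First row of df, entered by hand
x <- c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1)

# TRUE while no 1 has been seen yet (0s at the start of the row)
leading <- cumsum(x) == 0
# TRUE where no 1 follows any more (0s at the end of the row)
trailing <- rev(cumsum(rev(x))) == 0

# Length of the run each element belongs to, set to 0 for runs of 1s
r <- rle(x)
run_len <- rep(r$lengths * (r$values == 0), r$lengths)

# 1s, leading 0s and trailing 0s become NA; enclosed 0s keep their run length
result <- ifelse(x != 0 | leading | trailing, NA_integer_, run_len)
print(result)
```

The printed vector should agree with row 1 of the table above: NA 3 3 3 NA NA 2 2 NA NA.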
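Define a function recalc that acts on a single row, then apply it to every row. recalc uses rleid to identify the runs and then performs the calculation on each run, passing f a complex vector whose real part holds the 1/0 values and whose imaginary part holds the run number. In f, if the run consists of 1s (real part) or it is the first or last run (imaginary part), it is replaced with NA; otherwise it is replaced with its length. Finally, recalc takes the real part.
library(data.table)
# Recode a single row of 0/1 values
recalc <- function(x) {
  r <- rleid(x)  # run number of each element
  # z is one run: Re(z) is the 0/1 value, Im(z) the run number.
  # Only the first element is tested because || needs a length-one condition.
  f <- function(z) if (Re(z)[1] == 1 || Im(z)[1] %in% range(r)) NA else length(z)
  Re(ave(x + r * 1i, r, FUN = f))
}
t(apply(df, 1, recalc))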
This gives the following matrix:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
[1,] NA 3 3 3 NA NA 2 2 NA NA
[2,] NA NA NA 4 4 4 4 NA 1 NA
[3,] NA NA 2 2 NA 2 2 NA 1 NA
[4,] NA 6 6 6 6 6 6 NA NA NA
[5,] NA NA NA NA 2 2 NA NA NA NA
[6,] NA NA NA NA 2 2 NA 1 NA NA
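For comparison, the same rule can probably be written without the complex-number encoding and without data.table. This is only a sketch, assuming every row contains nothing but 0s and 1s; recalc_base and m are names made up here, and the function relies on base R's rle instead of rleid.

```r
# Hypothetical base R variant: recode one row of 0/1 values
recalc_base <- function(x) {
  r <- rle(unname(x))
  n <- length(r$lengths)
  # Runs of 0s get their length, runs of 1s get NA
  out <- ifelse(r$values == 0, r$lengths, NA_integer_)
  # The first and the last run are never enclosed by 1s on both sides
  out[c(1, n)] <- NA_integer_
  # Expand back to one value per element and restore the column names
  setNames(rep(out, r$lengths), names(x))
}

# The six rows of df, entered as a plain matrix
m <- matrix(c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1,
              1, 1, 1, 0, 0, 0, 0, 1, 0, 1,
              1, 1, 0, 0, 1, 0, 0, 1, 0, 1,
              1, 0, 0, 0, 0, 0, 0, 1, 1, 1,
              0, 0, 0, 1, 0, 0, 1, 1, 1, 1,
              0, 0, 1, 1, 0, 0, 1, 0, 1, 0),
            nrow = 6, byrow = TRUE,
            dimnames = list(NULL, paste0("V", 1:10)))

res <- t(apply(m, 1, recalc_base))
print(res)
```

Because the first and last runs are blanked unconditionally, a row made up of a single run (all 0s or all 1s) comes back as all NA, which seems consistent with the rule stated in the question.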