Determine the length of sequences between columns or in a string - and paste the result

Jak*_*kob 8 r

I am working with data like this:

> df
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1  1  0  0  0  1  1  0  0  1   1
2  1  1  1  0  0  0  0  1  0   1
3  1  1  0  0  1  0  0  1  0   1
4  1  0  0  0  0  0  0  1  1   1
5  0  0  0  1  0  0  1  1  1   1
6  0  0  1  1  0  0  1  0  1   0

The dput(df) output is as follows:

df <- structure(list(
    V1 = c(1, 1, 1, 1, 0, 0), V2 = c(0, 1, 1, 0, 0, 0),
    V3 = c(0, 1, 0, 0, 0, 1), V4 = c(0, 0, 0, 0, 1, 1),
    V5 = c(1, 0, 1, 0, 0, 0), V6 = c(1, 0, 0, 0, 0, 0),
    V7 = c(0, 0, 0, 0, 1, 1), V8 = c(0, 1, 1, 1, 1, 0),
    V9 = c(1, 0, 0, 1, 1, 1), V10 = c(1, 1, 1, 1, 1, 0)),
  row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"),
  spec = structure(list(cols = list(
      V1 = structure(list(), class = c("collector_double", "collector")),
      V2 = structure(list(), class = c("collector_double", "collector")),
      V3 = structure(list(), class = c("collector_double", "collector")),
      V4 = structure(list(), class = c("collector_double", "collector")),
      V5 = structure(list(), class = c("collector_double", "collector")),
      V6 = structure(list(), class = c("collector_double", "collector")),
      V7 = structure(list(), class = c("collector_double", "collector")),
      V8 = structure(list(), class = c("collector_double", "collector")),
      V9 = structure(list(), class = c("collector_double", "collector")),
      V10 = structure(list(), class = c("collector_double", "collector")),
      Sequence = structure(list(), class = c("collector_character", "collector"))),
    default = structure(list(), class = c("collector_guess", "collector")),
    skip = 1L), class = "col_spec"))

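I need to replicate V1:V10 and replace the 0s that lie between 1s with the number of zeros in that stretch. The 1s should be set to NA, and so should the 0s at the start and end of a row, because they are not between 1s.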

So, for example, row 1 should be transformed from 1 0 0 0 1 1 0 0 1 1 to NA 3 3 3 NA NA 2 2 NA NA. Row 6 should go from 0 0 1 1 0 0 1 0 1 0 to NA NA NA NA 2 2 NA 1 NA NA.

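Is there a way to do this in a loop? Or is there perhaps a way to unite V1:V10 in a single cell, match specific patterns, transform them, and then split the cell again?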

I have to admit that this is way beyond my abilities. But I have been assigned this task, and I would be grateful for any advice!

Thanks!

tmf*_*mnk 5

One dplyr and tidyr option could be:

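library(dplyr)
library(tidyr)
library(tibble)  # provides rowid_to_column()

df %>%
 rowid_to_column() %>%
 pivot_longer(-rowid) %>%
 group_by(rowid) %>%
 # NA for 1s, leading zeros and trailing zeros; otherwise the length of the zero run
 mutate(value = if_else(value != 0 | cumsum(value) == 0 | rev(cumsum(rev(value))) == 0,
                        NA_integer_,
                        with(rle(value), rep(lengths * (values == 0), lengths)))) %>%
 pivot_wider(names_from = "name",
             values_from = "value")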

  rowid    V1    V2    V3    V4    V5    V6    V7    V8    V9   V10
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     1    NA     3     3     3    NA    NA     2     2    NA    NA
2     2    NA    NA    NA     4     4     4     4    NA     1    NA
3     3    NA    NA     2     2    NA     2     2    NA     1    NA
4     4    NA     6     6     6     6     6     6    NA    NA    NA
5     5    NA    NA    NA    NA     2     2    NA    NA    NA    NA
6     6    NA    NA    NA    NA     2     2    NA     1    NA    NA
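To make the condition inside mutate() easier to follow, here is a minimal base R sketch of the same logic applied to a single row. The helper name mask_zero_runs and the intermediate variable names are made up for illustration and are not part of the answer; ifelse() is used in place of dplyr's if_else() so that no packages are needed.

```r
# Hypothetical helper: same logic as the mutate() call, for one row
mask_zero_runs <- function(value) {
  # TRUE where the entry is a 1
  is_one <- value != 0
  # TRUE for leading zeros (no 1 seen yet, scanning left to right)
  leading <- cumsum(value) == 0
  # TRUE for trailing zeros (no 1 seen yet, scanning right to left)
  trailing <- rev(cumsum(rev(value))) == 0
  # Run lengths, zeroed out for runs of 1s, expanded back to full length
  runs <- rle(value)
  zero_run_len <- rep(runs$lengths * (runs$values == 0), runs$lengths)
  ifelse(is_one | leading | trailing, NA_integer_, zero_run_len)
}

mask_zero_runs(c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1))  # row 1
# NA 3 3 3 NA NA 2 2 NA NA
mask_zero_runs(c(0, 0, 1, 1, 0, 0, 1, 0, 1, 0))  # row 6
# NA NA NA NA 2 2 NA 1 NA NA
```

The two cumsum() checks are what separate zeros enclosed by 1s from zeros at either edge of the row.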


G. *_*eck 5

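Define a function recalc that acts on a single row, then apply it to every row. recalc uses rleid to identify the runs and then performs a calculation on each run, passing the 1/0 values as the real part and the run number as the imaginary part of a complex vector to f. In f, if the run consists of 1s (real part), or if it is the first or last run (imaginary part), the run is replaced with NA; otherwise it is replaced with its length. Finally, recalc takes the real part.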

library(data.table)

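recalc <- function(x) {
  r <- rleid(x)  # run id for each element
  # NA for runs of 1s and for the first/last run; otherwise the run length.
  # Im(z)[1] keeps the || condition length-one, as required by R >= 4.3.
  f <- function(z) if (Re(z)[1] == 1 || Im(z)[1] %in% range(r)) NA else length(z)
  Re(ave(x + r * 1i, r, FUN = f))
}
t(apply(df, 1, recalc))  # df is the data frame from the question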

giving this matrix:

     V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
[1,] NA  3  3  3 NA NA  2  2 NA  NA
[2,] NA NA NA  4  4  4  4 NA  1  NA
[3,] NA NA  2  2 NA  2  2 NA  1  NA
[4,] NA  6  6  6  6  6  6 NA NA  NA
[5,] NA NA NA NA  2  2 NA NA NA  NA
[6,] NA NA NA NA  2  2 NA  1 NA  NA
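For comparison, the same per-run idea can be sketched without data.table or complex numbers by working on the output of rle() directly. This is only an illustrative variant under the rules stated in the question; recalc_base is a hypothetical name and not part of the answer above.

```r
# Hypothetical base R variant of recalc
recalc_base <- function(x) {
  runs <- rle(x)
  n <- length(runs$lengths)
  # A run is kept only if it consists of zeros and is neither the first nor the last run
  keep <- runs$values == 0 & !(seq_len(n) %in% c(1, n))
  # One value per run (its length or NA), expanded back to one value per element
  rep(ifelse(keep, runs$lengths, NA_integer_), runs$lengths)
}

m <- rbind(c(1, 0, 0, 0, 1, 1, 0, 0, 1, 1),  # row 1 of the question's data
           c(0, 0, 1, 1, 0, 0, 1, 0, 1, 0))  # row 6 of the question's data
t(apply(m, 1, recalc_base))
```

Because the decision is made once per run rather than once per element, a row with no 1s at all has a single run, which is both first and last, and therefore comes back as all NA.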

  • Thinking mathematically, +1! (2 upvotes)