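I have a data frame with thousands of rows and columns. For its character variables, I need to count the changes from the first row to every other row (row1–row2, row1–row3, row1–row4, ...) and output the total number of changes as a new column.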
library(dplyr)
df <- tibble(
  a = c("1 2", "1 2", "2 2", "2 2"),
  b = c("2 1", "1 2", "1 2", "1 2"),
  c = c("1 1", "1 2", "2 1", "2 2"),
  d = c("1 1", "1 1", "2 1", "2 1")
)
df
  a     b     c     d
  <chr> <chr> <chr> <chr>
1 1 2   2 1   1 1   1 1
2 1 2   1 2   1 2   1 1
3 2 2   1 2   2 1   2 1
4 2 2   1 2   2 2   2 1
I want to count the character mismatches in every element between row 1 and row 2, row 1 and row 3, and so on, so that I end up with this:
  a     b     c     d     e
1 1 2   2 1   1 1   1 1   NA  # No mismatches to count, since this is the first row.
2 1 2   1 2   1 2   1 1    3
3 2 2   1 2   2 1   2 1    5
4 2 2   1 2   2 2   2 1    6
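To make the expected e column concrete, here is a small base R check (not part of the original question) for row 2: it splits each string on the space and counts, column by column, the positions that differ from row 1. The names row1, row2 and per_column are illustrative only.

```r
# Rows 1 and 2 of the example data, one string per column a, b, c, d
row1 <- c(a = "1 2", b = "2 1", c = "1 1", d = "1 1")
row2 <- c(a = "1 2", b = "1 2", c = "1 2", d = "1 1")

# For each column: split "1 2" into c("1", "2") and count the positions
# where row 2 differs from row 1 (names are taken from row2)
per_column <- mapply(
  function(x, y) sum(strsplit(x, " ", fixed = TRUE)[[1]] !=
                       strsplit(y, " ", fixed = TRUE)[[1]]),
  row2, row1
)

per_column       # a = 0, b = 2, c = 1, d = 0
sum(per_column)  # 3, the value of e in row 2
```

Rows 3 and 4 work the same way and give 5 and 6.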
Any ideas on how to achieve this?
One approach using dplyr and purrr:
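library(dplyr)
library(purrr)
bind_cols(df, df %>%
  mutate_all(~ strsplit(., " ", fixed = TRUE)) %>%                 # "1 2" -> c("1", "2")
  mutate_all(~ map2_int(.x = ., .y = .[1], ~ sum(.x != .y))) %>%   # mismatches vs. row 1
  transmute(e = rowSums(select(., everything()))))                 # total over all columns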
  a     b     c     d         e
  <chr> <chr> <chr> <chr> <dbl>
1 1 2   2 1   1 1   1 1       0
2 1 2   1 2   1 2   1 1       3
3 2 2   1 2   2 1   2 1       5
4 2 2   1 2   2 2   2 1       6
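The output above has 0 in the first row, whereas the question asks for NA there. Below is a base R sketch of the same position-by-position comparison (an alternative written for illustration, not one of the posted answers; count_vs_first is a hypothetical helper name). It assumes every string in a column splits into the same number of pieces, and it overwrites row 1 with NA at the end.

```r
# Example data from the question, as a plain data.frame
df <- data.frame(
  a = c("1 2", "1 2", "2 2", "2 2"),
  b = c("2 1", "1 2", "1 2", "1 2"),
  c = c("1 1", "1 2", "2 1", "2 2"),
  d = c("1 1", "1 1", "2 1", "2 1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for one column, split every string on " " and count,
# row by row, how many pieces differ from the pieces in row 1
count_vs_first <- function(col) {
  parts <- strsplit(col, " ", fixed = TRUE)
  vapply(parts, function(p) sum(p != parts[[1]]), integer(1))
}

# Add the per-column counts element-wise across all columns
e <- Reduce(`+`, lapply(df, count_vs_first))

# Row 1 is only compared with itself, so report NA there
df$e <- replace(e, 1, NA)
df
```

Reduce() over the list of per-column counts avoids building an intermediate matrix, so it also works when the data frame has a single row.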
Or with dplyr only, without purrr (adist() comes from base R's utils package):
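library(dplyr)
bind_cols(df, df %>%
  # counts = TRUE stores the insertions, deletions and substitutions needed to
  # turn each string into the column's first string; add them up per row
  mutate_all(~ rowSums(drop(attr(adist(., first(.), counts = TRUE), "counts")))) %>%
  transmute(e = rowSums(select(., everything()))))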
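One difference between the two answers is worth checking: adist() computes a generalized Levenshtein (edit) distance, not a position-by-position comparison. For the three-character strings in the question the two measures always agree, but for longer strings the edit distance can be smaller. The strings x and y below are made up purely to illustrate this.

```r
x <- "1 2 1 2 1"
y <- "2 1 2 1 2"

# Position-by-position count, as in the strsplit()-based answer
positional <- sum(strsplit(x, " ", fixed = TRUE)[[1]] !=
                    strsplit(y, " ", fixed = TRUE)[[1]])

# Edit distance, as in the adist()-based answer (a 1 x 1 matrix here)
edit_dist <- utils::adist(x, y)[1, 1]

positional  # 5: every digit differs from the digit in the same position
edit_dist   # 4: delete the leading "1 " and append " 2"
```

If positional mismatches are what you need, the strsplit()-based version matches that definition more directly.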