Extracting sequences of rows in R

Chr*_*ann 1 r dplyr

I have data of this type:

    df <- structure(list(Utterance = c("(5.127)", ">like I don't understand< sorry like how old's your mom¿",
                                       "(0.855)", "eh six:ty:::-one=", "(0.101)", "(0.487)", "[((v: gasps)) she said] ~no you're [not?]~",
                                       "[((v: gasps)) she said] ~no you're [not?]~", "~<[NO YOU'RE] NOT (.) you can't go !in!>~",
                                       "(0.260)", "show her [your boobs] next time"),
                         Q = c(NA, "q_wh", "", "", NA, NA, "q_really", "", "", NA, NA),
                         Sequ = c(NA, 1L, 1L, 1L, NA, NA, 0L, 0L, 0L, NA, NA)), class = "data.frame", row.names = c(NA, -11L))

I want to extract/filter:

  • those rows where Sequ is not NA
  • the row immediately preceding each such run (a row where Sequ is NA)

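My attempt so far has been to define a function that returns the indices of the relevant rows:

    QA_sequ <- function(value) {
      # rows where Sequ is non-NA and the previous row is NA (the first row of each sequence)
      inds <- which(!is.na(value) & lag(is.na(value)))
      # those rows plus the row just before each of them
      sort(unique(c(inds-1, inds)))
    }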

and then to slice out the rows by index:

    library(dplyr)
    df %>% 
      slice(QA_sequ(Sequ))
                                                     Utterance        Q Sequ
    1                                                  (5.127)     <NA>   NA
    2 >like I don't understand< sorry like how old's your mom¿     q_wh    1
    3                                                  (0.487)     <NA>   NA
    4               [((v: gasps)) she said] ~no you're [not?]~ q_really    0

However, this returns only the immediately preceding row and the first row of each Sequ run, not the rest of the run. The result I want to obtain is this:

                                                      Utterance        Q Sequ
    1                                                   (5.127)     <NA>   NA
    2  >like I don't understand< sorry like how old's your mom¿     q_wh    1
    3                                                   (0.855)             1
    4                                         eh six:ty:::-one=             1
    5                                                   (0.487)     <NA>   NA
    6                [((v: gasps)) she said] ~no you're [not?]~ q_really    0
    7                [((v: gasps)) she said] ~no you're [not?]~             0
    8                 ~<[NO YOU'RE] NOT (.) you can't go !in!>~             0

EDIT

The solution I have come up with feels cumbersome:

    QA_sequ <- function(value) {
      inds <- which(!is.na(value) & lag(is.na(value)))
      sort(unique(c(inds-1)))    # extract only preceding row!
    }

    library(dplyr)
    df %>% 
      mutate(id = row_number()) %>%                  # remember the original row order
      slice(QA_sequ(Sequ)) %>%                       # the NA rows preceding each sequence
      bind_rows(., df %>% mutate(id = row_number()) %>% filter(!is.na(Sequ))) %>%
      arrange(id)                                    # restore the original order

r2e*_*ans 5

How about this?

    df %>%
      filter(!is.na(Sequ) | lead(!is.na(Sequ), default=FALSE))
    #                                                  Utterance        Q Sequ
    # 1                                                  (5.127)     <NA>   NA
    # 2 >like I don't understand< sorry like how old's your mom¿     q_wh    1
    # 3                                                  (0.855)             1
    # 4                                        eh six:ty:::-one=             1
    # 5                                                  (0.487)     <NA>   NA
    # 6               [((v: gasps)) she said] ~no you're [not?]~ q_really    0
    # 7               [((v: gasps)) she said] ~no you're [not?]~             0
    # 8                ~<[NO YOU'RE] NOT (.) you can't go !in!>~             0

The logic filters (extracts) both of the following:

  • all rows where Sequ is not NA
  • any row where Sequ is NA and the next row's Sequ is not NA
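To see how the two conditions combine, here is a minimal sketch on a toy vector. The vector `sequ` and the intermediate names `not_na`, `next_ok`, and `keep` are made up for illustration and are not part of the original data or answer; the base-R line is just one possible equivalent of `lead()` for this case.

```r
library(dplyr)

# Toy stand-in for the Sequ column (illustrative values only)
sequ <- c(NA, 1L, 1L, NA, NA, 0L, NA)

not_na  <- !is.na(sequ)                    # TRUE for rows inside a sequence
next_ok <- lead(not_na, default = FALSE)   # TRUE where the following row is non-NA
keep    <- not_na | next_ok                # sequence rows plus the row before each run

which(keep)
# [1] 1 2 3 5 6

# A base-R way to shift the mask by one position, without dplyr::lead()
next_ok_base <- c(not_na[-1], FALSE)
```

`default = FALSE` matters for the last row: it has no successor, so without a default `lead()` would return NA there. `filter()` would still drop that row, but an explicit FALSE keeps the mask free of NA values.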