Here is a sample of the data:
exp_data <- structure(list(Seq = c("AAAARVDS", "AAAARVDSSSAL",
"AAAARVDSRASDQ"), Change = structure(c(19L, 20L, 13L), .Label = c("",
"C[+58]", "C[+58], F[+1152]", "C[+58], F[+1152], L[+12], M[+12]",
"C[+58], L[+2909]", "L[+12]", "L[+370]", "L[+504]", "M[+12]",
"M[+1283]", "M[+1457]", "M[+1491]", "M[+16]", "M[+16], Y[+1013]",
"M[+16], Y[+1152]", "M[+16], Y[+762]", "M[+371]", "M[+386], Y[+12]",
"M[+486], W[+12]", "Y[+12]", "Y[+1240]", "Y[+1502]", "Y[+1988]",
"Y[+2918]"), class = "factor"), `Mass` = c(1869.943,
1048.459, 707.346), Size = structure(c(2L, 2L, 2L), .Label = c("Matt",
"Greg",
"Kieran"
), class = "factor"), `Number` = c(2L, 2L, 2L)), row.names = c(244L,
392L, 396L), class = "data.frame")
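I want to draw your attention to the column named Change, because that is the column I want to filter on. There are three rows here, and I want to keep only the first one, because the change for one of its letters exceeds 100. More generally, I want to keep every row in which at least one letter change is +100 or more. The Change column can list up to 4-5 letters, but as long as at least one modification is at least +100, I want to keep the row.
Do you have a simple solution?
Expected output: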
Seq Change Mass Size Number
244 AAAARVDS M[+486], W[+12] 1869.943 Greg 2
I'm not entirely sure I've understood your problem statement correctly, but perhaps something like this:
library(dplyr)
library(stringr)
exp_data %>% filter(str_detect(Change, "\\d{3}"))
# Seq Change Mass Size Number
#1 AAAARVDS M[+486], W[+12] 1869.943 Greg 2
Or the same in base R:
exp_data[grep("\\d{3}", exp_data$Change), ]
# Seq Change Mass Size Number
#244 AAAARVDS M[+486], W[+12] 1869.943 Greg 2
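The idea is to use a regular expression to keep only those rows where Change contains at least one run of three digits. A number with three or more digits is 100 or greater, provided the values are written as whole numbers without leading zeros.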
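The pattern above counts digits rather than comparing values, so it only works for a cutoff of 100 (or another power of ten). If a different cutoff is ever needed, one option is to pull the numbers out and compare them numerically. The following is a minimal base R sketch, not part of the original answer: the helper name `has_large_shift`, its `threshold` argument, and the `demo` data frame are made up for illustration, and it assumes every modification is written as `[+digits]`.

```r
# Hypothetical helper (an illustration, not from the original answer):
# returns TRUE for each element of `x` that contains at least one
# "[+N]" modification with N >= threshold.
has_large_shift <- function(x, threshold = 100) {
  x <- as.character(x)  # convert factors to their labels first
  # Extract every "[+digits" token, e.g. "[+486" and "[+12"
  tokens <- regmatches(x, gregexpr("\\[\\+[0-9]+", x))
  vapply(tokens, function(tok) {
    shifts <- as.numeric(sub("\\[\\+", "", tok))  # "[+486" -> 486
    any(shifts >= threshold)                      # FALSE when nothing matched
  }, logical(1))
}

# Small stand-in for exp_data, using the three Change values from the question
demo <- data.frame(
  Seq = c("AAAARVDS", "AAAARVDSSSAL", "AAAARVDSRASDQ"),
  Change = c("M[+486], W[+12]", "Y[+12]", "M[+16]"),
  stringsAsFactors = TRUE
)

kept <- demo[has_large_shift(demo$Change), ]
```

With the real data the call would be `exp_data[has_large_shift(exp_data$Change), ]`; changing `threshold` moves the cutoff without touching the regular expression.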