Lis*_*ann 2 select r unique subset
Possible duplicate:
R: Finding patterns across multiple columns - possibly duplicated()?
Dear all,
This is part of my data set:
name chr start stop strand alias
60 uc003vqx.2 chr7 130835560 130891916 - PODXL
61 uc003xlp.1 chr8 38387812 38445509 - FLG
62 uc003xlu.1 chr8 38400008 38445509 - FLG
63 uc003xlv.1 chr8 38400008 38445509 - FLG
64 uc003xtz.1 chr8 61263976 61356508 - CA8
65 uc003xua.1 chr8 61283183 61356508 - CA8
66 uc010lwg.1 chr8 38387812 38445509 - FLG
67 uc010lwh.1 chr8 38387812 38445509 - FLG
68 uc010lwj.1 chr8 38387812 38445509 - FLG
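I would like to filter the data set so that each combination of the start, stop and alias columns appears only once. The final result should look like this: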
name chr start stop strand alias
60 uc003vqx.2 chr7 130835560 130891916 - PODXL
61 uc003xlp.1 chr8 38387812 38445509 - FLG
62 uc003xlu.1 chr8 38400008 38445509 - FLG
64 uc003xtz.1 chr8 61263976 61356508 - CA8
65 uc003xua.1 chr8 61283183 61356508 - CA8
66 uc010lwg.1 chr8 38387812 38445509 - FLG
Does anyone know whether there is a solution for this? Thanks!
Use the duplicated function.
Recreate the data:
x <- " name chr start stop strand alias
60 uc003vqx.2 chr7 130835560 130891916 - PODXL
61 uc003xlp.1 chr8 38387812 38445509 - FLG
62 uc003xlu.1 chr8 38400008 38445509 - FLG
63 uc003xlv.1 chr8 38400008 38445509 - FLG
64 uc003xtz.1 chr8 61263976 61356508 - CA8
65 uc003xua.1 chr8 61283183 61356508 - CA8
66 uc010lwg.1 chr8 38387812 38445509 - FLG
67 uc010lwh.1 chr8 38387812 38445509 - FLG
68 uc010lwj.1 chr8 38387812 38445509 - FLG"
# The header has one field fewer than the data rows, so the first column becomes the row names.
dat <- read.table(text = x, header = TRUE)
Remove the duplicates:
dat[!duplicated(dat[, c("start", "stop", "alias")]), ]
name chr start stop strand alias
60 uc003vqx.2 chr7 130835560 130891916 - PODXL
61 uc003xlp.1 chr8 38387812 38445509 - FLG
62 uc003xlu.1 chr8 38400008 38445509 - FLG
64 uc003xtz.1 chr8 61263976 61356508 - CA8
65 uc003xua.1 chr8 61283183 61356508 - CA8
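To see why row 66 does not survive (it shares start, stop and alias with row 61), it may help to look at the logical vector that duplicated returns. The following is only a minimal sketch on a three-row stand-in for the data above; the names key, dup, first_kept and last_kept are illustrative and not part of the original answer.

```r
# Three of the original rows, rebuilt by hand so the example is self-contained.
dat <- data.frame(
  name  = c("uc003xlp.1", "uc003xlu.1", "uc010lwg.1"),
  start = c(38387812, 38400008, 38387812),
  stop  = c(38445509, 38445509, 38445509),
  alias = c("FLG", "FLG", "FLG"),
  row.names = c("61", "62", "66")
)

# The columns that define what counts as "the same" row.
key <- dat[, c("start", "stop", "alias")]

# TRUE marks a row whose key already appeared in an earlier row.
dup <- duplicated(key)

# Negating the vector keeps the first row of each key group (rows 61 and 62).
first_kept <- dat[!dup, ]

# fromLast = TRUE scans from the bottom, so the last row of each group
# is kept instead (rows 62 and 66).
last_kept <- dat[!duplicated(key, fromLast = TRUE), ]
```

Which name survives for a given key therefore depends only on row order. If a specific transcript should be kept for each group, sorting dat before calling duplicated is one way to control that.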