Lis*_*ann 63 regex string r bioinformatics biomart
I am working with NCBI Reference Sequence accession numbers, such as those in the variable a:
a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")
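To retrieve information with the biomart package, I need to remove the .1, .2, etc. that follow the accession number. I usually do this with the following code: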
b <- sub("..*", "", a)
# [1] "" "" "" "" "" ""
But as you can see, this is not the right approach for this variable. Can anyone help me with this?
Han*_*nsi 87
You just need to escape the period:
a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")
gsub("\\..*","",a)
# [1] "NM_020506" "NM_020519" "NM_001030297" "NM_010281" "NM_011419" "NM_053155"
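To see why the escape matters, here is a minimal sketch using three of the accession numbers from the question. The variable names unescaped, escaped and bracketed are only for illustration. An unescaped . matches any character, so "..*" matches the entire string; writing the period as "\\." or "[.]" makes it literal.

```r
# A few accession numbers from the question
a <- c("NM_020506.1", "NM_020519.1", "NM_001030297.2")

# Unescaped: the first "." matches any character and ".*" matches the rest,
# so the whole string is replaced by ""
unescaped <- sub("..*", "", a)

# Escaped with a backslash, or wrapped in a character class:
# both forms match a literal period
escaped   <- sub("\\..*", "", a)
bracketed <- sub("[.].*", "", a)

print(unescaped)
print(escaped)
print(identical(escaped, bracketed))
```

Since each string contains only one period, sub and gsub give the same result here.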
zx8*_*754 10
We can pretend they are file names and remove the extension:
tools::file_path_sans_ext(a)
# [1] "NM_020506" "NM_020519" "NM_001030297" "NM_010281" "NM_011419" "NM_053155"
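As a quick check of how the file-name trick behaves on less regular input, here is a small sketch. The second and third IDs are made up for illustration (one without a version suffix, one with a two-digit version); only the first comes from the question.

```r
# First ID is from the question; the other two are hypothetical test cases
ids <- c("NM_020506.1", "NM_020506", "NM_000000.12")

# Removes the final ".<alphanumeric>" suffix when there is one,
# and leaves strings without a period unchanged
stripped <- tools::file_path_sans_ext(ids)
print(stripped)
```

One thing to keep in mind is that this strips anything that looks like a file extension, not just numeric versions, so it is less explicit about intent than a regex.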
You can do it like this:
sub("\\.[0-9]+$", "", a)
Or:
library(stringr)
# Drops the last two characters, so this assumes a single-digit version suffix
str_sub(a, start = 1, end = -3)
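Trimming a fixed number of characters only works while every version is a single digit. The sketch below compares that with a pattern anchored at the end of the string. It uses base R's substr with nchar in place of str_sub so that it runs without stringr, and the second ID is a hypothetical two-digit version, not a real accession.

```r
# Second ID is hypothetical, used only to show a two-digit version
ids <- c("NM_020506.1", "NM_000000.12")

# Fixed-width trim: drop the last two characters
# (same positions as str_sub(ids, start = 1, end = -3))
fixed_width <- substr(ids, 1, nchar(ids) - 2)

# Pattern-based trim: drop a period followed by one or more digits at the end
by_pattern <- sub("\\.[0-9]+$", "", ids)

print(fixed_width)
print(by_pattern)
```

With the two-digit version, the fixed-width approach leaves a trailing period behind, while the pattern removes the whole suffix.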
If the strings were of fixed length, substr from base R would be enough. Since they are not, we can get the position of the . with regexpr and use it in substr:
substr(a, 1, regexpr("\\.", a)-1)
#[1] "NM_020506" "NM_020519" "NM_001030297" "NM_010281" "NM_011419" "NM_053155"
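One caveat with the regexpr approach: when a string has no period, regexpr returns -1 and substr then yields an empty string. A minimal sketch of a guard, assuming you want such IDs returned unchanged (the last ID below is a made-up example without a version suffix):

```r
# Last ID has no version suffix; it is included only to show the edge case
ids <- c("NM_020506.1", "NM_001030297.2", "NM_010281")

# Position of the first literal period, or -1 when there is none;
# as.integer() drops the extra attributes that regexpr attaches
pos <- as.integer(regexpr(".", ids, fixed = TRUE))

# Without a guard, the ID that has no period becomes ""
unguarded <- substr(ids, 1, pos - 1)

# With a guard, IDs without a period are kept as they are
guarded <- ifelse(pos > 0L, substr(ids, 1L, pos - 1L), ids)

print(unguarded)
print(guarded)
```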
We can use a lookahead regex to extract the part of the string before the . character:
library(stringr)
str_extract(a, ".*(?=\\.)")
# [1] "NM_020506" "NM_020519" "NM_001030297" "NM_010281" "NM_011419" "NM_053155"
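If you would rather not load stringr, the same lookahead can be sketched in base R with regexpr(perl = TRUE) and regmatches. Note that regmatches silently drops elements that do not match, so the version below fills a result vector by hand to keep it aligned with the input. The last ID is a hypothetical one without a version suffix.

```r
# Last ID has no period; it is a made-up example to show the no-match case
ids <- c("NM_020506.1", "NM_053155.2", "NM_010281")

# Lookaheads need the PCRE engine, hence perl = TRUE
m <- regexpr(".*(?=\\.)", ids, perl = TRUE)

# regmatches() returns only the matches, so the result is shorter than ids
matches_only <- regmatches(ids, m)

# Keep the output aligned with the input: NA where nothing matched
aligned <- rep(NA_character_, length(ids))
aligned[m != -1] <- matches_only

print(matches_only)
print(aligned)
```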