Remove part of a string after "."

Question

Lis*_*ann 63 regex string r bioinformatics biomart

I am working with NCBI Reference Sequence accession numbers, like the ones in variable `a`:

a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")

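To get information from the biomart package, I need to remove the `.1`, `.2`, etc. after the accession numbers. I normally do this with the following code: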

b <- sub("..*", "", a)

# [1] "" "" "" "" "" ""

But as you can see, this is not the right approach for this variable. Can anyone help me with this?
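
A quick way to see what went wrong is to print what each pattern actually matches. The sketch below uses base R's `regexpr()` and `regmatches()` on two of the accessions above (the choice of these helpers is illustrative, not from the question): an unescaped `.` matches any single character, so `"..*"` matches the whole string and `sub()` replaces all of it with `""`.

```r
a <- c("NM_020506.1", "NM_020519.1")

# Unescaped: "." is a metacharacter (any character), so "..*" matches
# the entire string and nothing is left after substitution.
whole <- regmatches(a, regexpr("..*", a))
print(whole)   # "NM_020506.1" "NM_020519.1"

# Escaped: "\\." matches a literal dot, so only the version suffix matches.
suffix <- regmatches(a, regexpr("\\..*", a))
print(suffix)  # ".1" ".1"
```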

Answer 1

Han*_*nsi 87

You just need to escape the period (an unescaped `.` in a regular expression matches any single character):

a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")

gsub("\\..*","",a)
# [1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"    "NM_053155"
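
A small sketch of two equivalent spellings, assuming R's default regex engine: a literal dot can also be written as the character class `[.]`, and because `.*` already runs to the end of the string, `sub()` (first match only) gives the same result as `gsub()` here. The three accessions are taken from the question.

```r
a <- c("NM_020506.1", "NM_001030297.2", "NM_011419.3")

# "\\." escapes the dot; inside "[.]" the dot is already literal
escaped   <- sub("\\..*", "", a)
bracketed <- sub("[.].*", "", a)

# gsub finds no second match: ".*" has already consumed the rest of the string
global <- gsub("\\..*", "", a)

print(escaped)  # "NM_020506" "NM_001030297" "NM_011419"
```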

Answer 2

zx8*_*754 10

We can pretend they are file names and remove the extension:

tools::file_path_sans_ext(a)
# [1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"    "NM_053155"
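
One behavioural difference worth knowing: `file_path_sans_ext()` strips only the *last* dot-suffix (as far as I know, only when that suffix is alphanumeric), whereas `sub("\\..*", "", x)` cuts at the *first* dot. In this sketch, `gene.variant.2` is a made-up identifier that itself contains a dot.

```r
ids <- c("NM_020506.1", "gene.variant.2")  # second ID is hypothetical

# Removes only the final ".<alphanumeric>" suffix, like a file extension
by_extension <- tools::file_path_sans_ext(ids)
print(by_extension)   # "NM_020506" "gene.variant"

# Cuts everything from the first "." onwards
by_first_dot <- sub("\\..*", "", ids)
print(by_first_dot)   # "NM_020506" "gene"
```

For RefSeq accessions, which contain a single dot, both give the same result.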

Answer 3

joh*_*nes 6

You can do:

sub("\\.[0-9]+$", "", a)  # remove a literal "." followed by the trailing digits

or

library(stringr)
str_sub(a, start=1, end=-3)

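The `str_sub(a, start = 1, end = -3)` solution assumes that **only two characters** need to be removed (the "." and the single digit after it). In many gene ID systems the version can have more than one digit (especially for probe IDs). In that case a more flexible solution is `str_remove(a, pattern = "\\..*")`. In that code, the pattern looks for the first period (`"\\."`) followed by *any* character (`"."`) repeated *any* number of times (`"*"`). (6 upvotes)
Alternatives: `str_replace(a, "\\.[0-9]", "")` and `str_replace(a, "\\..*", "")` (3 upvotes)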
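
To make the comment's point concrete, here is a base-R sketch in which `substr(x, 1, nchar(x) - 2)` stands in for `str_sub(x, start = 1, end = -3)` (drop the last two characters) and `sub("\\..*", "", x)` stands in for `str_remove(x, "\\..*")`. The two-digit version in `NM_001030297.12` is a hypothetical input chosen to expose the difference.

```r
x <- c("NM_020506.1", "NM_001030297.12")  # second version number is hypothetical

# Fixed-width trim: always drops exactly two characters
fixed_trim <- substr(x, 1, nchar(x) - 2)
print(fixed_trim)    # "NM_020506" "NM_001030297." (stray dot left behind)

# Pattern-based: drops the first "." and everything after it
pattern_trim <- sub("\\..*", "", x)
print(pattern_trim)  # "NM_020506" "NM_001030297"
```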

Answer 4

akr*_*run 6

If the strings were of fixed length, `substr` from base R could be used directly. Otherwise, we can get the position of the `.` with `regexpr` and use it in `substr`:

substr(a, 1, regexpr("\\.", a)-1)
#[1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"    "NM_053155"
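
One caveat with the `regexpr()` approach: when a string contains no `.`, `regexpr()` returns `-1`, so `substr(x, 1, -2)` yields `""` rather than the original ID. Below is a hedged sketch that guards against this; `strip_after_dot()` is a hypothetical helper name, and the unversioned third ID is an illustrative input.

```r
strip_after_dot <- function(x) {
  # Position of the first literal "." (fixed = TRUE, so no regex escaping);
  # -1 where there is none. as.integer() drops regexpr's extra attributes.
  pos <- as.integer(regexpr(".", x, fixed = TRUE))
  # Keep the text before the dot, or the whole string when no dot was found
  ifelse(pos > 0, substr(x, 1, pos - 1), x)
}

ids <- c("NM_020506.1", "NM_001030297.2", "NM_020506")
print(strip_after_dot(ids))  # "NM_020506" "NM_001030297" "NM_020506"
```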

Answer 5

ben*_*n23 5

We can use a lookahead regex to extract the part of the string before the `.`:

library(stringr)

str_extract(a, ".*(?=\\.)")
# [1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"   
# [5] "NM_011419"    "NM_053155"
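
Because `.*` is greedy, the lookahead stops at the *last* `.` in the string; a lazy `.*?` stops at the first. The sketch below shows both in base R, swapping stringr for `regexpr(..., perl = TRUE)` to get lookahead support; `gene.variant.2` is a hypothetical ID containing an extra dot. Note that `regmatches()` silently drops strings with no match, whereas `str_extract()` returns `NA` for them.

```r
ids <- c("NM_020506.1", "gene.variant.2")  # second ID is hypothetical

# Greedy: ".*" grabs as much as possible, then backtracks to the last "."
greedy <- regmatches(ids, regexpr(".*(?=\\.)", ids, perl = TRUE))
print(greedy)  # "NM_020506" "gene.variant"

# Lazy: ".*?" grabs as little as possible, so it stops before the first "."
lazy <- regmatches(ids, regexpr(".*?(?=\\.)", ids, perl = TRUE))
print(lazy)    # "NM_020506" "gene"
```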

Asked: 13 years, 9 months ago
Viewed: 85894 times
Last active: 6 years, 10 months ago