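I have an attribute consisting of DNA sequences and would like to translate them into amino acid names. To do that, I need to split each sequence into fixed-length chunks of 3 characters. Here is a sample of the data: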
data=c("AATAGACGT","TGACCC","AAATCACTCTTT")
How can I extract it into:
[1] "AAT" "AGA" "CGT"
[2] "TGA" "CCC"
[3] "AAA" "TCA" "CTC" "TTT"
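So far, I have only found how to split a string when a specific regular expression is given as the delimiter.
Try a lookbehind that splits after every three characters: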
strsplit(data, '(?<=.{3})', perl=TRUE)
Or, with the stringi package:
library(stringi)
stri_extract_all_regex(data, '.{1,3}')
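If you would rather avoid regular expressions, the same split can probably be done in base R with `substring()`. This is only a minimal sketch, not part of the answers above: the helper name `split_codons` and its `width` parameter are made up for illustration, and it assumes that a trailing chunk shorter than three characters should be kept rather than dropped.

```r
# Hypothetical helper (name and signature are assumptions, not an existing API):
# split each sequence into consecutive chunks of `width` characters.
split_codons <- function(seqs, width = 3) {
  lapply(seqs, function(s) {
    n <- nchar(s)
    # Guard against empty strings, where seq(1, 0, by = width) would fail
    if (n == 0) return(character(0))
    starts <- seq(1, n, by = width)
    # substring() is vectorised over start and stop positions;
    # pmin() keeps the last stop inside the string, so a short tail is kept
    substring(s, starts, pmin(starts + width - 1, n))
  })
}

data <- c("AATAGACGT", "TGACCC", "AAATCACTCTTT")
codons <- split_codons(data)
```

`codons` is a list with one character vector per input sequence, which is the same shape that `strsplit()` returns.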