How to convert values in a column matching a pattern in R

Question

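I have this data frame, mydf. The nucleotide column can contain the letters 'A', 'T', 'G', and 'C'. When the strand column is '-', I want to change A to T, C to G, G to C, and T to A. How can I do this?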

  mydf<- structure(list(seqnames = structure(c(1L, 1L, 1L, 1L), .Label = c("chr1", 
    "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", 
    "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", 
    "chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrX", 
    "chrY", "chrM"), class = "factor"), pos = c(115258748, 115258748, 
    115258748, 115258748), strand = structure(c(1L, 2L, 1L, 2L), .Label = c("+", 
    "-", "*"), class = "factor"), nucleotide = structure(c(2L, 2L, 
    2L, 2L), .Label = c("A", "C", "G", "T", "N", "=", "-"), class = "factor")), .Names = c("seqnames", 
    "pos", "strand", "nucleotide"), row.names = c(NA, 4L), class = "data.frame")

Expected result

  seqnames       pos strand nucleotide
1     chr1 115258748      +          C
2     chr1 115258748      -          G
3     chr1 115258748      +          C
4     chr1 115258748      -          G

Answer 1

Ric*_*ven 16

For one-to-one character translation, you can use chartr().

within(mydf, {
  nucleotide[strand == "-"] <- chartr("ACGT", "TGCA", nucleotide[strand == "-"])
})
#   seqnames       pos strand nucleotide
# 1     chr1 115258748      +          C
# 2     chr1 115258748      -          G
# 3     chr1 115258748      +          C
# 4     chr1 115258748      -          G
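
As a minimal sketch of what chartr() does on its own: each character in the first argument is replaced by the character at the same position in the second argument, and anything not listed is left alone. The `bases` vector below is made up for illustration and is not part of the question's data.

```r
# Hypothetical input: single bases, an ambiguous base, and a longer string
bases <- c("A", "C", "G", "T", "N", "ACGT")

# "A" -> "T", "C" -> "G", "G" -> "C", "T" -> "A"; "N" is not listed, so it stays
comp <- chartr("ACGT", "TGCA", bases)
comp
```

Note that the translation is applied character by character, so a multi-character string is complemented in place and not reversed.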

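Note that I use within() here to avoid writing mydf$ four times and to avoid modifying the original data: it returns a modified copy and leaves mydf itself untouched. You could also write the following, but keep in mind that it changes the original data.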

mydf$nucleotide[mydf$strand == "-"] <- 
    with(mydf, chartr("ACGT", "TGCA", nucleotide[strand == "-"]))
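
The difference between the two forms can be sketched on a small stand-in data frame. The names `df` and `df_comp` and the values in them are hypothetical; only the column names follow the question.

```r
# Small stand-in data frame (hypothetical values, same column names as mydf)
df <- data.frame(
  strand     = factor(c("+", "-", "+", "-"), levels = c("+", "-", "*")),
  nucleotide = factor(c("C", "C", "A", "T"), levels = c("A", "C", "G", "T", "N"))
)

# within() returns a modified copy; assign it to a name to keep the result
df_comp <- within(df, {
  nucleotide[strand == "-"] <- chartr("ACGT", "TGCA", nucleotide[strand == "-"])
})

as.character(df$nucleotide)       # the original data frame is unchanged
as.character(df_comp$nucleotide)  # minus-strand rows are complemented
```

If the within() result is not assigned, it is only printed and then discarded, which is why the in-place form is the one that actually alters mydf.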

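`chartr` is ideal for this case, but creating a lookup table of old and new values may be the most efficient approach - http://stackoverflow.com/questions/18456968/how-do-i-map-a-vector-of-values-to-another-with-my-own-custom-map-in-r/18457055#18457055 Also, `with(mydf, ifelse(strand == "-", chartr("ACGT", "TGCA", nucleotide), nucleotide))` would avoid overwriting the variable. (3 upvotes)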
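
A rough sketch of the lookup-table idea from the comment, using a named character vector as the table. The names `complement`, `nuc`, `flip`, `result`, and `via_ifelse`, and the stand-in vectors, are assumptions for illustration and do not come from the linked answer. The columns are converted with as.character() first, on the assumption that nucleotide is a factor as in the question: ifelse() does not carry factor attributes through, so a factor passed directly as a branch tends to come back as its integer codes instead of its labels.

```r
# Hypothetical stand-ins for the strand and nucleotide columns
strand     <- factor(c("+", "-", "+", "-", "-"))
nucleotide <- factor(c("C", "C", "A", "T", "N"),
                     levels = c("A", "C", "G", "T", "N"))

# Lookup table: names are the old values, elements are the new values
complement <- c(A = "T", C = "G", G = "C", T = "A")

nuc  <- as.character(nucleotide)  # index by label, not by factor code
flip <- strand == "-" & nuc %in% names(complement)

# Translate only minus-strand rows that have an entry in the table
result <- nuc
result[flip] <- unname(complement[nuc[flip]])
result

# ifelse() variant from the comment, with character vectors in both branches
via_ifelse <- ifelse(strand == "-", chartr("ACGT", "TGCA", nuc), nuc)
```

Both variants return a new character vector and leave the input untouched; wrap the result in factor() with the original levels if a factor column is needed again.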
