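I have a data frame, and I would like to use certain values in it as hash keys / dictionary keys (or whatever your language of choice calls them). Suppose I have a data frame like this, read from a large CSV file (only the first row is shown):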
Plate.name QN.number Well Allele.X.Rn Allele.Y.Rn Call
1 Plate 1_A1 QN2200 A1 1.766 2.791 Both
In R code that would be:
structure(list(Plate.name = structure(1L, .Label = "Plate 1_A1", class = "factor"),
QN.number = structure(1L, .Label = "QN2200", class = "factor"),
Well = structure(1L, .Label = "A1", class = "factor"), Allele.X.Rn = 1.766,
Allele.Y.Rn = 2.791, Call = structure(1L, .Label = "Both", class = "factor")), .Names = c("Plate.name",
"QN.number", "Well", "Allele.X.Rn", "Allele.Y.Rn", "Call"), class = "data.frame", row.names = c(NA,
-1L))
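QN.number is the unique ID in my data set. How can I use QN.number as a reference to retrieve the other values, that is, look up the Call or Allele.X.Rn for a given QN.number? It seems row.names might do the trick, but how would I use them in this case?
Using row.names works like this: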
> row.names(d)=d$QN.number
> d["QN2200",]
Plate.name QN.number Well Allele.X.Rn Allele.Y.Rn Call
QN2200 Plate 1_A1 QN2200 A1 1.766 2.791 Both
> d["QN2201",]
Plate.name QN.number Well Allele.X.Rn Allele.Y.Rn Call
NA <NA> <NA> <NA> NA NA <NA>
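You simply pass the row name as the first argument when subsetting; a key that is not present (QN2201 above) returns a row of NAs. You can also pass several row names at once: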
> d=data.frame(a=letters[1:10],b=runif(10))
> row.names(d)=d$a
> d[c("a","g","d"),]
a b
a a 0.6434431
g g 0.6724661
d d 0.9826392
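To get a single value such as Call or Allele.X.Rn for a given QN.number, the same subsetting also takes a column name as the second argument. Here is a minimal sketch that rebuilds the one-row example from the question; note that I assume `stringsAsFactors = FALSE` so the columns are plain character vectors rather than factors as in the original dput output, and the names `call_value`, `x_value` and `both` are just mine for illustration.

```r
# Rebuild the one-row example (values copied from the question's dput output).
# Assumption: character columns instead of factors, to keep the result simple.
d <- data.frame(
  Plate.name  = "Plate 1_A1",
  QN.number   = "QN2200",
  Well        = "A1",
  Allele.X.Rn = 1.766,
  Allele.Y.Rn = 2.791,
  Call        = "Both",
  stringsAsFactors = FALSE
)
row.names(d) <- d$QN.number

# Row name first, column name second: one value for one key.
call_value <- d["QN2200", "Call"]
x_value    <- d["QN2200", "Allele.X.Rn"]

# Asking for several columns returns a one-row data frame.
both <- d["QN2200", c("Call", "Allele.X.Rn")]
```

With factor columns the single-value lookup would return a factor, so wrap it in `as.character()` if you need the plain string.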
That said, I am not sure how clever this is, or whether it does a sequential search for each row name or uses some faster kind of indexing...
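Since I can't say what the row-name lookup does internally, here is a rough sketch of two more explicit alternatives, in case lookup speed turns out to matter: `match()` to find row positions by key, and an environment used as a key/value table. The data frame mirrors the a/b example above; `keys`, `lookup`, `g_value` and the key "zz" are hypothetical names made up for the illustration, and I have not timed either approach against row names.

```r
set.seed(1)  # only so the random column is reproducible
d <- data.frame(a = letters[1:10], b = runif(10), stringsAsFactors = FALSE)

# Option 1: match() gives the row position of each key, NA for a missing key.
keys <- c("a", "g", "d", "zz")   # "zz" is deliberately absent
idx <- match(keys, d$a)
by_match <- d$b[idx]             # NA where the key was not found

# Option 2: an environment used as a hash table, key -> value.
lookup <- new.env(hash = TRUE)
for (i in seq_len(nrow(d))) {
  assign(d$a[i], d$b[i], envir = lookup)
}
g_value <- get("g", envir = lookup, inherits = FALSE)
has_zz  <- exists("zz", envir = lookup, inherits = FALSE)
```

Both rely on the keys being unique, which is the case for QN.number in the question. It would be worth timing them on the real data before switching away from row names.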