How can a slot hold multiple objects of the same type in R?

Wil*_*son 6 r class object slot

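Suppose I want to define two classes, Sentence and Word. Each Word object has a character string and a part of speech (pos). Each Sentence contains some number of words and has an additional slot for data.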

The Word class is straightforward to define.

wordSlots <- list(word = "character", pos = "character")
wordProto <- list(word = "", pos = "")
setClass("Word", slots = wordSlots, prototype = wordProto)    
## defaults let Word() be called with no arguments, as the prototypes below do
Word <- function(word = "", pos = "") new("Word", word = word, pos = pos)

Now I want to create a Sentence class that can contain some Words and some numerical data.

If I define the Sentence class like this:

sentenceSlots <- list(words = "Word", stats = "numeric")
sentenceProto <- list(words = Word(), stats = 0)
setClass("Sentence", slots = sentenceSlots, prototype = sentenceProto)

then a sentence can contain only one word. I could obviously define it with many slots, one per word, but then its length would be limited.

However, if I define the Sentence class like this:

sentenceSlots <- list(words = "list", stats = "numeric")
sentenceProto <- list(words = list(Word()), stats = 0)
setClass("Sentence", slots = sentenceSlots, prototype = sentenceProto)

it can contain any number of words, but the slot words can also contain objects that are not of class Word.

Is there a way to accomplish this? It would be similar to C++, where you can have a vector of objects of the same type.

Mar*_*gan 7

Remembering that R works well on vectors, a first step is to think of 'Words' rather than 'Word'.

## constructor, accessors, subset (also need [[, [<-, [[<- methods)
.Words <- setClass("Words",
    representation(words="character", parts="character"))
words <- function(x) x@words
parts <- function(x) x@parts
setMethod("length", "Words", function(x) length(words(x)))
setMethod("[", c("Words", "ANY", "missing"), function(x, i, j, ...) {
    initialize(x, words=words(x)[i], parts=parts(x)[i], ...)
})

## validity
setValidity("Words", function(object) {
    if (length(words(object)) == length(parts(object)))
        NULL
    else
        "'words()' and 'parts()' are not the same length"
})
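As a brief usage sketch, the definitions above are repeated here so the example runs on its own. The sample words and part-of-speech tags are made up for illustration, and the `[` method is written with an explicit `drop` argument, which is a small variation on the version above rather than a correction of it.

```r
library(methods)

## Definitions repeated from above so the sketch runs standalone
.Words <- setClass("Words",
    representation(words = "character", parts = "character"))
words <- function(x) x@words
parts <- function(x) x@parts
setMethod("length", "Words", function(x) length(words(x)))
setMethod("[", c("Words", "ANY", "missing"),
    function(x, i, j, ..., drop = TRUE) {
        initialize(x, words = words(x)[i], parts = parts(x)[i])
    })
setValidity("Words", function(object) {
    if (length(words(object)) == length(parts(object)))
        NULL
    else
        "'words()' and 'parts()' are not the same length"
})

## Example data (made up for illustration)
w <- .Words(words = c("the", "cat", "sat"),
            parts = c("DET", "NOUN", "VERB"))
length(w)        # number of words held by this single object
w23 <- w[2:3]    # still a "Words" object, now holding two words
words(w23)

## Mismatched lengths are rejected by the validity function
bad <- tryCatch(.Words(words = c("a", "b"), parts = "A"),
                error = function(e) e)
inherits(bad, "error")
```

One object holds the whole collection, so subsetting and validity checking each happen once per vector rather than once per word.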

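@nicola's suggestion to have a list of words has been formalized in the IRanges package (actually, S4Vectors in the 'devel' / 3.0 branch of Bioconductor). There, a 'SimpleList' takes the 'naive' approach of requiring all elements of the list to have the same class, whereas a 'CompressedList' has similar behavior but is actually implemented as a single vector-like object (one with length(), [, and [[ methods) that is 'partitioned' (either by end or by width) into groups.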

library(IRanges)
.Sentences = setClass("Sentences",
    contains="CompressedList",    
    prototype=c(elementType="Words"))

One would then write a more user-friendly constructor, but the basic functionality is

## 0 Sentences
.Sentences()
## 1 sentence of 0 words
.Sentences(unlistData=.Words(), partitioning=PartitioningByEnd(0))
## 3 sentences of 2, 0, and 3 words
s3 <- .Sentences(unlistData=.Words(words=letters[1:5], parts=LETTERS[1:5]), 
    partitioning=PartitioningByEnd(c(2, 2, 5)))

leading to

> s3[[1]]
An object of class "Words"
Slot "words":
[1] "a" "b"

Slot "parts":
[1] "A" "B"

> s3[[2]]
An object of class "Words"
Slot "words":
character(0)

Slot "parts":
character(0)

> s3[[3]]
An object of class "Words"
Slot "words":
[1] "c" "d" "e"

Slot "parts":
[1] "C" "D" "E"

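Notice that some typical operations are fast because they can operate on the 'unlisted' elements without creating or destroying S4 instances, e.g., coercing all 'words' to upper case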

## the slot is named 'words' in the class definition above
setMethod("toupper", "Words", function(x) { x@words <- toupper(x@words); x })
setMethod("toupper", "Sentences", function(x) relist(toupper(unlist(x)), x))

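This is 'fast' for large collections of sentences because unlist / relist amounts to little more than a slot access plus the creation of a single 'Words' instance. Scalable Genomics with R and Bioconductor outlines this and other strategies.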
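The idea of a flat vector 'partitioned by end' can be sketched in base R alone. This is only an illustration of the concept, not how IRanges implements CompressedList; the helper `group()` and the variable names are hypothetical, and the data mirror the three sentences of 2, 0, and 3 words used earlier.

```r
## Flat ("unlisted") data plus the end offset of each group:
## 3 groups holding 2, 0, and 3 elements
flat <- letters[1:5]
ends <- c(2L, 2L, 5L)

## Hypothetical helper: the elements of group i, derived from the end offsets
group <- function(flat, ends, i) {
    start <- if (i == 1L) 1L else ends[i - 1L] + 1L
    flat[seq_len(ends[i] - start + 1L) + start - 1L]
}

## One vectorized call transforms every group at once; the partitioning
## (the 'ends' vector) is reused unchanged
upper <- toupper(flat)

group(upper, ends, 1L)
group(upper, ends, 2L)   # the empty group
group(upper, ends, 3L)
```

The transformation touches the flat vector once, regardless of how many groups there are, which is the point of operating on the unlisted data.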

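In an answer, @nicola says 'R is not perfectly suited for OO programming style', but it is probably more helpful to realize that R's S4 object-oriented style differs from that of C++ and Java, just as R differs from C. In particular, it is really valuable to keep thinking in terms of vectors when working with S4: Words rather than Word, People rather than Person...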
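For comparison with the vector-based design, the 'naive' approach mentioned earlier (a list whose elements must all have the same class) can also be sketched in base R with a validity function. This is a minimal sketch under the question's class definitions, not the SimpleList implementation; the default arguments on `Word()`, the empty-list prototype, and the error message are assumptions added for illustration.

```r
library(methods)

## Word class as in the question, with defaults added so Word() works
setClass("Word", slots = list(word = "character", pos = "character"),
         prototype = list(word = "", pos = ""))
Word <- function(word = "", pos = "") new("Word", word = word, pos = pos)

## Sentence keeps a plain list, but validity rejects non-Word elements
setClass("Sentence", slots = list(words = "list", stats = "numeric"),
         prototype = list(words = list(), stats = 0))
setValidity("Sentence", function(object) {
    ok <- vapply(object@words, is, logical(1), "Word")
    if (all(ok)) TRUE else "all elements of 'words' must be 'Word' objects"
})

s <- new("Sentence",
         words = list(Word("the", "DET"), Word("cat", "NOUN")),
         stats = 1)
length(s@words)

## A list containing a non-Word element fails validation
bad <- tryCatch(new("Sentence", words = list(Word("the", "DET"), 42)),
                error = function(e) e)
inherits(bad, "error")
```

Validity is checked when the object is created with `new()`; direct slot assignment such as `s@words <- list(1)` is not re-validated unless `validObject(s)` is called afterwards. Each word is also a separate S4 instance here, which is the per-element cost the vector-based 'Words' design avoids.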