Wil*_*son 6 r class object slot
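Suppose I want to define two classes, Sentence and Word. Each Word object has a character string and a part of speech (pos). Each Sentence contains some number of Words and has an additional slot for data.
The Word class is straightforward to define.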
wordSlots <- list(word = "character", pos = "character")
wordProto <- list(word = "", pos = "")
setClass("Word", slots = wordSlots, prototype = wordProto)
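# Default arguments let Word() be called with no arguments, as the prototypes below do
Word <- function(word = "", pos = "") new("Word", word = word, pos = pos)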
Now I want to create a Sentence class that can contain some Words and some numerical data.
If I define the Sentence class like so:
sentenceSlots <- list(words = "Word", stats = "numeric")
sentenceProto <- list(words = Word(), stats = 0)
setClass("Sentence", slots = sentenceSlots, prototype = sentenceProto)
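then a sentence can contain only one word. I could obviously define it with many slots, one for each word, but then its length would be limited.
However, if I define the Sentence class like this: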
sentenceSlots <- list(words = "list", stats = "numeric")
sentenceProto <- list(words = list(Word()), stats = 0)
setClass("Sentence", slots = sentenceSlots, prototype = sentenceProto)
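it can contain as many words as I want, but the words slot can also contain objects that are not of class Word.
Is there a way to accomplish this? It would be similar to C++, where you can have a vector of objects of the same type.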
Remembering that R works well on vectors, a first step is to think of 'Words' rather than 'Word'.
## constructor, accessors, subset (also need [[, [<-, [[<- methods)
.Words <- setClass("Words",
    representation(words="character", parts="character"))
words <- function(x) x@words
parts <- function(x) x@parts
setMethod("length", "Words", function(x) length(words(x)))
setMethod("[", c("Words", "ANY", "missing"), function(x, i, j, ...) {
    initialize(x, words=words(x)[i], parts=parts(x)[i], ...)
})
## validity
setValidity("Words", function(object) {
    if (length(words(object)) == length(parts(object)))
        NULL
    else
        "'words()' and 'parts()' are not the same length"
})
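As a quick illustration of how the vector-style class above behaves, here is a minimal usage sketch. It repeats the definitions so that it runs on its own; the sample words and part-of-speech tags are made up for the example and are not from the original answer.

```r
library(methods)

# Same definitions as above, repeated so the sketch is self-contained.
.Words <- setClass("Words",
    representation(words = "character", parts = "character"))
words <- function(x) x@words
parts <- function(x) x@parts
setMethod("length", "Words", function(x) length(words(x)))
setMethod("[", c("Words", "ANY", "missing"), function(x, i, j, ...) {
    initialize(x, words = words(x)[i], parts = parts(x)[i], ...)
})
setValidity("Words", function(object) {
    if (length(words(object)) == length(parts(object)))
        NULL
    else
        "'words()' and 'parts()' are not the same length"
})

# Illustrative data (hypothetical tags): one object holds several words.
w <- .Words(words = c("the", "cat", "sat"), parts = c("DET", "NOUN", "VERB"))
n <- length(w)   # number of words held by the object
w2 <- w[2:3]     # subsetting keeps words and parts aligned

# Mismatched slot lengths are rejected by the validity method;
# the error message is captured here instead of stopping the script.
bad <- tryCatch(.Words(words = c("a", "b"), parts = "X"),
                error = function(e) conditionMessage(e))
```

Because `[` goes through `initialize()`, the validity check runs again on every subset, so the two slots cannot drift out of step through subsetting.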
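@nicola's suggestion, a list of words, has been formalized in the IRanges package (actually, S4Vectors in the 'devel' / 3.0 branch of Bioconductor), where 'SimpleList' takes the 'naive' approach of requiring that all elements of the list have the same class, whereas 'CompressedList' has similar behavior but is actually implemented as a vector-like object (one with length(), [, and [[ methods) that is 'partitioned' (by either end or width) into groups.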
library(IRanges)
.Sentences = setClass("Sentences",
    contains="CompressedList",
    prototype=c(elementType="Words"))
One would then write a more user-friendly constructor, but the basic functionality is
## 0 Sentences
.Sentences()
## 1 sentence of 0 words
.Sentences(unlistData=.Words(), partitioning=PartitioningByEnd(0))
## 3 sentences of 2, 0, and 3 words
s3 <- .Sentences(unlistData=.Words(words=letters[1:5], parts=LETTERS[1:5]),
                 partitioning=PartitioningByEnd(c(2, 2, 5)))
leading to
> s3[[1]]
An object of class "Words"
Slot "words":
[1] "a" "b"
Slot "parts":
[1] "A" "B"
> s3[[2]]
An object of class "Words"
Slot "words":
character(0)
Slot "parts":
character(0)
> s3[[3]]
An object of class "Words"
Slot "words":
[1] "c" "d" "e"
Slot "parts":
[1] "C" "D" "E"
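Notice that some typical operations are fast because they can operate on the 'unlisted' elements without creating or destroying S4 instances, e.g., coercing all 'words' to upper case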
setMethod("toupper", "Words", function(x) { x@words <- toupper(x@words); x })
setMethod("toupper", "Sentences", function(x) relist(toupper(unlist(x)), x))
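This is 'fast' for large collections of sentences because unlist / relist amounts to a slot access and the creation of a single instance of 'Words'. Scalable Genomics with R and Bioconductor outlines this and other strategies.
In an answer @nicola says 'R is not perfectly suited for OO programming style', but it's probably more helpful to realize that R's S4 object-oriented style differs from that of C++ and Java, just as R differs from C. In particular, it's really valuable to keep thinking in terms of vectors when working with S4: Words rather than Word, People rather than Person...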
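To make the 'unlist, operate, relist' idea concrete without depending on IRanges, here is a base-R sketch. The `flat` and `ends` vectors are hypothetical stand-ins for the `unlistData` and `PartitioningByEnd` pieces shown earlier; this does not use the real CompressedList machinery, it only mimics the layout.

```r
# Flat storage for all words across three sentences, plus cumulative end
# positions: sentence 1 has 2 words, sentence 2 has none, sentence 3 has 3.
flat <- c("a", "b", "c", "d", "e")
ends <- c(2L, 2L, 5L)

# Width of each group, recovered from the end positions.
widths <- diff(c(0L, ends))

# One vectorized call over the flat data, however many sentences there are.
upper <- toupper(flat)

# "Relist": label each element with its sentence index, then split.
# Using a factor with explicit levels keeps the empty sentence in the result.
group <- factor(rep(seq_along(ends), widths), levels = seq_along(ends))
sentences <- unname(split(upper, group))
```

The point of the layout is that the expensive work (`toupper`) touches a single character vector once; the grouping is only bookkeeping applied afterwards.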
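For comparison, the 'naive' approach mentioned above (a list whose elements must all share one class) can be sketched with a validity method alone. The class name `WordList` is hypothetical, and this is not how SimpleList is implemented; it only shows that a plain list slot can be constrained to hold Word objects.

```r
library(methods)

# Single-word class along the lines of the one in the question.
setClass("Word", representation(word = "character", pos = "character"))

# Hypothetical container: a plain list slot plus a validity check that
# rejects any element that is not a "Word".
setClass("WordList", representation(words = "list"))
setValidity("WordList", function(object) {
    ok <- vapply(object@words, is, logical(1), "Word")
    if (all(ok))
        NULL
    else
        "all elements of 'words' must be of class \"Word\""
})

# A list of Word objects passes the check.
good <- new("WordList", words = list(new("Word", word = "cat", pos = "NOUN")))

# A list holding anything else is rejected; the message is captured here.
bad <- tryCatch(new("WordList", words = list("not a Word")),
                error = function(e) conditionMessage(e))
```

Note that the validity method runs when the object is created with `new()` or when `validObject()` is called explicitly; assigning to the slot directly with `@<-` checks only that the value is a list. Each element is also a separate S4 instance, which is the per-object overhead the vector-style 'Words' design avoids.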