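I'm using SPSS Modeler v18.2.1 and R v3.5.1 (or v3.3.3) with Essentials for R 18.2.1.
I'm trying to build "Extension Transform (R syntax)" nodes to handle some things SPSS struggles with (in the future: turn them into extension bundles). I want them to add multiple columns, create new data, and so on, and pass a data.frame to the next node. But the SPSS nodes misrecognize that data.frame (i.e., the output of the following Table node differs from the console output of print(modelerData)).
How can I do this? (Or is this a bug?)
Any help would be appreciated. Below is a simple reproducible example:
[Prepare the R environment and data (please do this in plain R)]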
# if not installed
install.packages("randomForest")
set.seed(1) # to reproduce
write.csv(iris[sort(sample(1:150, 100)), ], "iris_train_seed1.csv", row.names = FALSE)
[R code for the Extension Transform node]
### library ###
library(randomForest)
# make_model
set.seed(1)
modelerModel <- randomForest(formula = Species ~ . ,
                             data = modelerData,
                             ntree = 100)
#### predict
pred_forest <- data.frame(pred = predict(modelerModel,
                                         newdata = modelerData))
prob_forest <- as.data.frame(predict(modelerModel,
                                     newdata = modelerData,
                                     type = "prob"))
# overwriting modelerData
modelerData <- cbind(modelerData, pred_forest, prob_forest)
# function definition to make modelerDataModel
getMetaData <- function (data) {
  if (dim(data)[1]<=0) {
    print("Warning : modelerData has no line, all fieldStorage fields set to strings")
    getStorage <- function(x){return("string")}
  } else {
    getStorage <- function(x) {
      res <- NULL
      #if x is a factor, typeof will return an integer so we treat the case on the side
      if(is.factor(x)) {
        res <- "string"
      } else {
        res <- switch(typeof(unlist(x)),
                      integer = "integer",
                      # integer = "real",
                      double = "real",
                      character = "string",
                      "string")
      }
      return (res)
    }
  }
  col = vector("list", dim(data)[2])
  for (i in 1:dim(data)[2]) {
    col[[i]] <- c(fieldName=names(data[i]),
                  fieldLabel="",
                  fieldStorage=getStorage(data[[i]]),
                  fieldMeasure="",
                  fieldFormat="",
                  fieldRole="")
  }
  mdm<-do.call(cbind,col)
  mdm<-data.frame(mdm)
  return(mdm)
}
# overwriting modelerDataModel
modelerDataModel <- getMetaData(modelerData)
# to check
print(dim(modelerData))
print(head(modelerData))
print(dim(modelerDataModel))
print(modelerDataModel)
[Console output of the "to check" part (print(modelerData) is what I want the Table node to show)]
# print(dim(modelerData))
[1] 100 9
# print(head(modelerData))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species   pred setosa
1          4.9         3.0          1.4         0.2  setosa setosa      1
2          4.7         3.2          1.3         0.2  setosa setosa      1
3          5.0         3.6          1.4         0.2  setosa setosa      1
4          5.4         3.9          1.7         0.4  setosa setosa      1
5          4.6         3.4          1.4         0.3  setosa setosa      1
6          5.0         3.4          1.5         0.2  setosa setosa      1
  versicolor virginica
1          0         0
2          0         0
3          0         0
4          0         0
5          0         0
6          0         0
# print(dim(modelerDataModel))
[1] 6 9
# print(modelerDataModel)
                       X1          X2           X3          X4      X5     X6
fieldName    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   pred
fieldLabel
fieldStorage         real        real         real        real  string string
fieldMeasure
fieldFormat
fieldRole
                 X7         X8        X9
fieldName    setosa versicolor virginica
fieldLabel
fieldStorage   real       real      real
fieldMeasure
fieldFormat
fieldRole
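The printed data model can also be checked against the data programmatically rather than by eye. The following is a minimal plain-R sketch, not something taken from the Modeler documentation: `check_data_model` and `storage_of` are hypothetical helper names, and `iris` stands in for the `modelerData` that Modeler would pass to the node. It only verifies that the `fieldName` row matches the column names and reports the declared storage per field; it says nothing about how Modeler itself reads the objects.

```r
# Stand-ins for the objects Modeler passes to the node (assumption: plain R,
# no Modeler session; iris plays the role of modelerData).
modelerData <- cbind(iris[1:6, ], pred = iris$Species[1:6])

# Hypothetical helper: map an R column to a storage label as in the post.
storage_of <- function(x) {
  if (is.factor(x) || is.character(x)) {
    "string"
  } else if (is.integer(x)) {
    "integer"
  } else {
    "real"
  }
}

# Data model in the layout printed above: one column per field (X1, X2, ...),
# rows named fieldName, fieldLabel, fieldStorage, and so on.
modelerDataModel <- data.frame(
  rbind(
    fieldName    = names(modelerData),
    fieldLabel   = "",
    fieldStorage = unname(vapply(modelerData, storage_of, character(1))),
    fieldMeasure = "",
    fieldFormat  = "",
    fieldRole    = ""
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: compare a data frame with its data model.
check_data_model <- function(data, model) {
  model_names   <- as.character(unlist(model["fieldName", ]))
  model_storage <- as.character(unlist(model["fieldStorage", ]))
  list(
    same_width = ncol(data) == ncol(model),           # one model column per field
    same_names = identical(model_names, names(data)), # same names, same order
    storage    = setNames(model_storage, model_names) # declared storage per field
  )
}

result <- check_data_model(modelerData, modelerDataModel)
print(result)
```

If `same_width` and `same_names` are both TRUE, as the console output above suggests for the example, the mismatch seen in the Table node is presumably not caused by an inconsistent data model.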
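I found a way to make my simple example work... but it is hard to understand. From the point of view of the R language, this behavior is a bug. (However, this method doesn't work in other cases. Does anyone know how to avoid this bug?)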
questions_modelerData <- cbind(modelerData, pred_forest, prob_forest)
modelerData <- cbind(modelerData, pred_forest,
                     setosa = prob_forest[,1],
                     versicolor = prob_forest[,2],
                     virginica = prob_forest[,3])
identical(questions_modelerData, modelerData)
# [1] TRUE
# but this modelerData works unlike the question's.
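The workaround above names each probability column by hand, which does not scale to a different number of classes. Below is a plain-R sketch of the same construction for any number of columns, using `do.call(cbind, ...)` with the columns passed as separate named vectors. The data here are simulated stand-ins (no Modeler session, no randomForest), and whether Modeler treats the result like the hand-written version has not been verified; the sketch only shows that the two are the same object in R.

```r
# Simulated stand-ins (assumption): in the node these come from Modeler and
# from predict() on the random forest.
modelerData <- iris[1:5, ]
pred_forest <- data.frame(pred = iris$Species[1:5])
prob_forest <- data.frame(setosa     = rep(1, 5),
                          versicolor = rep(0, 5),
                          virginica  = rep(0, 5))

# Workaround from the post: every probability column passed as a named vector.
by_hand <- cbind(modelerData, pred_forest,
                 setosa     = prob_forest[, 1],
                 versicolor = prob_forest[, 2],
                 virginica  = prob_forest[, 3])

# Same call built programmatically: as.list() splits prob_forest into named
# vectors, so cbind() receives them exactly as in the hand-written version.
generic <- do.call(cbind, c(list(modelerData, pred_forest),
                            as.list(prob_forest)))

print(identical(by_hand, generic))
print(names(generic))
```

Because `identical()` already returned TRUE for the failing and the working versions in the post, R-level equality is no guarantee of identical behavior in Modeler, so this needs to be tried in an actual stream.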
Damn.