标签: random-forest

在R中组合使用不同训练集构建的随机森林

我是R的新手(第2天),他的任务是建造一片随意的森林.每个随机森林将使用不同的训练集建立,我们将结束所有森林进行预测.我在R中实现这个,并且在使用不使用相同集合构建的两个森林时遇到一些困难.我的尝试如下:

d1 = read.csv("../data/rr/train/10/chunk0.csv",header=TRUE)
d2 = read.csv("../data/rr/train/10/chunk1.csv",header=TRUE)

rf1 = randomForest(A55~., data=d1, ntree=10)
rf2 = randomForest(A55~., data=d2, ntree=10)

rf = combine(rf1,rf2)

Run Code Online (Sandbox Code Playgroud)

这当然会产生错误:

Error in rf$votes + ifelse(is.na(rflist[[i]]$votes), 0, rflist[[i]]$votes) : 
non-conformable arrays
In addition: Warning message:
In rf$oob.times + rflist[[i]]$oob.times :
longer object length is not a multiple of shorter object length

Run Code Online (Sandbox Code Playgroud)

我已经浏览网页一段时间了解这一点,但尚未取得任何成功.这里的任何帮助将非常感激.

statistics r machine-learning random-forest

jos*_*osh

lucky-day

11
推荐指数

1
解决办法

8251
查看次数

如何从python输出RandomForest分类器？

我已经使用非常大的数据集训练了一个来自Python Sckit Learn Module的RandomForestClassifier,但问题是我怎么可能保存这个模型并让其他人在它们的末尾应用它.谢谢!

python random-forest scikit-learn

use*_*725

lucky-day

11
推荐指数

2
解决办法

8870
查看次数

R:从h2o.randomForest()和h2o.gbm()绘制树

寻找一种有效的方法在rstudio,H2O的Flow或h2o的RF和GBM模型的本地html页面中绘制树,类似于下面链接中的图像.具体来说,如何通过解析h2o.download_pojo(rf1)或h2o.download_pojo(gbm1)来为下面的代码生成的对象(拟合模型)rf1和gbm2绘制树？

# # The following two commands remove any previously installed H2O packages for R.
# if ("package:h2o" %in% search()) { detach("package:h2o", unload=TRUE) }
# if ("h2o" %in% rownames(installed.packages())) { remove.packages("h2o") }

# # Next, we download packages that H2O depends on.
# pkgs <- c("methods","statmod","stats","graphics","RCurl","jsonlite","tools","utils")
# for (pkg in pkgs) {
#   if (! (pkg %in% rownames(installed.packages()))) { install.packages(pkg) }
# }
# 
# # Now we download, install h2o package
# install.packages("h2o", type="source", repos=(c("http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/3/R")))
library(h2o)

h2o.init(nthreads = …

Run Code Online (Sandbox Code Playgroud)

r data-visualization random-forest gbm h2o

Web*_*bby

2016 05-25

11
推荐指数

1
解决办法

6435
查看次数

xgboost中的多输出回归

是否有可能在Xgboost中训练具有多个连续输出的模型(多元回归)？培养这种模型的目标是什么？

在此先感谢您的任何建议

machine-learning random-forest xgboost

use*_*011

lucky-day

11
推荐指数

3
解决办法

7678
查看次数

使用randomforest()进行R分类？

我最初有一个由N行12列组成的数据框.最后一列是我的班级(0或1).我不得不将我的整个数据框转换为数字

training <- sapply(training.temp,as.numeric)

Run Code Online (Sandbox Code Playgroud)

但后来我觉得我需要将class列作为因子列来使用randomforest()工具作为分类器,所以我做了

training[,"Class"] <- factor(training[,ncol(training)])

Run Code Online (Sandbox Code Playgroud)

我继续创建树

training_rf <- randomForest(Class ~., data = trainData, importance = TRUE, do.trace = 100)

Run Code Online (Sandbox Code Playgroud)

但我得到两个错误:

1: In Ops.factor(training[, "Status"], factor(training[, ncol(training)])) : 
<= this is not relevant for factors (roughly translated)
2: In randomForest.default(m, y, ...) :
The response has five or fewer unique values.  Are you sure you want to do regression?

Run Code Online (Sandbox Code Playgroud)

如果有人能指出我正在制作的格式错误,我将不胜感激.

谢谢!

r random-forest

mar*_*arc

2013 10-11

10
推荐指数

2
解决办法

1万
查看次数

在使用公式使用插入符号train()训练的randomForest对象上使用predict()时出错

在64位Linux机器上使用R 3.2.0 with caret 6.0-41和randomForest 4.6-10.

当尝试使用公式对使用包中的函数训练predict()的randomForest对象使用该方法时,该函数返回错误.当通过训练和/或使用和而不是公式,这一切都顺利进行.train()caretrandomForest()x=y=

这是一个工作示例:

library(randomForest)
library(caret)

data(imports85)
imp85     <- imports85[, c("stroke", "price", "fuelType", "numOfDoors")]
imp85     <- imp85[complete.cases(imp85), ]
imp85[]   <- lapply(imp85, function(x) if (is.factor(x)) x[,drop=TRUE] else x) ## Drop empty levels for factors.

modRf1  <- randomForest(numOfDoors~., data=imp85)
caretRf <- train( numOfDoors~., data=imp85, method = "rf" )
modRf2  <- caretRf$finalModel
modRf3  <- randomForest(x=imp85[,c("stroke", "price", "fuelType")], y=imp85[, "numOfDoors"])
caretRf <- train(x=imp85[,c("stroke", "price", "fuelType")], y=imp85[, "numOfDoors"], method = "rf")
modRf4 …

Run Code Online (Sandbox Code Playgroud)

r formula predict random-forest r-caret

Adr*_*baz

2015 11-08

10
推荐指数

1
解决办法

1万
查看次数

以下xgboost模型树图中'leaf'的值是什么意思？

考虑到上述(树枝)条件存在,我猜测它是条件概率.但是,我不清楚.

如果您想了解有关所用数据的更多信息或我们如何获得此图表,请访问:http://machinelearningmastery.com/visualize-gradient-boosting-decision-trees-xgboost-python/

python machine-learning decision-tree random-forest xgboost

dsl*_*990

lucky-day

10
推荐指数

3
解决办法

5452
查看次数

预测班级或班级概率？

我目前正在使用H2O作为分类问题数据集.我H2ORandomForestEstimator在python 3.6环境中测试它.我注意到预测方法的结果给出了0到1之间的值(我假设这是概率).

在我的数据集中,目标属性是数字,即True值为1,False值为0.我确保将类型转换为目标属性的类别,我仍然得到相同的结果.

然后我修改了代码,将目标列转换为asfactor()H2OFrame上的因子使用方法,结果没有任何变化.

但是当我将目标属性中的值分别更改为1和0时的True和False时,我得到了预期结果(即)输出是分类而不是概率.

获得分类预测结果的正确方法是什么？
如果概率是数值目标值的结果,那么在多类分类的情况下如何处理它？

python classification machine-learning random-forest h2o

Rah*_*hul

2018 07-28

10
推荐指数

2
解决办法

4171
查看次数

在h2o随机森林中用于"重要性"的度量是多少？

这是我的代码:

set.seed(1)

#Boruta on the HouseVotes84 data from mlbench
library(mlbench) #has HouseVotes84 data
library(h2o)     #has rf

#spin up h2o
myh20 <- h2o.init(nthreads = -1)

#read in data, throw some away
data(HouseVotes84)
hvo <- na.omit(HouseVotes84)

#move from R to h2o
mydata <- as.h2o(x=hvo,
                 destination_frame= "mydata")

#RF columns (input vs. output)
idxy <- 1
idxx <- 2:ncol(hvo)

#split data
splits <- h2o.splitFrame(mydata,           
                         c(0.8,0.1))     

train <- h2o.assign(splits[[1]], key="train")   
valid <- h2o.assign(splits[[2]], key="valid") 

# make random forest
my_imp.rf<- h2o.randomForest(y=idxy,x=idxx,
                      training_frame = train,
                      validation_frame = …

Run Code Online (Sandbox Code Playgroud)

random-forest h2o gini

Eng*_*ica

lucky-day

9
推荐指数

1
解决办法

5673
查看次数

为什么具有单个树的Random Forest比决策树分类器好得多？

我通过scikit-learn图书馆学习机器学习.我使用以下代码将决策树分类器和随机森林分类器应用于我的数据:

def decision_tree(train_X, train_Y, test_X, test_Y):

    clf = tree.DecisionTreeClassifier()
    clf.fit(train_X, train_Y)

    return clf.score(test_X, test_Y)


def random_forest(train_X, train_Y, test_X, test_Y):
    clf = RandomForestClassifier(n_estimators=1)
    clf = clf.fit(X, Y)

    return clf.score(test_X, test_Y)

Run Code Online (Sandbox Code Playgroud)

为什么随机森林分类器的结果更好(100次运行,随机抽样2/3的数据用于训练,1/3用于测试)？

100%|???????????????????????????????????????| 100/100 [00:01<00:00, 73.59it/s]
Algorithm: Decision Tree
  Min     : 0.3883495145631068
  Max     : 0.6476190476190476
  Mean    : 0.4861783113770316
  Median  : 0.48868030937802126
  Stdev   : 0.047158171852401135
  Variance: 0.0022238931724605985
100%|???????????????????????????????????????| 100/100 [00:01<00:00, 85.38it/s]
Algorithm: Random Forest
  Min     : 0.6846846846846847
  Max     : 0.8653846153846154
  Mean    : 0.7894823428836184
  Median  : 0.7906101571063208
  Stdev   : 0.03231671150915106
  Variance: 0.0010443698427656967 …

Run Code Online (Sandbox Code Playgroud)

python machine-learning decision-tree random-forest scikit-learn

hal*_*_me

2019 10-12

9
推荐指数

1
解决办法

1441
查看次数