Posts by Rad*_*bys

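Variable reduction based on importance

I am having trouble filtering out the least important variables in my model. I was given a dataset with more than 4,000 variables and asked to reduce the number of variables that go into the model.

I tried two approaches, and both failed.

The first thing I tried was to inspect the variable importance manually after fitting the model and then drop the unimportant variables based on that.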

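# reproducible example
library(dplyr)         # %>% and mutate()
library(mlr3)          # TaskClassif, lrn()
library(mlr3learners)  # provides classif.xgboost

# artificial class imbalance: "virginica" becomes the positive class "1"
data <- iris %>% 
  mutate(Species = as.factor(ifelse(Species == "virginica", "1", "0")))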


Everything works fine with a simple Learner:

# creating Task
task <- TaskClassif$new(id = "score", backend = data, target = "Species", positive = "1")

# creating Learner
lrn <- lrn("classif.xgboost") 

# setting scoring as prediction type 
lrn$predict_type = "prob"

lrn$train(task)
lrn$importance()

 Petal.Width Petal.Length 
  0.90606304   0.09393696

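For reference, the manual approach described above (train, read the importance scores, drop the weak variables) could be sketched roughly as follows. This is only an illustration under assumptions: the cutoff `min_importance` and the names `learner`, `imp`, `keep`, and `task_reduced` are hypothetical and not part of my real code, and the right threshold depends on the data.

```r
library(mlr3)
library(mlr3learners)  # provides classif.xgboost

# same binary target as in the example above
data <- iris
data$Species <- as.factor(ifelse(data$Species == "virginica", "1", "0"))

task <- TaskClassif$new(id = "score", backend = data,
                        target = "Species", positive = "1")

learner <- lrn("classif.xgboost", predict_type = "prob")
learner$train(task)

# importance() returns a named numeric vector of feature scores
imp <- learner$importance()

# hypothetical cutoff, chosen only for illustration
min_importance <- 0.05
keep <- names(imp)[imp >= min_importance]

# restrict a copy of the task to the retained features,
# leaving the original task untouched
task_reduced <- task$clone()
task_reduced$select(keep)
task_reduced$feature_names
```

Note that features xgboost never used in a split do not appear in `imp` at all, so they are dropped implicitly by this kind of filter.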

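The problem is that the data is highly imbalanced, so I decided to use a GraphLearner with a PipeOp operator to undersample the majority class, and then pass it to an AutoTuner:

I have skipped some parts of the code that I think are not relevant to this case, such as the search space, terminator, tuner, etc.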