我一直在尝试使用 LASSO 进行无监督特征选择(通过删除类列)。数据集包括分类(因子)和连续(数字)变量。链接在这里。我构建了一个设计矩阵,使用model.matrix()它为每个级别的分类变量创建虚拟变量。
dataset <- read.xlsx("./hepatitis.data.xlsx", sheet = "hepatitis", na.strings = "")
names_df <- names(dataset)
formula_LASSO <- as.formula(paste("~ 0 +", paste(names_df, collapse = " + ")))
LASSO_df <- model.matrix(object = formula_LASSO, data = dataset, contrasts.arg = lapply(dataset[ ,sapply(dataset, is.factor)], contrasts, contrasts = FALSE ))
### Group LASSO using gglasso package
gglasso_group <- c(1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, …Run Code Online (Sandbox Code Playgroud)