我正在使用自然语言处理进行短信挖掘.我用quantedapackage来生成文档特征矩阵(dfm).现在我想使用卡方检验进行特征选择.我知道已经有很多人问过这个问题了.但是,我找不到相关的代码.(答案只是提供了一个简短的概念,如下所示:https://stats.stackexchange.com/questions/93101/how-can-i-perform-a-chi-square-test-to-do-feature-selection- in-r)
我了解到我可以chi.squared在FSelector包中使用,但我不知道如何将此函数应用于dfm类对象(trainingtfidf如下).(在手册中显示,它适用于预测变量)
谁能给我一个提示?我很感激!
示例代码:
description <- c("From month 2 the AST and total bilirubine were not measured.", "16:OTHER - COMMENT REQUIRED IN COMMENT COLUMN;07/02/2004/GENOTYPING;SF- genotyping consent not offered until T4.", "M6 is 13 days out of the visit window")
code <- c(4,3,6)
example <- data.frame(description, code)
library(quanteda)
trainingcorpus <- corpus(example$description)
trainingdfm <- dfm(trainingcorpus, verbose = TRUE, stem=TRUE, toLower=TRUE, removePunct= TRUE, removeSeparators=TRUE, language="english", ignoredFeatures = stopwords("english"), removeNumbers=TRUE, ngrams …Run Code Online (Sandbox Code Playgroud)