Posts by Chi*_*Yeh

Feature selection in a document-feature matrix using a chi-squared test

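I am doing text mining using natural language processing. I used the quanteda package to generate a document-feature matrix (dfm). Now I want to do feature selection using a chi-squared test. I know that many people have already asked this question, but I couldn't find any relevant code. (The answers only give a brief concept, like this one: https://stats.stackexchange.com/questions/93101/how-can-i-perform-a-chi-square-test-to-do-feature-selection-in-r)

I learned that I could use chi.squared from the FSelector package, but I don't know how to apply this function to an object of class dfm (trainingtfidf below). (According to the manual, it is applied to predictor variables.)

Can anyone give me a hint? I would appreciate it!

Example code: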

description <- c("From month 2 the AST and total bilirubine were not measured.", "16:OTHER - COMMENT REQUIRED IN COMMENT COLUMN;07/02/2004/GENOTYPING;SF- genotyping consent not offered until T4.",  "M6 is 13 days out of the visit window")
code <- c(4,3,6)
example <- data.frame(description, code)

library(quanteda)
trainingcorpus <- corpus(example$description)

trainingdfm <- dfm(trainingcorpus, verbose = TRUE, stem=TRUE, toLower=TRUE, removePunct= TRUE, removeSeparators=TRUE, language="english", ignoredFeatures = stopwords("english"), removeNumbers=TRUE, ngrams …

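To make the goal concrete, here is a minimal sketch of chi-squared feature scoring in base R. It is not the FSelector approach asked about: it swaps in chisq.test from the stats package and scores each feature by its presence or absence against the class label. The counts matrix, its feature names, and the two-class labels are made up for illustration; with a real dfm, one assumption is that as.matrix(trainingdfm) would give a comparable dense matrix. The helper chi2_score is hypothetical, not part of any package.

```r
# Toy document-feature counts standing in for as.matrix() of a dfm.
# The documents, features, and labels below are invented for illustration.
counts <- matrix(
  c(2, 0, 1,
    1, 0, 0,
    3, 1, 0,
    0, 2, 1,
    0, 3, 0,
    0, 1, 1),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("doc", 1:6), c("genotyp", "visit", "month"))
)
code <- factor(c("a", "a", "a", "b", "b", "b"))

# Hypothetical helper: chi-squared statistic of feature presence/absence
# against the class label (no continuity correction).
chi2_score <- function(x, y) {
  present <- factor(x > 0, levels = c(FALSE, TRUE))
  tab <- table(present, y)
  # A feature that occurs in every document or in none carries no signal.
  if (any(rowSums(tab) == 0)) return(0)
  # Small expected counts trigger an approximation warning; silenced here.
  unname(suppressWarnings(chisq.test(tab, correct = FALSE)$statistic))
}

# Score every column (feature), rank, and keep the two highest-scoring ones.
scores <- sort(apply(counts, 2, chi2_score, y = code), decreasing = TRUE)
top_features <- names(scores)[seq_len(2)]
selected <- counts[, top_features, drop = FALSE]
```

The number of features to keep (two here) is arbitrary. On a real corpus the cutoff would more likely come from a score threshold or from cross-validation, and a dense matrix may be too large, so this is only a sketch of the idea.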

r text-mining feature-selection quanteda fselector

Chi*_*Yeh

2017-04-13

Votes: 5

Answers: 1

Views: 1555