Tag: fselector

Feature selection in a document-feature matrix using a chi-squared test

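I am doing text mining with natural language processing. I use the quanteda package to build a document-feature matrix (dfm), and now I want to do feature selection with a chi-squared test. I know many people have already asked this question, but I could not find any code for it. (The answers only give a brief conceptual outline, like this one: https://stats.stackexchange.com/questions/93101/how-can-i-perform-a-chi-square-test-to-do-feature-selection-in-r)

I learned that I could use chi.squared from the FSelector package, but I don't know how to apply this function to an object of class dfm (trainingtfidf below). (The manual shows it being applied to predictor variables.)

Can anyone give me a hint? I'd appreciate it!

Example code: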

description <- c("From month 2 the AST and total bilirubine were not measured.", "16:OTHER - COMMENT REQUIRED IN COMMENT COLUMN;07/02/2004/GENOTYPING;SF- genotyping consent not offered until T4.",  "M6 is 13 days out of the visit window")
code <- c(4,3,6)
example <- data.frame(description, code)

library(quanteda)
trainingcorpus <- corpus(example$description)

trainingdfm <- dfm(trainingcorpus, verbose = TRUE, stem=TRUE, toLower=TRUE, removePunct= TRUE, removeSeparators=TRUE, language="english", ignoredFeatures = stopwords("english"), removeNumbers=TRUE, ngrams …
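
A minimal sketch of the idea being asked about, written in base R rather than with FSelector: score each feature by the chi-squared statistic of its presence or absence against the class label, then rank the features. The `counts` matrix and `labels` vector are made-up stand-ins for illustration; with quanteda one would presumably start from `as.matrix(trainingdfm)` and the `code` column, which is an assumption and has not been checked against the truncated code above.

```r
# Toy document-feature counts standing in for as.matrix(trainingdfm);
# rows are documents, columns are features. All values are made up.
counts <- matrix(
  c(2, 0, 1,
    1, 0, 0,
    0, 3, 1,
    0, 1, 0,
    3, 0, 2,
    0, 2, 0),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("doc", 1:6), c("bilirubin", "genotyp", "visit"))
)

# Hypothetical class labels, one per document
labels <- factor(c("a", "a", "b", "b", "a", "b"))

# Chi-squared statistic of feature presence vs. class, one score per feature
chi_scores <- vapply(colnames(counts), function(f) {
  present <- factor(counts[, f] > 0, levels = c(FALSE, TRUE))
  tab <- table(present, labels)
  # Small expected counts trigger an approximation warning; silenced here
  unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
}, numeric(1))

# Rank features and keep the two highest-scoring ones
ranked <- sort(chi_scores, decreasing = TRUE)
selected <- names(ranked)[1:2]
print(ranked)
```

FSelector's `chi.squared(formula, data)` takes a data frame with the class as one column, so the analogous route there would be to convert the dfm to a data frame and attach the class column first. With only a handful of documents the chi-squared approximation is unreliable, so the scores are best read as a ranking, not as test results.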


r text-mining feature-selection quanteda fselector

Asked by Chi*_*Yeh on 2017-04-13

Score: 5
Answers: 1
Views: 1555

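Error: Invalid prediction for "rpart" object

I am using the exact code for best-first search from page 4 of this CRAN document (https://cran.r-project.org/web/packages/FSelector/FSelector.pdf), which uses the iris dataset. It runs fine on the iris dataset but does not work on my data. My data has 37 predictor variables (numeric and categorical), and the 38th column is the class to predict.

I get this error: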

Error in predict.rpart(tree, test, type = "c") : 
   Invalid prediction for "rpart" object

I think it comes from this line:

     error.rate = sum(test$Class != predict(tree, test, type="c")) / nrow(test)

I have tried debugging and traceback, but I don't understand why this error occurs (as I said, it cannot be reproduced with the iris data).

Here is some of my data so you can see what I am working with:

> head(data)
Numeric Binary Binary.1 Categorical Binary.2 Numeric.1 Numeric.2 Numeric.3     Numeric.4 Numeric.5 Numeric.6
1      42      1        0           1        0  27.38953  38.93202  27.09122  38.15687  9.798653  18.57313
2      43      1        0           3        0  76.34071  75.18190  73.66722  72.39449 23.546124  54.29957
3      67      0        0           1        0 485.87158 287.35052 471.58863 281.55261 73.454080 389.40092
4      49 …
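
One possible cause, offered as a guess because the class column is not visible in the truncated data: `predict(..., type = "class")` only works on a classification tree, and rpart fits a regression tree when the response is numeric. The sketch below reproduces the message on iris by recoding the class as integers, then shows that converting the response to a factor avoids it. The recoded `d` data frame is an illustrative stand-in, not the asker's data.

```r
library(rpart)

# iris with the class recoded as integers, mimicking a numeric class column
d <- iris
d$Class <- as.integer(d$Species)
d$Species <- NULL

# Numeric response: rpart fits a regression tree (method "anova"),
# so asking for class predictions raises an error
reg_tree <- rpart(Class ~ ., data = d)
res <- tryCatch(
  predict(reg_tree, d, type = "class"),
  error = function(e) conditionMessage(e)
)
print(res)

# Factor response: rpart fits a classification tree, and type = "class" works
d$Class <- factor(d$Class)
cls_tree <- rpart(Class ~ ., data = d)
error.rate <- sum(d$Class != predict(cls_tree, d, type = "class")) / nrow(d)
print(error.rate)
```

If the 38th column is stored as numbers, wrapping it in `factor()` before running the search would be the first thing to try.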


r prediction rpart fselector

Asked by Ash*_*mes on 2015-11-18

Score: 0
Answers: 1
Views: 9896