ath*_*_nn · 6 · Tags: machine-learning, tf-idf, mongodb
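I trained a ridge classifier on a large amount of data, vectorized with a TF-IDF vectorizer, and it used to work fine. But now I'm getting this error:
'max_df corresponds to < documents than min_df'
The data is stored in MongoDB.
I tried various solutions. In the end, when I deleted a MongoDB collection that contained only 1 document (1 record), everything worked and training completed as usual.
But I need a solution that doesn't involve deleting the record, because I need that record.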
Also, I don't understand the error, because it only happens on my machine. Even with that record present in the DB, the script runs fine on other systems.
Can anyone help?
That error is telling you that your max_df value, once converted to a number of documents, is less than your min_df value.
For example:
max_df = 0.7 # Removes terms with DF higher than 70% of the documents
min_df = 5 # Terms must have DF >= 5 to be considered
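Now suppose the total number of documents in your corpus is 7. Then max_df corresponds to 0.7 * 7 = 4.9 documents, while min_df is still 5, so max_df < min_df. That should never happen, because it means zero terms would be kept: no term can have a DF that is at least 5 and at most 4.9. The same thing happens with a very small collection: with only 1 document, a fractional max_df corresponds to at most 1 document, which is below any integer min_df of 2 or more.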
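To make the arithmetic concrete, here is a minimal pure-Python sketch of the conversion and check. It assumes scikit-learn's documented interpretation of the parameters (a float is a fraction of the corpus, an int is an absolute document count); `doc_count_bounds` and `keep_term` are hypothetical helper names used for illustration, not part of the scikit-learn API.

```python
import numbers


def doc_count_bounds(n_doc, max_df=1.0, min_df=1):
    """Convert max_df/min_df into absolute document counts.

    Assumption: mirrors how the TF-IDF vectorizer interprets the
    parameters (int = absolute count, float = fraction of n_doc).
    """
    max_doc_count = max_df if isinstance(max_df, numbers.Integral) else max_df * n_doc
    min_doc_count = min_df if isinstance(min_df, numbers.Integral) else min_df * n_doc
    if max_doc_count < min_doc_count:
        # No term can satisfy min_doc_count <= DF <= max_doc_count.
        raise ValueError("max_df corresponds to < documents than min_df")
    return max_doc_count, min_doc_count


def keep_term(df, n_doc, max_df=1.0, min_df=1):
    """Return True if a term appearing in `df` documents survives both cutoffs."""
    max_doc_count, min_doc_count = doc_count_bounds(n_doc, max_df, min_df)
    return min_doc_count <= df <= max_doc_count


if __name__ == "__main__":
    # Same parameters as the example above, on corpora of different sizes.
    for n_doc in (1, 7, 10):
        try:
            print(n_doc, "docs ->", doc_count_bounds(n_doc, max_df=0.7, min_df=5))
        except ValueError as err:
            print(n_doc, "docs ->", err)
```

Because the check depends on how many documents are being fitted, checking the document count before fitting (for example, skipping or merging collections that are too small for the chosen min_df) is one way to avoid the error without deleting the record.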