Best way to extract keywords from an input NLP sentence

Dan*_*oda 6 python nlp machine-learning

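I'm working on a project where I need to extract important keywords from sentences. So far I have been using a rule-based system built on POS tags. However, I keep running into ambiguous terms that it can't handle. Is there a machine learning classifier I could train on a set of different sentences to extract the relevant keywords?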

err*_*ist 5

Take a look at RAKE: it's a very nice little Python library.

Edit: I also found a tutorial on how to get started with it.
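Note that RAKE (Rapid Automatic Keyword Extraction) is unsupervised, so it needs no training set. It splits the text into candidate phrases at stopwords and punctuation, scores each word as degree / frequency (degree = total length of the candidate phrases the word appears in), and scores each phrase as the sum of its word scores. Below is a minimal pure-Python sketch of that idea, not the library's own code. The tiny `STOPWORDS` set and the function names `candidate_phrases` and `rake` are illustrative assumptions.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real RAKE
# implementations ship much larger lists).
STOPWORDS = {
    "a", "all", "an", "and", "are", "as", "be", "by", "can", "for",
    "from", "in", "is", "of", "on", "or", "the", "these", "to", "with",
}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Punctuation marks delimit fragments; phrases never cross them.
    for fragment in re.split(r"[.,;:!?()]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in stopwords:
                # A stopword ends the current phrase.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake(text, stopwords=STOPWORDS):
    """Return (phrase, score) pairs sorted by descending score."""
    phrases = candidate_phrases(text, stopwords)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts the word itself plus the words it co-occurs with.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # A phrase's score is the sum of its word scores.
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    print(rake("Minimal generating sets of solutions. A minimal set of solutions."))
    # [('minimal generating sets', 8.5), ('minimal set', 4.5), ('solutions', 1.0)]
```

Longer phrases built from words that co-occur often get higher scores, which is why multi-word terms such as "minimal generating sets" rank above single words.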


v.g*_*ets 5

You can also try this multilingual RAKE implementation; it works with any language.

Install it with: pip install multi-rake

from multi_rake import Rake

text_en = (
    'Compatibility of systems of linear constraints over the set of '
    'natural numbers. Criteria of compatibility of a system of linear '
    'Diophantine equations, strict inequations, and nonstrict inequations '
    'are considered. Upper bounds for components of a minimal set of '
    'solutions and algorithms of construction of minimal generating sets '
    'of solutions for all types of systems are given. These criteria and '
    'the corresponding algorithms for constructing a minimal supporting '
    'set of solutions can be used in solving all the considered types of '
    'systems and systems of mixed types.'
)

# Create an extractor with default settings
rake = Rake()

# apply() returns (keyword, score) tuples, highest score first
keywords = rake.apply(text_en)

print(keywords[:10])

# [('minimal generating sets', 8.666666666666666),
#  ('linear diophantine equations', 8.5),
#  ('minimal supporting set', 7.666666666666666),
#  ('minimal set', 4.666666666666666),
#  ('linear constraints', 4.5),
#  ('natural numbers', 4.0),
#  ('strict inequations', 4.0),
#  ('nonstrict inequations', 4.0),
#  ('upper bounds', 4.0),
#  ('mixed types', 3.666666666666667)]