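I am looking for a Java library to extract keywords from a block of text.
The process should be as follows:
stop-word removal -> stemming -> keyword search based on statistical information about the English language. That is, if a word appears in the text more often than its probability in general English would predict, it is a keyword candidate.
Is there a library that performs this task?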
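For illustration only, the three steps above could be sketched in plain Java as follows. This is not taken from any existing library. The stop-word list, the suffix-stripping `stem` method (a crude stand-in for a real stemmer such as Porter's), the `backgroundProb` map of general-English probabilities, and the `unseenProb` and `minRatio` parameters are all assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class KeywordSketch {
    // Tiny illustrative stop-word list (assumption); a real list would be much longer.
    static final Set<String> STOP_WORDS = Set.of(
        "a", "an", "the", "and", "or", "of", "to", "in", "is", "it", "at", "near", "have", "has", "was");

    // Naive suffix stripping (assumption), standing in for a real stemming algorithm.
    static String stem(String word) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Returns stems whose relative frequency in the text is at least minRatio times
    // their background probability, most over-represented first.
    static List<String> extractKeywords(String text, Map<String, Double> backgroundProb,
                                        double unseenProb, double minRatio) {
        Map<String, Integer> counts = new HashMap<>();
        int total = 0;
        // Step 1 and 2: drop stop words, then count stems.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            counts.merge(stem(token), 1, Integer::sum);
            total++;
        }
        // Step 3: compare observed frequency with the expected (background) probability.
        Map<String, Double> ratios = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double observed = (double) e.getValue() / total;
            double expected = backgroundProb.getOrDefault(e.getKey(), unseenProb);
            double ratio = observed / expected;
            if (ratio >= minRatio) {
                ratios.put(e.getKey(), ratio);
            }
        }
        List<String> result = new ArrayList<>(ratios.keySet());
        result.sort(Comparator.comparingDouble((String k) -> ratios.get(k)).reversed()
            .thenComparing(Comparator.<String>naturalOrder()));
        return result;
    }

    public static void main(String[] args) {
        // Made-up background probabilities, keyed by stem.
        Map<String, Double> background = Map.of("dog", 0.0005, "station", 0.0005);
        String text = "Dogs chase dogs. The dog is barking at the station.";
        System.out.println(extractKeywords(text, background, 0.0005, 500.0)); // prints [dog]
    }
}
```

In practice the background probabilities would have to come from a large English corpus, and the threshold would need tuning; a ratio test like this tends to over-rate rare words in short texts.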
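I have been looking through the nlp tag on SO for the past couple of hours, and I am confident I did not miss anything, but if I did, please point me to the relevant question.
In the meantime, I will describe what I am trying to do. A common notion I have observed in many posts is that semantic similarity is difficult. For example, in this post, the accepted solution suggests the following: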
First of all, neither from the perspective of computational
linguistics nor of theoretical linguistics is it clear what
the term 'semantic similarity' means exactly. ....
Consider these examples:
Pete and Rob have found a dog near the station.
Pete and Rob have never found a dog near the station.
Pete and Rob both like programming a lot.
Patricia found a dog near the station.
It was a dog who found Pete and Rob under the snow. …