I am trying to extract proper nouns (person names and organization names) from very small chunks of text such as SMS messages. A basic parser that looks up proper nouns with NLTK WordNet can find the nouns, but the problem is that a proper noun is not always capitalized; for text like the one below, a name such as sumit is not recognized as a proper noun:
>>> from nltk import pos_tag
>>> sentence = "i spoke with sumit and rajesh and Samit about the gridlock situation last night @ around 8 pm last nite"
>>> tagged_sent = pos_tag(sentence.split())
>>> print(tagged_sent)
[('i', 'PRP'), ('spoke', 'VBP'), ('with', 'IN'), ('sumit', 'NN'), ('and', 'CC'), ('rajesh', 'JJ'), ('and', 'CC'), ('Samit', 'NNP'), ('about', 'IN'), ('the', 'DT'), ('gridlock', 'NN'), ('situation', 'NN'), ('last', 'JJ'), ('night', 'NN'), ('@', 'IN'), ('around', 'IN'), ('8', 'CD'), ('pm', 'NN'), ('last', 'JJ'), ('nite', 'NN')]
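To double-check that casing, rather than the tokens themselves, is what drives the tagging, here is a small sketch of my own (not part of the original session) that re-tags the sentence with the names capitalized:

from nltk import pos_tag

# Same sentence, but with the names capitalized; with the default
# averaged perceptron tagger they are then typically tagged NNP.
sentence = "i spoke with Sumit and Rajesh and Samit about the gridlock situation last night"
print(pos_tag(sentence.split()))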
I am trying to extract keywords from a piece of text using StanfordNERTagger and nltk.
import re
from nltk.corpus import stopwords
from nltk.tag import StanfordNERTagger, StanfordPOSTagger

docText = "John Donk works for POI. Brian Jones wants to meet with Xyz Corp. for measuring POI's Short Term performance Metrics."

words = re.split(r"\W+", docText)
stops = set(stopwords.words("english"))
# remove stop words and very short tokens from the list
words = [w for w in words if w not in stops and len(w) > 2]
filteredText = " ".join(words)
print(filteredText)

# model names assume the Stanford jars/models are reachable via CLASSPATH / STANFORD_MODELS
stn = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
stp = StanfordPOSTagger('english-bidirectional-distsim.tagger')

# keep only the tokens that the Stanford POS tagger marks as proper nouns (NNP)
stanfordPosTagList = [word for word, pos in stp.tag(filteredText.split()) if pos == 'NNP']
print("Stanford POS Tagged")
print(stanfordPosTagList)

tagged = stn.tag(stanfordPosTagList)
print …
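In case it helps with that last step, here is a minimal sketch (my own continuation, assuming the goal is person and organization names) of filtering the NER output by label; the 3-class model distinguishes PERSON, ORGANIZATION and LOCATION:

# `tagged` is the list of (token, label) pairs returned by stn.tag above
keywords = [token for token, label in tagged if label in ('PERSON', 'ORGANIZATION')]
print(keywords)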
Is there a package that can remove proper nouns from a sentence using Python?
I know that packages such as NLTK, Stanford, and TextBlob can do the job (remove names), but they also remove many words that start with a capital letter yet are not proper nouns.
Also, I cannot use a dictionary of names, because it would be very large and would keep growing as data keeps being added to the database.
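One direction that needs neither a name dictionary nor blanket removal of capitalized words, sketched here as a rough idea rather than a ready-made package, is to drop only the tokens that NLTK's bundled named-entity chunker places inside PERSON (or ORGANIZATION) chunks:

import nltk
# requires the punkt, averaged_perceptron_tagger, maxent_ne_chunker and words data packages

def remove_proper_nouns(sentence):
    # POS-tag, chunk named entities, then drop tokens inside PERSON/ORGANIZATION chunks
    tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sentence)))
    kept = []
    for node in tree:
        if isinstance(node, nltk.Tree):
            if node.label() in ('PERSON', 'ORGANIZATION'):
                continue  # skip named entities
            kept.extend(token for token, pos in node.leaves())
        else:
            kept.append(node[0])
    return " ".join(kept)

print(remove_proper_nouns("Brian Jones wants to meet with Xyz Corp. about the metrics."))

The chunker will still miss lowercase names like sumit, so this only addresses the over-removal of capitalized non-names, not the casing problem from the first question.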