Shi*_*dim (7) · tags: python, nlp, corpus, nltk
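I have many strings like the following:

```
ISLAMABAD: Chief Justice Iftikhar Muhammad Chaudhry said that National Accountab
KARACHI, July 24 -- Police claimed to have arrested several suspects in separate
ALUM KULAM, Sri Lanka -- As gray-bellied clouds started to blot out the scorchin
```

How can I use NLTK to strip the dateline part and identify the dates, locations, and person names?

With POS tagging I can find the parts of speech, but I need to identify locations, dates, and person names. How can I do that?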
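The datelines in these samples follow a fairly fixed shape (an all-caps place, optionally a date or a region, then `:` or `--`), so one way to pull out the place and date without relying on the NE chunker is a plain regular expression. This is only a sketch under assumptions: `DATELINE` and `split_dateline` are hypothetical names, the pattern covers just the three shapes shown above with English month names, and other dateline styles (for example ones with an agency name in parentheses) would need extra cases.

```python
import re

# Hypothetical pattern covering only the dateline shapes in the samples:
#   "ISLAMABAD: ..."            place, then a colon
#   "KARACHI, July 24 -- ..."   place, optional ", Month DD", then "--"
#   "ALUM KULAM, Sri Lanka -- " place, optional ", Region", then "--"
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

DATELINE = re.compile(
    r"^(?P<place>[A-Z][A-Z .'-]*[A-Z])"                  # all-caps place name
    r"(?:,\s*(?P<date>(?:" + MONTHS + r")\s+\d{1,2}))?"  # optional ", Month DD"
    r"(?:,\s*(?P<region>[A-Z][\w .'-]*?))?"              # optional ", Region"
    r"\s*(?::|--)\s*"                                    # separator before body
)


def split_dateline(text):
    """Return (place, date, region, body); missing fields are None."""
    m = DATELINE.match(text)
    if m is None:
        # No recognizable dateline: treat the whole string as body text.
        return None, None, None, text
    return m.group("place"), m.group("date"), m.group("region"), text[m.end():]


for s in [
    "ISLAMABAD: Chief Justice Iftikhar Muhammad Chaudhry said that National Accountab",
    "KARACHI, July 24 -- Police claimed to have arrested several suspects in separate",
    "ALUM KULAM, Sri Lanka -- As gray-bellied clouds started to blot out the scorchin",
]:
    print(split_dateline(s))
```

The returned body could then be passed through `pos_tag`/`ne_chunk` for person names, while the place and date come from the dateline position itself rather than from the chunker's labels.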
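Update:
Note: I don't want to make another HTTP request. I need to parse this with my own code. If there is a library for it, I can use that.
Update:
I used ne_chunk, but with no luck.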
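```python
import nltk

# txts is a list of those 3 sentences.
txts = [
    "ISLAMABAD: Chief Justice Iftikhar Muhammad Chaudhry said that National Accountab",
    "KARACHI, July 24 -- Police claimed to have arrested several suspects in separate",
    "ALUM KULAM, Sri Lanka -- As gray-bellied clouds started to blot out the scorchin",
]

def pchunk(t):
    # Tokenize, POS-tag, then run NLTK's named-entity chunker on the tagged tokens
    w_tokens = nltk.word_tokenize(t)
    pt = nltk.pos_tag(w_tokens)
    ne = nltk.ne_chunk(pt)
    print(ne)

for t in txts:
    print(t)
    pchunk(t)
```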
The output is as follows:
```
ISLAMABAD: Chief Justice Iftikhar Muhammad Chaudhry said that National Accountab
(S
  ISLAMABAD/NNP
  :/:
  Chief/NNP
  Justice/NNP
  (PERSON Iftikhar/NNP Muhammad/NNP Chaudhry/NNP)
  said/VBD
  that/IN
  (ORGANIZATION National/NNP Accountab/NNP))
KARACHI, July 24 -- Police claimed to have arrested several suspects in separate
(S
  (GPE KARACHI/NNP)
  ,/,
  July/NNP
  24/CD
  --/:
  Police/NNP
  claimed/VBD
  to/TO
  have/VB
  arrested/VBN
  several/JJ
  suspects/NNS
  in/IN
  separate/JJ)
ALUM KULAM, Sri Lanka -- As gray-bellied clouds started to blot out the scorchin
(S
  (GPE ALUM/NN)
  (ORGANIZATION KULAM/NN)
  ,/,
  (PERSON Sri/NNP Lanka/NNP)
  --/:
  As/IN
  gray-bellied/JJ
  clouds/NNS
  started/VBN
  to/TO
  blot/VB
  out/RP
  the/DT
  scorchin/NN)
```
Look closely: even KARACHI is recognized correctly, but Sri Lanka is recognized as a PERSON, and ISLAMABAD is tagged only as NNP instead of GPE.
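NLTK's `ne_chunk` relies on a pre-trained maximum-entropy classifier, so misses like these are not unusual, particularly on all-caps dateline words and place names it has rarely seen. One workaround, sketched here as an assumption rather than as an NLTK feature, is to post-correct the chunker's output against a small gazetteer (a list of known place names). The `KNOWN_PLACES` contents and the `relabel` helper are hypothetical; the example input pairs are read off the output above.

```python
# Hypothetical gazetteer: lower-cased place names -> label to force.
# Its contents are illustrative only; a real list would be much larger.
KNOWN_PLACES = {
    "islamabad": "GPE",
    "karachi": "GPE",
    "sri lanka": "GPE",
}

# With NLTK, (text, label) pairs can be collected from the ne_chunk result:
#   [(" ".join(w for w, _ in st.leaves()), st.label())
#    for st in tree.subtrees() if st.label() != "S"]


def relabel(entities, tokens=()):
    """Override chunker labels with gazetteer labels where the text is known.

    entities: (text, label) pairs from the NE chunker.
    tokens:   optional single tokens the chunker left unchunked; any that
              are in the gazetteer are appended as new entities.
    """
    fixed = [(text, KNOWN_PLACES.get(text.lower(), label))
             for text, label in entities]
    for tok in tokens:
        if tok.lower() in KNOWN_PLACES:
            fixed.append((tok, KNOWN_PLACES[tok.lower()]))
    return fixed


print(relabel([("KARACHI", "GPE"), ("Sri Lanka", "PERSON")],
              tokens=["ISLAMABAD", ":"]))
```

The token scan only handles single-token names, so an unchunked multiword place would still need a longest-match lookup over consecutive tokens. Split entities such as ALUM / KULAM are also not merged by this sketch.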