代码如下:
import spacy
from nltk import Tree
en_nlp = spacy.load('en')
parsed = en_nlp(u"Photos under low lighting are poor, both front and back cameras.")
print(u'sentence:{0}'.format(parsed.text))
try2 = []
print(u'parsed_sentence_children::{0}'.format([(x.text,x.pos_,x.dep_,[(x.text,x.dep_) for x in list(x.children)]) for x in parsed]))
print("\n\n")
for x in parsed:
if x.pos_=="NOUN" and x.dep_=="nsubj":
print(u'Noun and noun subject:{0}'.format(try2 =[(x.text,x.pos_,x.dep_,[(x.text,x.pos_)for x in list(x.ancestors)])])
Run Code Online (Sandbox Code Playgroud)
其输出是:
[(u'Photos', u'NOUN', u'nsubj', [(u'are', u'VERB')]
现在我希望打印acomp以下子项:
[(u'are', u'VERB')]
这是以下项的祖先:
[(u'Photos', u'NOUN', u'nsubj')]
我怎样才能做到这一点?
我想从中提取名词 - 形容词对sentence.所以,基本上我想要的东西:
(Mark,sincere) (John,sincere).
from nltk import word_tokenize, pos_tag, ne_chunk
sentence = "Mark and John are sincere employees at Google."
print ne_chunk(pos_tag(word_tokenize(sentence)))
Run Code Online (Sandbox Code Playgroud) 所以,我是使用Python和NLTK的新手.我有一个名为reviews.csv的文件,其中包含从亚马逊中提取的注释.我已将此csv文件的内容标记化并将其写入名为csvfile.csv的文件中.这是代码:
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.stem import PorterStemmer
import csv #CommaSpaceVariable
from nltk.corpus import stopwords
ps = PorterStemmer()
stop_words = set(stopwords.words("english"))
with open ('reviews.csv') as csvfile:
readCSV = csv.reader(csvfile,delimiter='.')
for lines in readCSV:
word1 = word_tokenize(str(lines))
print(word1)
with open('csvfile.csv','a') as file:
for word in word1:
file.write(word)
file.write('\n')
with open ('csvfile.csv') as csvfile:
readCSV1 = csv.reader(csvfile)
for w in readCSV1:
if w not in stopwords:
print(w)
Run Code Online (Sandbox Code Playgroud)
我试图在csvfile.csv上执行词干.但我得到这个错误:
Traceback (most recent call last):<br>
File "/home/aarushi/test.py", line …Run Code Online (Sandbox Code Playgroud)