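I have a JSON file containing the following text:
dr. goldberg offers everything. parking is good. he's nice and easy to talk
How can I extract only the sentence that contains the keyword "parking"? I don't need the other two sentences.
I tried this: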
with open("test_data.json") as f:
    for line in f:
        if "parking" in line:
            print(line)
It prints the whole text instead of the specific sentence.
I even tried using a regular expression:
import re

with open("test_data.json") as f:
    for line in f:
        line = line.rstrip()
        if re.search('parking', line):
            print(line)
Even this shows the same result.
You can use nltk.tokenize:
from nltk.tokenize import sent_tokenize
from nltk.tokenize import word_tokenize
with open("test_data.json") as f:
    text = f.read()
sentences = sent_tokenize(text)
# Keep every sentence in which the keyword appears as a token.
my_sentence = [sent for sent in sentences if 'parking' in word_tokenize(sent)]
As a complete approach, you can wrap this in a function:
>>> def sentence_finder(text,word):
...    sentences=sent_tokenize(text)
...    return [sent for sent in sentences if word in word_tokenize(sent)]
>>> s="dr. goldberg offers everything. parking is good. he's nice and easy to talk"
>>> sentence_finder(s,'parking')
['parking is good.']
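For context, both attempts in the question test whole lines, and the file apparently holds all of the text on a single line, so the entire line is printed. If installing NLTK is not an option, a rough standard-library sketch of the same idea could look like the following. The function name `find_sentences` is made up for illustration, and the regex split is a naive assumption about sentence boundaries: it also breaks after abbreviations such as "dr.", which does not change the result here but is a reason to prefer a real tokenizer on messier text.

```python
import re

def find_sentences(text, keyword):
    """Return the sentences of `text` that contain `keyword` as a whole word."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # \b keeps the keyword from matching inside a longer word; case is ignored.
    pattern = re.compile(r'\b' + re.escape(keyword) + r'\b', re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

text = "dr. goldberg offers everything. parking is good. he's nice and easy to talk"
print(find_sentences(text, "parking"))
```

Unlike the NLTK version, this matches the keyword case-insensitively, which may or may not be what you want.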