Python: extract sentences containing a particular word

dip*_*tra 3 python regex nltk

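I have a JSON file containing the following text:

dr. goldberg offers everything. parking is good. he's nice and easy to talk

How can I extract the sentence containing the keyword "parking"? I don't need the other two sentences.

I tried this: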

with open("test_data.json") as f:
    for line in f:
        if "parking" in line:
            print(line)

It prints the entire text rather than just the one sentence.

I even tried using a regular expression:

import re

with open("test_data.json") as f:
    for line in f:
        line = line.rstrip()
        if re.search('parking', line):
            print(line)

Even this gives the same result.

Kas*_*mvd 5

You can use nltk.tokenize:

from nltk.tokenize import sent_tokenize, word_tokenize

# Requires NLTK's Punkt tokenizer data to be installed.
with open("test_data.json") as f:
    text = f.read()
sentences = sent_tokenize(text)
# Keep only the sentences whose word tokens include the keyword.
my_sentence = [sent for sent in sentences if 'parking' in word_tokenize(sent)]

As a complete approach, you can wrap this in a function:

>>> def sentence_finder(text,word):
...    sentences=sent_tokenize(text)
...    return [sent for sent in sentences if word in word_tokenize(sent)]

>>> s="dr. goldberg offers everything. parking is good. he's nice and easy to talk"
>>> sentence_finder(s,'parking')
['parking is good.']
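
The attempts in the question test whole lines, and the text presumably sits on a single line of the file, so any match prints all of it; the text has to be split into sentences first. If installing NLTK is not an option, a rough standard-library sketch of the same idea is shown here. The name sentence_finder_regex is made up for illustration, and the split rule is a naive assumption (break after '.', '!' or '?' followed by whitespace), so unlike sent_tokenize it also breaks after abbreviations such as "dr.":

```python
import re

def sentence_finder_regex(text, word):
    # Naive sentence split: break after '.', '!' or '?' when followed by whitespace.
    # Assumption: this is good enough for the input; it also splits after "dr.".
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # \b matches the keyword only as a whole word; re.escape guards regex metacharacters.
    pattern = re.compile(r'\b' + re.escape(word) + r'\b', re.IGNORECASE)
    return [sent for sent in sentences if pattern.search(sent)]

s = "dr. goldberg offers everything. parking is good. he's nice and easy to talk"
print(sentence_finder_regex(s, 'parking'))  # ['parking is good.']
```

The match here is case-insensitive, which differs from the word_tokenize version above; drop re.IGNORECASE to keep it case-sensitive.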