Python: Extract sentences containing a particular word

Question

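I have a JSON file containing the following text:

dr. goldberg offers everything. parking is good. he's nice and easy to talk

How can I extract the sentence containing the keyword "parking"? I don't need the other two sentences.

I tried this: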

with open("test_data.json") as f:
    for line in f:
        if "parking" in line:
            print(line)

It prints all of the text instead of just that sentence.

I even tried using a regular expression:

import re

with open("test_data.json") as f:
    for line in f:
        line = line.rstrip()
        if re.search('parking', line):
            print(line)

Even this gives the same result.

Answer 1

Kas*_*mvd 5

You can use nltk.tokenize:

from nltk.tokenize import sent_tokenize, word_tokenize

# Read the whole file as a single string
with open("test_data.json") as f:
    text = f.read()

sentences = sent_tokenize(text)
# Keep only the sentences whose tokens include the word 'parking'
my_sentence = [sent for sent in sentences if 'parking' in word_tokenize(sent)]
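
If installing NLTK is not an option, roughly the same idea can be sketched with only the standard library. This is an approximation rather than an equivalent: the regular expression splits after every ".", "!" or "?" followed by whitespace, so abbreviations such as "dr." end up as their own fragment. The function name find_sentences is made up for this illustration, and the match is case-insensitive and on whole words only.

```python
import re

def find_sentences(text, word):
    """Return the sentences in `text` that contain `word` as a whole word."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    # Unlike NLTK's tokenizer, this also splits after abbreviations like "dr.".
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # \b anchors keep 'park' from matching inside 'parking'.
    pattern = re.compile(r'\b' + re.escape(word) + r'\b', re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

text = "dr. goldberg offers everything. parking is good. he's nice and easy to talk"
print(find_sentences(text, "parking"))
```

For text with many abbreviations, the NLTK tokenizer above is likely the better choice.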

For a complete approach, you can wrap this in a function:

>>> def sentence_finder(text,word):
...    sentences=sent_tokenize(text)
...    return [sent for sent in sentences if word in word_tokenize(sent)]

>>> s="dr. goldberg offers everything. parking is good. he's nice and easy to talk"
>>> sentence_finder(s,'parking')
['parking is good.']
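
Since the input is a JSON file, it may help to decode it first so that braces, keys and quotes are not fed to the tokenizer as if they were text. The question does not show the layout of test_data.json, so the sketch below makes no assumption about it and simply collects every string value in the decoded data. The helper name iter_strings and the "review" key in the sample are hypothetical.

```python
import json

def iter_strings(node):
    """Yield every string value found anywhere in decoded JSON data."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)
    # Numbers, booleans and null are ignored.

# Hypothetical content; the real structure of test_data.json is not shown.
raw = '{"review": "dr. goldberg offers everything. parking is good."}'

# With a real file, use json.load(f) on the open file object instead.
texts = list(iter_strings(json.loads(raw)))
print(texts)
```

Each string in texts can then be passed to sentence_finder(text, 'parking').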
