Detecting passive or active sentences in text

sru*_*thi 4 python nlp spacy

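Using the Python package spaCy, how can I detect whether a sentence is in the passive or the active voice? For example, the following sentences should be detected as passive voice and active voice, respectively: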

passive_sentence = "John was accused of committing crimes by David"
# passive voice "John was accused"

active_sentence = "David accused John of committing crimes"
# active voice "David accused John"

Kyl*_*erg 5

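The following solution uses spaCy's rule-based matching engine to detect and display the parts of a sentence that use the active or passive voice. No method will correctly identify 100% of sentences, especially the more complex ones; however, the solution below handles the vast majority of cases and can be improved to handle more edge cases.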

Rule/pattern matching overview

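The key component is the set of rules you supply to the matcher. I'll explain one of the passive voice rules below; if you understand one, you should be able to understand all the others and start building your own rules to match specific patterns, using spaCy's token-based matching documentation. Consider the following passive voice rule: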

[{'DEP': 'nsubjpass'}, {'DEP': 'aux', 'OP': '*'}, {'DEP': 'auxpass'}, {'TAG': 'VBN'}]

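This rule/pattern is used by the matcher to find a sequential combination of tokens. Specifically, the matcher will:

  1. Find a token whose dependency label (DEP) is passive nominal subject (nsubjpass).
  2. Find a token whose DEP is passive auxiliary (auxpass), preceded by zero or more tokens whose DEP is auxiliary (aux). Note that the key 'OP' stands for "operator"; it defines how often a token pattern should be matched. For details, see the operators and quantifiers subsection of the spaCy documentation.
  3. Find a final token whose part-of-speech tag (TAG) is verb, past participle (VBN).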
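To make the three steps above concrete, here is a minimal pure-Python sketch of how such a sequential pattern can be walked over a list of tokens. It is a toy illustration, not spaCy's actual matching code: the helper names (`annotate`, `token_matches`, `match_at`, `find_matches`) are hypothetical, only the default quantifier and `'*'` are handled, and the DEP/TAG values are written by hand as assumptions rather than produced by a model.

```python
# Toy sequential token matcher (NOT spaCy's implementation).
# Tokens are plain dicts; DEP/TAG values are hand-written assumptions.

PASSIVE_RULE = [
    {"DEP": "nsubjpass"},
    {"DEP": "aux", "OP": "*"},
    {"DEP": "auxpass"},
    {"TAG": "VBN"},
]


def annotate(words, deps, tags):
    """Bundle parallel lists into one dict per token."""
    return [{"TEXT": w, "DEP": d, "TAG": t} for w, d, t in zip(words, deps, tags)]


def token_matches(token, spec):
    """True if the token carries every attribute the spec asks for ('OP' is not an attribute)."""
    return all(token.get(key) == value for key, value in spec.items() if key != "OP")


def match_at(tokens, pattern, i=0, j=0):
    """Return the end index if pattern[j:] matches tokens from position i, else None."""
    if j == len(pattern):
        return i  # whole pattern consumed
    spec = pattern[j]
    if spec.get("OP") == "*":
        # Zero or more: first try to consume another token, then try moving on.
        if i < len(tokens) and token_matches(tokens[i], spec):
            end = match_at(tokens, pattern, i + 1, j)
            if end is not None:
                return end
        return match_at(tokens, pattern, i, j + 1)
    # Default: the spec must match exactly one token.
    if i < len(tokens) and token_matches(tokens[i], spec):
        return match_at(tokens, pattern, i + 1, j + 1)
    return None


def find_matches(tokens, pattern):
    """Try the pattern at every start position and collect (start, end) spans."""
    spans = []
    for start in range(len(tokens)):
        end = match_at(tokens, pattern, start, 0)
        if end is not None:
            spans.append((start, end))
    return spans


passive_tokens = annotate(
    ["He", "will", "be", "sent", "away", "to", "school", "."],
    ["nsubjpass", "aux", "auxpass", "ROOT", "advmod", "prep", "pobj", "punct"],
    ["PRP", "MD", "VB", "VBN", "RB", "IN", "NN", "."],
)
active_tokens = annotate(
    ["They", "will", "send", "him", "away", "to", "school", "."],
    ["nsubj", "aux", "ROOT", "dobj", "advmod", "prep", "pobj", "punct"],
    ["PRP", "MD", "VB", "PRP", "RB", "IN", "NN", "."],
)

for start, end in find_matches(passive_tokens, PASSIVE_RULE):
    print(" ".join(t["TEXT"] for t in passive_tokens[start:end]))  # He will be sent
```

The `'*'` step is what lets the same rule cover both "John was accused" (no aux token) and "He will be sent" (one aux token). The active sentence yields no span because it contains no nsubjpass token for the rule to start from.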

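If you are unfamiliar with part-of-speech (PoS) tags, see this tutorial. In addition, the Universal Dependencies (UD) dependency documentation pages give an in-depth explanation of the dependency labels and their meanings.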

Solution

import spacy
from spacy.matcher import Matcher

passive_sentences = [
    "John was accused of committing crimes by David.",
    "She was sent a cheque for a thousand euros.",
    "He was given a book for his birthday.",
    "He will be sent away to school.",
    "The meeting was called off.",
    "He was looked after by his grandmother.",
]
active_sentences = [
    "David accused John of committing crimes.",
    "Someone sent her a cheque for a thousand euros.",
    "I gave him a book for his birthday.",
    "They will send him away to school.",
    "They called off the meeting.",
    "His grandmother looked after him."
]
composite_sentences = [
    "Three men seized me, and I was carried to the car."
]

# Load spaCy pipeline (model)
nlp = spacy.load('en_core_web_trf')
# Create pattern to match passive voice use
passive_rules = [
    [{'DEP': 'nsubjpass'}, {'DEP': 'aux', 'OP': '*'}, {'DEP': 'auxpass'}, {'TAG': 'VBN'}],
    [{'DEP': 'nsubjpass'}, {'DEP': 'aux', 'OP': '*'}, {'DEP': 'auxpass'}, {'TAG': 'VBZ'}],
    [{'DEP': 'nsubjpass'}, {'DEP': 'aux', 'OP': '*'}, {'DEP': 'auxpass'}, {'TAG': 'RB'}, {'TAG': 'VBN'}],
]
# Create pattern to match active voice use
active_rules = [
    [{'DEP': 'nsubj'}, {'TAG': 'VBD', 'DEP': 'ROOT'}],
    [{'DEP': 'nsubj'}, {'TAG': 'VBP'}, {'TAG': 'VBG', 'OP': '!'}],
    [{'DEP': 'nsubj'}, {'DEP': 'aux', 'OP': '*'}, {'TAG': 'VB'}],
    [{'DEP': 'nsubj'}, {'DEP': 'aux', 'OP': '*'}, {'TAG': 'VBG'}],
    [{'DEP': 'nsubj'}, {'TAG': 'RB', 'OP': '*'}, {'TAG': 'VBG'}],
    [{'DEP': 'nsubj'}, {'TAG': 'RB', 'OP': '*'}, {'TAG': 'VBZ'}],
    [{'DEP': 'nsubj'}, {'TAG': 'RB', 'OP': '+'}, {'TAG': 'VBD'}],
]

matcher = Matcher(nlp.vocab)  # Init. the matcher with a vocab (note matcher vocab must share same vocab with docs)
matcher.add('Passive',  passive_rules)  # Add passive rules to matcher
matcher.add('Active', active_rules)  # Add active rules to matcher
text = passive_sentences + active_sentences + composite_sentences  # Combine various passive/active sentences

for sentence in text:
    doc = nlp(sentence)  # Process text with spaCy model
    matches = matcher(doc)  # Get matches
    print("-"*40 + "\n" + sentence)
    if len(matches) > 0:
        for match_id, start, end in matches:
            string_id = nlp.vocab.strings[match_id]
            span = doc[start:end]  # the matched span
            print("\t{}: {}".format(string_id, span.text))
    else:
        print("\tNo active or passive voice detected.")

Output

----------------------------------------
John was accused of committing crimes by David.
    Passive: John was accused
----------------------------------------
She was sent a cheque for a thousand euros.
    Passive: She was sent
----------------------------------------
He was given a book for his birthday.
    Passive: He was given
----------------------------------------
He will be sent away to school.
    Passive: He will be sent
----------------------------------------
The meeting was called off.
    Passive: meeting was called
----------------------------------------
He was looked after by his grandmother.
    Passive: He was looked
----------------------------------------
David accused John of committing crimes.
    Active: David accused
----------------------------------------
Someone sent her a cheque for a thousand euros.
    Active: Someone sent
----------------------------------------
I gave him a book for his birthday.
    Active: I gave
----------------------------------------
They will send him away to school.
    Active: They will send
----------------------------------------
They called off the meeting.
    Active: They called
----------------------------------------
His grandmother looked after him.
    Active: grandmother looked
----------------------------------------
Three men seized me, and I was carried to the car.
    Active: men seized
    Passive: I was carried
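The solution reports matched spans, while the question asks for a verdict per sentence. One possible way to bridge the two, sketched here under assumptions, is to collapse the rule names matched in a sentence into a single label. The function name `classify_voice` and the labels "mixed" and "unknown" are hypothetical choices for illustration, not part of spaCy or of the solution above.

```python
def classify_voice(labels):
    """Collapse the rule names matched in one sentence into a single verdict.

    `labels` is any iterable of rule names such as 'Passive' or 'Active',
    e.g. built inside the loop of the solution with:
        labels = [nlp.vocab.strings[match_id] for match_id, start, end in matches]
    """
    found = set(labels)
    has_passive = "Passive" in found
    has_active = "Active" in found
    if has_passive and has_active:
        return "mixed"      # e.g. a compound sentence with one clause of each kind
    if has_passive:
        return "passive"
    if has_active:
        return "active"
    return "unknown"        # no rule matched


print(classify_voice(["Passive"]))            # passive
print(classify_voice(["Active", "Passive"]))  # mixed
print(classify_voice([]))                     # unknown
```

With the output shown above, "Three men seized me, and I was carried to the car." would come out as "mixed", since both an Active and a Passive span were reported for it.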