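Using the Python package spaCy, how can I detect whether a sentence is in the passive or the active voice? For example, the following sentences should be detected as passive and active, respectively: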
passive_sentence = "John was accused of committing crimes by David"
# passive voice "John was accused"
active_sentence = "David accused John of committing crimes"
# active voice "David accused John"
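The solution below uses spaCy's rule-based matching engine to detect and display the parts of a sentence that use the active or passive voice. No method will identify 100% of sentences correctly, especially the more complex ones, but the solution below handles the vast majority of cases and can be extended to cover more edge cases.
The key component is the set of rules you supply to the matcher. I explain one of the passive-voice rules below; if you understand one, you should be able to understand all the others and start building your own rules to match specific patterns, using the spaCy token-based matching documentation. Consider the following passive-voice rule: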
[{'DEP': 'nsubjpass'}, {'DEP': 'aux', 'OP': '*'}, {'DEP': 'auxpass'}, {'TAG': 'VBN'}]
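The matcher uses this rule/pattern to find a sequential combination of tokens. Specifically, the matcher will:
1. Find a token whose dependency label is nsubjpass (a passive nominal subject).
2. Followed by zero or more tokens whose dependency label is aux (an auxiliary).
3. Followed by a token whose dependency label is auxpass (a passive auxiliary).
4. Followed by a token whose part-of-speech tag is VBN (a past participle).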
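"OP" stands for "operator"; it defines how often a token pattern should be matched. For details, see the "Operators and quantifiers" subsection of the spaCy documentation. If you are unfamiliar with part-of-speech (PoS) tags, see this tutorial. In addition, the Universal Dependencies (UD) dependency documentation pages give an in-depth explanation of the dependency labels and their meanings.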
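To see the '*' operator at work without downloading a model, the rule can be tried on hand-annotated tokens. This is a minimal sketch, not part of the original answer: the tags, dependency labels and heads are typed in manually as assumptions about what a parser would produce, and matched_spans is a hypothetical helper name.

```python
from spacy.matcher import Matcher
from spacy.tokens import Doc
from spacy.vocab import Vocab

# A blank vocab is enough here because the annotations are supplied by hand.
vocab = Vocab()

# The passive-voice rule discussed above.
passive_rule = [
    {'DEP': 'nsubjpass'},
    {'DEP': 'aux', 'OP': '*'},   # zero or more auxiliaries
    {'DEP': 'auxpass'},
    {'TAG': 'VBN'},
]

matcher = Matcher(vocab)
matcher.add('Passive', [passive_rule])


def matched_spans(words, tags, deps, heads):
    """Build a Doc from manual annotations and return the matched span texts."""
    doc = Doc(vocab, words=words, tags=tags, deps=deps, heads=heads)
    return [doc[start:end].text for _, start, end in matcher(doc)]


# No 'aux' token between the subject and the passive auxiliary.
no_aux = matched_spans(
    words=['John', 'was', 'accused'],
    tags=['NNP', 'VBD', 'VBN'],
    deps=['nsubjpass', 'auxpass', 'ROOT'],
    heads=[2, 2, 2],  # absolute index of each token's head
)

# One 'aux' token ("will") before the passive auxiliary ("be").
one_aux = matched_spans(
    words=['He', 'will', 'be', 'sent', 'away'],
    tags=['PRP', 'MD', 'VB', 'VBN', 'RB'],
    deps=['nsubjpass', 'aux', 'auxpass', 'ROOT', 'advmod'],
    heads=[3, 3, 3, 3, 3],
)

print(no_aux)
print(one_aux)
```

Both hand-built sentences are covered by the same rule because the aux token is optional; with a real pipeline the annotations come from the parser and may differ from the ones assumed here.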
import spacy
from spacy.matcher import Matcher
passive_sentences = [
"John was accused of committing crimes by David.",
"She was sent a cheque for a thousand euros.",
"He was given a book for his birthday.",
"He will be sent away to school.",
"The meeting was called off.",
"He was looked after by his grandmother.",
]
active_sentences = [
"David accused John of committing crimes.",
"Someone sent her a cheque for a thousand euros.",
"I gave him a book for his birthday.",
"They will send him away to school.",
"They called off the meeting.",
"His grandmother looked after him."
]
composite_sentences = [
"Three men seized me, and I was carried to the car."
]
# Load spaCy pipeline (model)
nlp = spacy.load('en_core_web_trf')
# Create pattern to match passive voice use
passive_rules = [
[{'DEP': 'nsubjpass'}, {'DEP': 'aux', 'OP': '*'}, {'DEP': 'auxpass'}, {'TAG': 'VBN'}],
[{'DEP': 'nsubjpass'}, {'DEP': 'aux', 'OP': '*'}, {'DEP': 'auxpass'}, {'TAG': 'VBZ'}],
[{'DEP': 'nsubjpass'}, {'DEP': 'aux', 'OP': '*'}, {'DEP': 'auxpass'}, {'TAG': 'RB'}, {'TAG': 'VBN'}],
]
# Create pattern to match active voice use
active_rules = [
[{'DEP': 'nsubj'}, {'TAG': 'VBD', 'DEP': 'ROOT'}],
[{'DEP': 'nsubj'}, {'TAG': 'VBP'}, {'TAG': 'VBG', 'OP': '!'}],
[{'DEP': 'nsubj'}, {'DEP': 'aux', 'OP': '*'}, {'TAG': 'VB'}],
[{'DEP': 'nsubj'}, {'DEP': 'aux', 'OP': '*'}, {'TAG': 'VBG'}],
[{'DEP': 'nsubj'}, {'TAG': 'RB', 'OP': '*'}, {'TAG': 'VBG'}],
[{'DEP': 'nsubj'}, {'TAG': 'RB', 'OP': '*'}, {'TAG': 'VBZ'}],
[{'DEP': 'nsubj'}, {'TAG': 'RB', 'OP': '+'}, {'TAG': 'VBD'}],
]
matcher = Matcher(nlp.vocab) # Init. the matcher with a vocab (note matcher vocab must share same vocab with docs)
matcher.add('Passive', passive_rules) # Add passive rules to matcher
matcher.add('Active', active_rules) # Add active rules to matcher
text = passive_sentences + active_sentences + composite_sentences # Combine various passive/active sentences
for sentence in text:
    doc = nlp(sentence)     # Process text with spaCy model
    matches = matcher(doc)  # Get matches
    print("-"*40 + "\n" + sentence)
    if len(matches) > 0:
        for match_id, start, end in matches:
            string_id = nlp.vocab.strings[match_id]
            span = doc[start:end]  # the matched span
            print("\t{}: {}".format(string_id, span.text))
    else:
        print("\tNo active or passive voice detected.")
----------------------------------------
John was accused of committing crimes by David.
Passive: John was accused
----------------------------------------
She was sent a cheque for a thousand euros.
Passive: She was sent
----------------------------------------
He was given a book for his birthday.
Passive: He was given
----------------------------------------
He will be sent away to school.
Passive: He will be sent
----------------------------------------
The meeting was called off.
Passive: meeting was called
----------------------------------------
He was looked after by his grandmother.
Passive: He was looked
----------------------------------------
David accused John of committing crimes.
Active: David accused
----------------------------------------
Someone sent her a cheque for a thousand euros.
Active: Someone sent
----------------------------------------
I gave him a book for his birthday.
Active: I gave
----------------------------------------
They will send him away to school.
Active: They will send
----------------------------------------
They called off the meeting.
Active: They called
----------------------------------------
His grandmother looked after him.
Active: grandmother looked
----------------------------------------
Three men seized me, and I was carried to the car.
Active: men seized
Passive: I was carried
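The output lists matched spans, whereas the question asks for one verdict per sentence. One possible way to get that is to reduce the match labels to a single label. This is only a sketch: classify_voice is a hypothetical helper, and the "mixed" and "unknown" categories are assumptions added for sentences that match both rule sets or neither.

```python
def classify_voice(labels):
    """Reduce a collection of match labels ('Passive'/'Active') to one label."""
    found = set(labels)
    has_passive = 'Passive' in found
    has_active = 'Active' in found
    if has_passive and has_active:
        return 'mixed'      # e.g. a compound sentence with one clause of each kind
    if has_passive:
        return 'passive'
    if has_active:
        return 'active'
    return 'unknown'        # no rule matched


# With the matcher from the code above, the labels for one sentence would be:
#   labels = [nlp.vocab.strings[match_id] for match_id, _, _ in matcher(doc)]
print(classify_voice(['Passive']))
print(classify_voice(['Active', 'Passive']))
print(classify_voice([]))
```

Whether a sentence such as "Three men seized me, and I was carried to the car." should count as mixed or be split into clauses first depends on the application.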