How to identify whether a sentence contains an imperative

Question

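I am trying to work out whether a sentence contains an imperative (for example, classify "Click below" as imperative and "Here is some information" as not).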

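Is this possible with, for example, the Stanford Parser? For reference, the main site (http://nlp.stanford.edu/software/lex-parser.shtml) mentions "improved recognition of imperatives", but the dependencies manual (http://nlp.stanford.edu/software/dependencies_manual.pdf) gives no indication of how to use it for this.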

Alternatively, is there another viable approach?

Answer 1

nis*_*chi 6

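I couldn't find any library or literature that (directly) addresses "imperative detection" either (there must be a different official name for it...). This is what I came up with after reading up on imperative grammar, learning about chunking, and some experimentation.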

（Python + NLTK）

from nltk import RegexpParser
from nltk.tree import Tree

def is_imperative(tagged_sent):
    # if the sentence is not a question...
    if tagged_sent[-1][0] != "?":
        # catches simple imperatives, e.g. "Open the pod bay doors, HAL!"
        if tagged_sent[0][1] == "VB" or tagged_sent[0][1] == "MD":
            return True

        # catches imperative sentences starting with words like 'please', 'you',...
        # E.g. "Dave, stop.", "Just take a stress pill and think things over."
        else:
            chunk = get_chunks(tagged_sent)
            # check if the first chunk of the sentence is a VB-Phrase
            if type(chunk[0]) is Tree and chunk[0].label() == "VB-Phrase":
                return True

    # Questions can be imperatives too, let's check if this one is
    else:
        # check if sentence contains the word 'please'
        pls = len([w for w in tagged_sent if w[0].lower() == "please"]) > 0
        # catches requests disguised as questions
        # e.g. "Open the doors, HAL, please?"
        if pls and (tagged_sent[0][1] == "VB" or tagged_sent[0][1] == "MD"):
            return True

        chunk = get_chunks(tagged_sent)
        # catches imperatives ending with a Question tag
        # and starting with a verb in base form, e.g. "Stop it, will you?"
        if type(chunk[-1]) is Tree and chunk[-1].label() == "Q-Tag":
            if (chunk[0][1] == "VB" or
                (type(chunk[0]) is Tree and chunk[0].label() == "VB-Phrase")):
                return True

    return False

# chunks the sentence into grammatical phrases based on its POS-tags
def get_chunks(tagged_sent):
    chunkgram = r"""VB-Phrase: {<DT><,>*<VB>}
                    VB-Phrase: {<RB><VB>}
                    VB-Phrase: {<UH><,>*<VB>}
                    VB-Phrase: {<UH><,><VBP>}
                    VB-Phrase: {<PRP><VB>}
                    VB-Phrase: {<NN.?>+<,>*<VB>}
                    Q-Tag: {<,><MD><RB>*<PRP><.>*}"""
    chunkparser = RegexpParser(chunkgram)
    return chunkparser.parse(tagged_sent)

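I haven't tested the algorithm's performance yet, but from what I have observed, I would expect precision to be better than recall. Note that the results depend heavily on the correctness of the POS tags.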
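To make the precision/recall remark concrete, here is a minimal sketch of how one might measure it. Everything in it is an assumption for illustration: `first_tag_rule` is a simplified stand-in that keeps only the "starts with VB or MD" check (it is not the full `is_imperative`), `precision_recall` is a hypothetical helper, and the tagged sentences are hand-written toy data rather than real tagger output.

```python
from typing import Callable, List, Tuple

TaggedSent = List[Tuple[str, str]]  # [(word, POS tag), ...]


def first_tag_rule(tagged_sent: TaggedSent) -> bool:
    """Simplified stand-in: imperative if the first token is tagged VB or MD."""
    return bool(tagged_sent) and tagged_sent[0][1] in ("VB", "MD")


def precision_recall(
    classify: Callable[[TaggedSent], bool],
    labeled: List[Tuple[TaggedSent, bool]],
) -> Tuple[float, float]:
    """Compute (precision, recall) of `classify` against gold labels."""
    tp = fp = fn = 0
    for sent, gold in labeled:
        pred = classify(sent)
        if pred and gold:
            tp += 1
        elif pred and not gold:
            fp += 1
        elif gold and not pred:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hand-tagged toy data (hypothetical). The second entry is the same
# imperative as the first, but with "Open" mis-tagged as a proper noun,
# which is the kind of tagging error that makes the rule miss it.
labeled = [
    ([("Open", "VB"), ("the", "DT"), ("doors", "NNS"), (".", ".")], True),
    ([("Open", "NNP"), ("the", "DT"), ("doors", "NNS"), (".", ".")], True),
    ([("Here", "RB"), ("is", "VBZ"), ("some", "DT"),
      ("information", "NN"), (".", ".")], False),
    ([("Click", "VB"), ("below", "RB"), (".", ".")], True),
]

p, r = precision_recall(first_tag_rule, labeled)
print(f"precision={p:.2f} recall={r:.2f}")
```

On this toy set the single mis-tagged sentence costs recall while precision is unaffected. To evaluate the actual approach, pass `is_imperative` in place of `first_tag_rule` and use sentences tagged by a real POS tagger with gold labels you trust.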
