spaCy English language model takes too long to load

Question

She*_*eri 2 python named-entity-recognition chatbot spacy

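I am building a chatbot in Python and use spaCy for entity recognition. I installed the pre-built spaCy English language model (medium) to extract entities from user utterances. The problem is that loading the model takes 31 seconds every time I extract entities from an utterance, and because this is a chatbot, response time matters a lot to me. I would appreciate any guidance: are there alternatives? Any help is greatly appreciated.

Here is the code that extracts entities from a user utterance: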

import spacy
import time
def extractEntity(userUtterance):
    ''' This function returns a list of tuples; each tuple contains
        (entity name, entity type).
        We use the pre-built spaCy English language model to extract entities.
    '''
    start_time = time.process_time()
    nlp = spacy.load("en")
    print(time.process_time() - start_time, "seconds") # prints the time taken to load the model
    docx = nlp(userUtterance)
    listOfTuples = [(word.text, spacy.explain(word.label_)) for word in docx.ents]
    return listOfTuples

if __name__ == "__main__":
    print(extractEntity("I want to go to London, can you book my flight for wednesday"))

Output:

31.0 seconds
[('London', 'Countries, cities, states'), ('wednesday', 'Absolute or relative dates or periods')]

Answer 1

pol*_*m23 5

This is really slow because it loads the model for every sentence:

import spacy

def dostuff(text):
    nlp = spacy.load("en")
    return nlp(text)

This is not slow, because it loads the model once and reuses it on every function call:

import spacy

nlp = spacy.load("en")

def dostuff(text):
    return nlp(text)

You should change your application to look like the second example. This is not specific to spaCy; it applies to any kind of model you choose to use.
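
Applied to the code in the question, the second pattern might look like the sketch below. It is an illustration under stated assumptions, not a definitive implementation: the model name "en_core_web_md" is a guess based on the "medium" model mentioned in the question, and get_nlp and extract_entities are hypothetical helper names. functools.lru_cache keeps the loaded pipeline in memory, so only the first call pays the load time. For brevity the sketch returns the raw labels (ent.label_) instead of the spacy.explain descriptions.

```python
import functools

# Assumption: the "medium" English model from the question; use whichever
# model name is actually installed.
MODEL_NAME = "en_core_web_md"


@functools.lru_cache(maxsize=1)
def get_nlp(name=MODEL_NAME):
    """Load the spaCy pipeline on first use and cache it for later calls."""
    import spacy  # imported here so the slow work happens only when needed
    return spacy.load(name)


def extract_entities(utterance, nlp=None):
    """Return a list of (entity text, entity label) tuples for one utterance."""
    if nlp is None:
        nlp = get_nlp()  # cached after the first call
    doc = nlp(utterance)
    return [(ent.text, ent.label_) for ent in doc.ents]
```

In a chatbot, calling get_nlp() once at startup would move the load time out of the first user request; every later extract_entities call then reuses the same pipeline object.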

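Good advice! In addition, I would point out "nlp.pipe(texts)" for processing a batch of "texts". Wherever your script can feed input in batches like this, you should see a performance improvement. See also https://spacy.io/api/language#pipe (3 upvotes)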
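
The batching suggestion in the comment could be sketched roughly as follows. extract_entities_batch is a hypothetical helper name and the batch_size value is arbitrary; nlp is assumed to be a pipeline that was already loaded once, as the answer recommends.

```python
def extract_entities_batch(texts, nlp, batch_size=64):
    """Return one list of (entity text, entity label) tuples per input text."""
    results = []
    # nlp.pipe processes the texts in batches and yields Doc objects in order
    for doc in nlp.pipe(texts, batch_size=batch_size):
        results.append([(ent.text, ent.label_) for ent in doc.ents])
    return results
```

Batching only pays off when several utterances are available at the same time, for example when replaying stored conversations. For a single live message, a plain nlp(text) call on the already loaded pipeline is likely enough.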
