LangChain with context and memory

Chr*_*ter 5 langchain py-langchain

I'm trying to modify an existing Colab example to combine LangChain memory with context document loading. In two separate tests, each piece works perfectly on its own. Now I want to combine the two (loading previously trained context and conversational memory) into one, so that I can load my previously trained data and also keep the conversation history in my chatbot. The problem is that I don't know how to do this with ConversationChain, which only accepts a single parameter, "input".

When I use ConversationChain, I can pass the following:

    query = "What is the title of the document?"
    docs = docsearch.similarity_search(query)
    chain.run(input_documents=docs, question=query)

Can anyone point me in the right direction?

I am using the memory example from here: https://www.pinecone.io/learn/langchain-conversational-memory/

My knowledge of Python and LangChain is limited.

What I have tried:

    with open('/content/gdrive/My Drive/ai-data/docsearch.pkl', 'rb') as f:
        docsearch = pickle.load(f)
  
    model_kwargs = {"model": "text-davinci-003", "temperature": 0.7, "max_tokens": -1, "top_p": 1, "frequency_penalty": 0, "presence_penalty": 0.5, "n": 1, "best_of": 1}

    llm = OpenAI(model_kwargs=model_kwargs)
    
    def count_tokens(chain, query):
        with get_openai_callback() as cb:
            docs = docsearch.similarity_search(query)
            # working older version: result = chain.run(query)
            result = chain.run(input_documents=docs, question=query)
            print(f'Spent a total of {cb.total_tokens} tokens')
        return result
    
    conversation_bufw = ConversationChain(
        llm=llm, 
        memory=ConversationBufferWindowMemory(k=5)
    )
    
    count_tokens(
        conversation_bufw, 
        "Good morning AI!"
    )

and*_*ece 2

I think you want a ConversationalRetrievalChain. This kind of chain allows for conversational memory and pulls information from the input documents.

Here is an example with a toy set of documents (using an ephemeral, in-memory Chroma DB vector store):

Example dataset using Pandas and DataFrameLoader:

import pandas as pd

from langchain.document_loaders import DataFrameLoader
from langchain.llms import OpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

data = {
    'index': ['001', '002', '003'], 
    'text': [
        'title: cat friend\ni like cats and the color blue.', 
        'title: dog friend\ni like dogs and the smell of rain.', 
        'title: bird friend\ni like birds and the feel of sunshine.'
    ]
}

df = pd.DataFrame(data)
loader = DataFrameLoader(df, page_content_column="text")
docs = loader.load()

Now create the embeddings and store them in Chroma (note: you need an OpenAI API token to run this code):

embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(docs, embeddings)

Now create the memory buffer and initialize the chain:

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

qa = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0.8), 
    vectorstore.as_retriever(search_kwargs={"k": 3}),
    memory=memory
)

Now you can start chatting:

q_1 = "What are all of the document titles?"
result = qa({"question": q_1})

result
{'question': 'What are all of the document titles?',
 'chat_history': [HumanMessage(content='What are all of the document titles?', additional_kwargs={}),
  AIMessage(content=' The document titles are "bird friend", "cat friend", and "dog friend".', additional_kwargs={})],
 'answer': ' The document titles are "bird friend", "cat friend", and "dog friend".'}
q_2 = "Do any documents mention a color?"
result = qa({"question": q_2})

result
{'question': 'Do any documents mention a color?',
 'chat_history': [HumanMessage(content='What are all of the document titles?', additional_kwargs={}),
  AIMessage(content=' The document titles are "bird friend", "cat friend", and "dog friend".', additional_kwargs={}),
  HumanMessage(content='Do any documents mention a color?', additional_kwargs={}),
  AIMessage(content=' Yes, the document titled "cat friend" mentions the color blue.', additional_kwargs={})],
 'answer': ' Yes, the document titled "cat friend" mentions the color blue.'}