Posts by CAB*_*CAB

Changing values with pandas .map

I am trying to use the map function to change string values in my data to numeric values.

Here is the data:

    label   sms_message
0   ham     Go until jurong point, crazy.. Available only ...
1   ham     Ok lar... Joking wif u oni...
2   spam    Free entry in 2 a wkly comp to win FA Cup fina...
3   ham     U dun say so early hor... U c already then say...
4   ham     Nah I don't think he goes to usf, he lives aro...

I am trying to change 'spam' to 1 and 'ham' to 0 with the following command:

df['label'] = df.label.map({'ham':0, 'spam':1})

But the result is:

    label   sms_message
0   NaN     Go until jurong point, crazy.. Available …
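
A `NaN` after `.map` with a dictionary means the value in the column matched no key. The post does not say why that happened, so the sketch below only illustrates two common causes as assumptions: the line was run a second time on a column that was already mapped, or the labels carry stray whitespace. The small DataFrame is hypothetical sample data, not the poster's file.

```python
import pandas as pd

# Hypothetical miniature of the data shown in the question.
df = pd.DataFrame({
    "label": ["ham", "ham", "spam"],
    "sms_message": ["Go until jurong point", "Ok lar", "Free entry"],
})

mapping = {"ham": 0, "spam": 1}

# First pass: every label is a key of the dict, so it maps cleanly.
first = df["label"].map(mapping)

# Second pass on the already mapped column: 0 and 1 are not keys,
# so every entry becomes NaN.
second = first.map(mapping)

# Stray whitespace also prevents a match; stripping it first helps.
padded = pd.Series([" ham", "spam "])
cleaned = padded.str.strip().map(mapping)

print(first.tolist())
print(second.isna().all())
print(cleaned.tolist())
```

Checking `df['label'].unique()` before mapping shows what the column actually contains.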

python dictionary pandas

CAB*_*CAB

2018 12-09

2
votes

1
answer

10k
views

Getting the source of information with LangChain

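I am using the langchain library to store my company's information in a vector database. When I query it the results are good, but I also need a way to recover where the information came from, for example source: "www.site.com/about" or at least "document 156". Does anyone know how to do that?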

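Edit: At the moment I am using docsearch.similarity_search(query), which returns only page_content; the metadata is empty.

I am ingesting the documents with this code, but I am completely open to changing it.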

db = ElasticVectorSearch.from_documents(
        documents,
        embeddings,
        elasticsearch_url="http://localhost:9200",
        index_name="elastic-index",
    )
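
A search result can only return metadata that was attached when the documents were ingested, so one thing to check is whether each document had a source recorded before indexing. The sketch below shows that idea only. `Doc` is a hypothetical stand-in for LangChain's `Document` (which has `page_content` and `metadata` fields) so that the example runs without LangChain or Elasticsearch; `with_sources` and the sample sources are illustrative names, not part of any library.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def with_sources(raw_pages):
    """Build documents that record where each text came from.

    raw_pages: iterable of (source, text) pairs (illustrative format).
    """
    return [
        Doc(page_content=text, metadata={"source": source})
        for source, text in raw_pages
    ]


documents = with_sources([
    ("example.com/about", "We are a small company."),
    ("document 156", "Refund policy: 30 days."),
])

# Each document now carries its origin alongside its text.
for doc in documents:
    print(doc.metadata["source"], "->", doc.page_content)
```

With real LangChain documents built this way, the list would be passed to `from_documents` as in the code above, and the `metadata` of each hit could then be inspected after a search.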

information-retrieval langchain

CAB*_*CAB

2023 05-31

2
votes

1
answer

8895
views

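Applying a function to each element of a pandas Series

I am trying to tokenize each sentence in my pandas Series. I tried using apply as described in the documentation, but it did not work: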

x.apply(nltk.word_tokenize)

Simply calling nltk.word_tokenize(x) does not work either, because x is not a string. Does anyone have any ideas?

Edit: x is a pandas Series of sentences:

0       A very, very, very slow-moving, aimless movie ...
1       Not sure who was more lost - the flat characte...
2       Attempting artiness with black & white and cle...

With x.apply(nltk.word_tokenize) it returns exactly the same:

0       A very, very, very slow-moving, aimless movie ...
1       Not sure who was more lost - the flat characte...
2       Attempting artiness with black & white and cle...

The error from nltk.word_tokenize(x) is: …
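
`Series.apply` does not change the Series in place; it returns a new Series, so the result has to be assigned to a name to be used. Whether that is what went wrong here is not stated in the post. The sketch below uses a whitespace splitter, `simple_tokenize`, as a hypothetical stand-in for `nltk.word_tokenize` so that it runs without NLTK or its tokenizer data; the sample sentences are shortened from the question.

```python
import pandas as pd

x = pd.Series([
    "A very, very, very slow-moving, aimless movie",
    "Not sure who was more lost",
])


def simple_tokenize(sentence):
    # Stand-in tokenizer: splits on whitespace only.
    return sentence.split()


# apply calls the function once per element and returns a new Series;
# x itself still holds the original strings.
tokens = x.apply(simple_tokenize)

print(tokens.iloc[1])
print(type(x.iloc[0]).__name__)
```

With NLTK and its tokenizer data installed, `nltk.word_tokenize` could be passed to `apply` in place of `simple_tokenize`.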

python nltk python-3.x pandas

CAB*_*CAB

2018 08-26

-2
votes

1
answer

3490
views