Posts by Tar*_*han

Finding the best-matching word in a large vocabulary

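I have a Pandas DataFrame with two columns named Potential Word and Fixed Word. The Potential Word column contains words in different languages, both misspelled and correctly spelled. The Fixed Word column contains the correct word corresponding to each Potential Word.

Below is some sample data: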

Potential Word	Fixed Word
example	example
皮波尔	people
pimple	pimple
尤尼克	unique

My vocab DataFrame contains 600K unique rows.

My solution:

key = given_word
glob_match_value = 0
potential_fixed_word = ''
match_threshold = 0.65
for each in df['Potential Word']:
    # match is a function that returns a similarity value of two strings
    match_value = match(each, key)
    if match_value > glob_match_value and match_value > match_threshold: …
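
The loop above scores every vocabulary entry against the key and keeps the best one above a threshold. A minimal sketch of the same idea using the standard library's `difflib.get_close_matches` follows. This swaps in `difflib`'s similarity ratio for the unspecified `match` function, and the `vocab` dictionary, its sample entries, and the `best_fixed_word` helper are hypothetical names introduced only for illustration.

```python
import difflib

# Hypothetical stand-in for the DataFrame: Potential Word -> Fixed Word.
# The misspellings here are made up for illustration.
vocab = {
    "example": "example",
    "peopl": "people",
    "pimple": "pimple",
    "unik": "unique",
}


def best_fixed_word(key, vocab, threshold=0.65):
    """Return the fixed word for the entry most similar to `key`, or None."""
    # Exact hits need no similarity scoring at all.
    if key in vocab:
        return vocab[key]
    # get_close_matches keeps candidates whose ratio is >= cutoff
    # (the original loop uses a strict >) and returns them best first.
    candidates = difflib.get_close_matches(key, list(vocab), n=1, cutoff=threshold)
    return vocab[candidates[0]] if candidates else None


print(best_fixed_word("peopel", vocab))
print(best_fixed_word("zzzz", vocab))
```

This still visits every candidate, but `get_close_matches` applies cheap upper-bound checks before computing the full ratio, and the dictionary lookup short-circuits exact matches. Whether that is fast enough for 600K rows would need measuring.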


python optimization pattern-matching string-matching dataframe

Tar*_*han

lucky-day

6
votes

1
answer

105
views

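Remove rows in Pandas by index starting with a string

Is it possible to filter rows by index name? I need to output the rows whose index name starts with "aa" or "wa".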

    col1   col2
b     3       3
d     4       4
fd    5       4
s     2       5
aaa   1       6
waa   4       2

Output:

     col1   col2
aaa    1       6
waa    4       2
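
One possible way to get this output is a boolean mask built from the index's string accessor. The sketch below rebuilds the sample frame from the question; the variable names `mask`, `kept`, and `dropped` are chosen here for illustration only.

```python
import pandas as pd

# Rebuild the sample frame shown in the question.
df = pd.DataFrame(
    {"col1": [3, 4, 5, 2, 1, 4], "col2": [3, 4, 4, 5, 6, 2]},
    index=["b", "d", "fd", "s", "aaa", "waa"],
)

# True where the index label starts with "aa" or with "wa".
mask = df.index.str.startswith("aa") | df.index.str.startswith("wa")

kept = df[mask]      # rows whose index starts with "aa" or "wa"
dropped = df[~mask]  # the complement, if the goal is to remove those rows

print(kept)
```

Inverting the mask with `~` covers the "remove" reading of the title, while the mask itself covers the "output only these rows" reading of the question body.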


pandas

Aer*_*ria

2020 07-28

5
votes

2
answers

1618
views

AttributeError: 'HTTPHeaderDict' object has no attribute 'get_all' when creating a mapping in elasticsearch

Here is my code:


from datetime import datetime
from elasticsearch_dsl import Document, Date, Integer, Keyword, Text
from elasticsearch_dsl.connections import connections

# Define a default Elasticsearch client
connections.create_connection(hosts=['localhost'])

class Article(Document):
    title = Text(analyzer='snowball', fields={'raw': Keyword()})
    body = Text(analyzer='snowball')
    tags = Keyword()
    published_from = Date()
    lines = Integer()

    class Index:
        name = 'blog'
        settings = {
          "number_of_shards": 2,
        }

    def save(self, **kwargs):
        self.lines = len(self.body.split())
        return super(Article, self).save(**kwargs)

    def is_published(self):
        return datetime.now() >= self.published_from

# create the mappings in elasticsearch
Article.init()

Here I have added my elasticsearch …

python elasticsearch

Tar*_*han

2020 07-19

4
votes

1
answer

930
views