I have the following for loop:
    for j in range(len(list_of_ints)):
        arr_1_, arr_2_, arr_3_ = foo(bar, list_of_ints[j])
        arr_1[j,:] = arr_1_.data.numpy()
        arr_2[j,:] = arr_2_.data.numpy()
        arr_3[j,:] = arr_3_.data.numpy()
I want to apply foo with multiprocessing, mainly because it takes a long time to run. I tried to do it in batches using funcy's chunks method:
    for j in chunks(1000, list_of_ints):
        arr_1_, arr_2_, arr_3_ = foo(bar, list_of_ints[j])
        arr_1[j,:] = arr_1_.data.numpy()
        arr_2[j,:] = arr_2_.data.numpy()
        arr_3[j,:] = arr_3_.data.numpy()
However, I keep getting list object cannot be interpreted as an integer. What is the correct way to apply foo with multiprocessing?
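A minimal sketch of one way to parallelize this, assuming foo(bar, ints) returns three torch tensors for a single inner list: bind bar with functools.partial, let a worker pool map the bound function over the whole list, and unpack the triples afterwards.

    from functools import partial
    from multiprocessing import Pool

    # a minimal sketch, assuming foo(bar, ints) returns three torch
    # tensors for a single inner list of ints
    with Pool(5) as p:
        results = p.map(partial(foo, bar), list_of_ints)

    # unpack the (arr_1_, arr_2_, arr_3_) triples into the result arrays
    for j, (arr_1_, arr_2_, arr_3_) in enumerate(results):
        arr_1[j, :] = arr_1_.data.numpy()
        arr_2[j, :] = arr_2_.data.numpy()
        arr_3[j, :] = arr_3_.data.numpy()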
I have the following for loop:
    for j in range(len(a_nested_list_of_ints)):
        arr_1_, arr_2_, arr_3_ = foo(a_nested_list_of_ints[j])
        arr_1[j,:] = arr_1_.data.numpy()
        arr_2[j,:] = arr_2_.data.numpy()
        arr_3[j,:] = arr_3_.data.numpy()
where a_nested_list_of_ints is a nested list of integers. However, it takes a long time to complete. How can I optimize it with multiprocessing? So far, I have tried:
    p = Pool(5)
    for j in range(len(a_nested_list_of_ints)):
        arr_1_, arr_2_, arr_3_ = p.map(foo, a_nested_list_of_ints[j])
        arr_1[j,:] = arr_1_.data.numpy()
        arr_2[j,:] = arr_2_.data.numpy()
        arr_3[j,:] = arr_3_.data.numpy()
However, I get:
    ValueError: not enough values to unpack (expected 3, got 2)
at this line:
    arr_1_, arr_2_, arr_3_ = p.map(foo, a_nested_list_of_ints[j])
Any idea how to make the above faster? I even tried starmap as well, but it did not work properly.
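For reference, p.map(foo, a_nested_list_of_ints[j]) maps foo over the integers of a single sublist, which is why the unpacking fails. A minimal sketch of the usual pattern, assuming foo takes one sublist and returns three tensors: map over the whole nested list once and unpack afterwards.

    from multiprocessing import Pool

    # a minimal sketch, assuming foo takes one sublist of ints and
    # returns three torch tensors: one map over the whole nested list
    with Pool(5) as p:
        results = p.map(foo, a_nested_list_of_ints)

    for j, (arr_1_, arr_2_, arr_3_) in enumerate(results):
        arr_1[j, :] = arr_1_.data.numpy()
        arr_2[j, :] = arr_2_.data.numpy()
        arr_3[j, :] = arr_3_.data.numpy()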
I want to extract key terms from documents with a chi-square test, so I tried the following:
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.feature_selection import SelectKBest, chi2

    Texts = ["should schools have uniform", "schools discipline",
             "legalize marriage", "marriage culture"]

    vectorizer = TfidfVectorizer()
    term_doc = vectorizer.fit_transform(Texts)
    ch2 = SelectKBest(chi2, "all")
    X_train = ch2.fit_transform(term_doc)
    print(ch2.scores_)
    vectorizer.get_feature_names()
However, I do not have labels, and when I run the code above I get:
    TypeError: fit() missing 1 required positional argument: 'y'
Is there a way to extract the most important words with a chi-square test without having any labels?
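Note that chi2 is a supervised score and always needs a label vector y, so SelectKBest cannot be used as-is here. A minimal sketch of one common unsupervised fallback, swapped in for the chi-square ranking: order the terms by their summed tf-idf weight over the corpus.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    # a minimal sketch of an unsupervised fallback: rank terms by their
    # summed tf-idf weight instead of a (label-dependent) chi2 score
    Texts = ["should schools have uniform", "schools discipline",
             "legalize marriage", "marriage culture"]
    vectorizer = TfidfVectorizer()
    term_doc = vectorizer.fit_transform(Texts)

    scores = np.asarray(term_doc.sum(axis=0)).ravel()
    terms = np.array(vectorizer.get_feature_names())
    print(terms[np.argsort(scores)[::-1]])  # terms, most important first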
The list below has some duplicate sublists whose elements are in a different order:
    l1 = [
        ['The', 'quick', 'brown', 'fox'],
        ['hi', 'there'],
        ['jumps', 'over', 'the', 'lazy', 'dog'],
        ['there', 'hi'],
        ['jumps', 'dog', 'over', 'lazy', 'the'],
    ]
How can I remove the duplicates, keeping the first instance seen, so that I get:
    l1 = [
        ['The', 'quick', 'brown', 'fox'],
        ['hi', 'there'],
        ['jumps', 'over', 'the', 'lazy', 'dog'],
    ]
I tried:
    [list(i) for i in set(map(tuple, l1))]
However, I do not know whether this is the fastest approach for large lists, and my attempt did not work as expected. Any idea how to remove the duplicates efficiently?
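A minimal sketch of an order-insensitive, order-preserving deduplication: key each sublist on a sorted tuple (hashable, and insensitive to element order) and keep only the first occurrence in a single O(n) pass.

    # a minimal sketch: a sorted tuple is a hashable, order-insensitive
    # key, so one pass keeps only the first occurrence of each sublist
    seen = set()
    result = []
    for sub in l1:
        key = tuple(sorted(sub))
        if key not in seen:
            seen.add(key)
            result.append(sub)
    print(result)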
I am working with a large pandas DataFrame in which several rows are very similar:
    A      B       C  D
    John   Tom     0  1
    Homer  Bart    2  3
    Tom    Maggie  1  4
    Lisa   John    5  0
    Homer  Bart    2  3
    Lisa   John    5  0
    Homer  Bart    2  3
    Homer  Bart    2  3
    Tom    Maggie  1  4
How can I assign a unique ID to each repeated row? For example:
    A      B       C  D    new_id
    John   Tom     0  1.2  1
    Homer  Bart    2  3.0  2
    Tom    Maggie  1  4.2  3
    Lisa   John    5  0    4
    Homer  Bart    2  3    5
    Lisa   John    5  0    4
    Homer  Bart    2  …
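A minimal sketch of one way to do this, assuming the frame is named df (a name not given in the question) and that identical rows should share an id: group by all columns and number the groups in order of first appearance.

    # a minimal sketch, assuming the frame is called df and identical
    # rows should share an id: ngroup() numbers each distinct row
    # combination, and sort=False keeps first-appearance order
    df['new_id'] = df.groupby(list(df.columns), sort=False).ngroup() + 1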
Given a text file, how can I replace all tokens that start with % by wrapping them in []? For example, in the following text file:

    Hi how are you?
    I %am %fine.
    Thanks %and %you
How can I enclose every word prefixed with % in [] to obtain:
    Hi how are you?
    I [am] [fine].
    Thanks [and] [you]
I tried to filter the tokens out first and then replace them, but maybe there is a more pythonic way:
    with open('../file') as f:
        s = str(f.readlines())
    a_list = re.sub(r'(?<=\W)[$]\S*', s.replace('.',''))
    a_list = set(a_list)
    print(list(a_list))
Given a list of strings, say:
    a = ['hey', 'hey how are you', 'good how are you', 'I am', 'I am fine 8998', '9809 908']
how can I remove the strings that have fewer than three tokens, to get:
    a = ['hey how are you', 'good how are you', 'I am fine 8998']
I tried:
    ' '.join(a.split(' ')[3:])
However, it does not work, since split is called on the list a rather than on each string. Any idea how to remove all strings with fewer than three tokens?
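A minimal sketch using a list comprehension: split each string on whitespace and keep it only if it has at least three tokens.

    # a minimal sketch: split() must be called on each string, not on
    # the list itself; keep only strings with three or more tokens
    a = [s for s in a if len(s.split()) >= 3]
    print(a)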
I return several values from a function:
    def count_chars(e):
        return len(e), 'bar'
and call it like this:
    for d in lst:
        newlst = []
        for x in d["data"]:
            newlst.extend([x, count_chars(x)])
        d["data"] = newlst
    pprint(lst)
However, the returned values end up inside a tuple:
    {'data': ['YES', (9, 'bar')], 'info': 'AKP'}
How can I get rid of the tuple, to obtain:
    {'data': ['YES', 9, 'bar'], 'info': 'AKP'}
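A minimal sketch of one way to flatten it: unpack the returned tuple with * so its items are spliced into the list instead of being nested.

    # a minimal sketch: *count_chars(x) splices the tuple's items into
    # the list, so d["data"] becomes ['YES', 9, 'bar'], not a tuple
    for d in lst:
        newlst = []
        for x in d["data"]:
            newlst.extend([x, *count_chars(x)])
        d["data"] = newlst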
I have the following list of strings:
    content = [['a list with a lot of strings and chars 1'],
               ['a list with a lot of strings and chars 2'],
               ['a list with a lot of strings and chars 3'],
               ['a list with a lot of strings and chars 4']]
    labels = ['label_1', 'label_2', 'label_3', 'label_4']
How can I create a dictionary from them:
    {
     'label_1': ['a list with a lot of strings and chars 1'],
     'label_2': ['a list with a lot of strings and chars 2'],
     'label_3': ['a list with a lot of strings and chars …
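A minimal sketch: since labels and content line up positionally, zip pairs them and dict builds the mapping.

    # a minimal sketch: labels and content line up positionally, so
    # zipping them gives the key/value pairs of the desired dictionary
    result = dict(zip(labels, content))
    print(result)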
After reading the tutorial in the gensim documentation, I do not understand what the correct way is to generate new embeddings from a trained model. So far, I have trained gensim's fastText embeddings like this:

    from gensim.models.fasttext import FastText as FT_gensim
    model_gensim = FT_gensim(size=100)

    # build the vocabulary
    model_gensim.build_vocab(corpus_file=corpus_file)

    # train the model
    model_gensim.train(
        corpus_file=corpus_file, epochs=model_gensim.epochs,
        total_examples=model_gensim.corpus_count, total_words=model_gensim.corpus_total_words
    )
Then, suppose I want to get the embedding vectors associated with these sentences:
    sentence_obama = 'Obama speaks to the media in Illinois'.lower().split()
    sentence_president = 'The president greets the press in Chicago'.lower().split()
How can I get them from the model_gensim that I trained before?
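A minimal sketch, under the assumption that a simple average of word vectors is an acceptable sentence embedding: a trained fastText model composes a vector for any word from its character n-grams, so model_gensim.wv can be indexed directly with each word.

    import numpy as np

    # a minimal sketch: look up each word in the trained model's wv
    # (fastText can compose vectors even for out-of-vocabulary words
    # from character n-grams) and average them into one sentence vector
    vec_obama = np.mean([model_gensim.wv[w] for w in sentence_obama], axis=0)
    vec_president = np.mean([model_gensim.wv[w] for w in sentence_president], axis=0)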