I have the following for loop:
    for j in range(len(list_of_ints)):
        arr_1_, arr_2_, arr_3_ = foo(bar, list_of_ints[j])
        arr_1[j,:] = arr_1_.data.numpy()
        arr_2[j,:] = arr_2_.data.numpy()
        arr_3[j,:] = arr_3_.data.numpy()
I want to apply foo with multiprocessing, mainly because it takes a long time to run. I tried to do it in batches using funcy's chunks method:
    for j in chunks(1000, list_of_ints):
        arr_1_, arr_2_, arr_3_ = foo(bar, list_of_ints[j])
        arr_1[j,:] = arr_1_.data.numpy()
        arr_2[j,:] = arr_2_.data.numpy()
        arr_3[j,:] = arr_3_.data.numpy()
However, I keep getting list object cannot be interpreted as an integer. What is the correct way to apply foo with multiprocessing?
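A minimal sketch of one way to parallelize this, assuming foo(bar, ints) returns three torch tensors for a single inner list: bind bar with functools.partial, let a worker pool map the bound function over the whole list, and unpack the triples afterwards.

    from functools import partial
    from multiprocessing import Pool

    # a minimal sketch, assuming foo(bar, ints) returns three torch
    # tensors for a single inner list of ints
    with Pool(5) as p:
        results = p.map(partial(foo, bar), list_of_ints)

    # unpack the (arr_1_, arr_2_, arr_3_) triples into the result arrays
    for j, (arr_1_, arr_2_, arr_3_) in enumerate(results):
        arr_1[j, :] = arr_1_.data.numpy()
        arr_2[j, :] = arr_2_.data.numpy()
        arr_3[j, :] = arr_3_.data.numpy()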
I have the following for loop:
    for j in range(len(a_nested_list_of_ints)):
        arr_1_, arr_2_, arr_3_ = foo(a_nested_list_of_ints[j])
        arr_1[j,:] = arr_1_.data.numpy()
        arr_2[j,:] = arr_2_.data.numpy()
        arr_3[j,:] = arr_3_.data.numpy()
where a_nested_list_of_ints is a nested list of integers. However, it takes a long time to complete. How can I optimize it with multiprocessing? So far, I have tried:
    p = Pool(5)
    for j in range(len(a_nested_list_of_ints)):
        arr_1_, arr_2_, arr_3_ = p.map(foo, a_nested_list_of_ints[j])
        arr_1[j,:] = arr_1_.data.numpy()
        arr_2[j,:] = arr_2_.data.numpy()
        arr_3[j,:] = arr_3_.data.numpy()
However, I get:
    ValueError: not enough values to unpack (expected 3, got 2)
at this line:
    arr_1_, arr_2_, arr_3_ = p.map(foo, a_nested_list_of_ints[j])
Any idea how to make the above faster? I even tried starmap as well, but it did not work properly.
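For reference, p.map(foo, a_nested_list_of_ints[j]) maps foo over the integers of a single sublist, which is why the unpacking fails. A minimal sketch of the usual pattern, assuming foo takes one sublist and returns three tensors: map over the whole nested list once and unpack afterwards.

    from multiprocessing import Pool

    # a minimal sketch, assuming foo takes one sublist of ints and
    # returns three torch tensors: one map over the whole nested list
    with Pool(5) as p:
        results = p.map(foo, a_nested_list_of_ints)

    for j, (arr_1_, arr_2_, arr_3_) in enumerate(results):
        arr_1[j, :] = arr_1_.data.numpy()
        arr_2[j, :] = arr_2_.data.numpy()
        arr_3[j, :] = arr_3_.data.numpy()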
I want to extract key terms from documents with a chi-square test, so I tried the following:
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.feature_selection import SelectKBest, chi2

    Texts = ["should schools have uniform", "schools discipline",
             "legalize marriage", "marriage culture"]

    vectorizer = TfidfVectorizer()
    term_doc = vectorizer.fit_transform(Texts)
    ch2 = SelectKBest(chi2, "all")
    X_train = ch2.fit_transform(term_doc)
    print(ch2.scores_)
    vectorizer.get_feature_names()
However, I do not have labels, and when I run the code above I get:
    TypeError: fit() missing 1 required positional argument: 'y'
Is there a way to extract the most important words with a chi-square test without having any labels?
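Note that chi2 is a supervised score and always needs a label vector y, so SelectKBest cannot be used as-is here. A minimal sketch of one common unsupervised fallback, swapped in for the chi-square ranking: order the terms by their summed tf-idf weight over the corpus.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    # a minimal sketch of an unsupervised fallback: rank terms by their
    # summed tf-idf weight instead of a (label-dependent) chi2 score
    Texts = ["should schools have uniform", "schools discipline",
             "legalize marriage", "marriage culture"]
    vectorizer = TfidfVectorizer()
    term_doc = vectorizer.fit_transform(Texts)

    scores = np.asarray(term_doc.sum(axis=0)).ravel()
    terms = np.array(vectorizer.get_feature_names())
    print(terms[np.argsort(scores)[::-1]])  # terms, most important first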
The list below has some duplicate sublists whose elements are in a different order:
    l1 = [
        ['The', 'quick', 'brown', 'fox'],
        ['hi', 'there'],
        ['jumps', 'over', 'the', 'lazy', 'dog'],
        ['there', 'hi'],
        ['jumps', 'dog', 'over', 'lazy', 'the'],
    ]
How can I remove the duplicates, keeping the first instance seen, so that I get:
    l1 = [
        ['The', 'quick', 'brown', 'fox'],
        ['hi', 'there'],
        ['jumps', 'over', 'the', 'lazy', 'dog'],
    ]
I tried:
    [list(i) for i in set(map(tuple, l1))]
However, I do not know whether this is the fastest approach for large lists, and my attempt did not work as expected. Any idea how to remove the duplicates efficiently?
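A minimal sketch of an order-insensitive, order-preserving deduplication: key each sublist on a sorted tuple (hashable, and insensitive to element order) and keep only the first occurrence in a single O(n) pass.

    # a minimal sketch: a sorted tuple is a hashable, order-insensitive
    # key, so one pass keeps only the first occurrence of each sublist
    seen = set()
    result = []
    for sub in l1:
        key = tuple(sorted(sub))
        if key not in seen:
            seen.add(key)
            result.append(sub)
    print(result)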
I am working with a large pandas DataFrame in which several rows are very similar:
    A      B       C  D
    John   Tom     0  1
    Homer  Bart    2  3
    Tom    Maggie  1  4
    Lisa   John    5  0
    Homer  Bart    2  3
    Lisa   John    5  0
    Homer  Bart    2  3
    Homer  Bart    2  3
    Tom    Maggie  1  4
How can I assign a unique ID to each repeated row? For example:
    A      B       C  D    new_id
    John   Tom     0  1.2  1
    Homer  Bart    2  3.0  2
    Tom    Maggie  1  4.2  3
    Lisa   John    5  0    4
    Homer  Bart    2  3    5
    Lisa   John    5  0    4
    Homer  Bart    2  …
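A minimal sketch of one way to do this, assuming the frame is named df (a name not given in the question) and that identical rows should share an id: group by all columns and number the groups in order of first appearance.

    # a minimal sketch, assuming the frame is called df and identical
    # rows should share an id: ngroup() numbers each distinct row
    # combination, and sort=False keeps first-appearance order
    df['new_id'] = df.groupby(list(df.columns), sort=False).ngroup() + 1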
Given a text file, how can I replace all tokens that start with % by wrapping them in []? For example, in the following text file:

    Hi how are you?
    I %am %fine.
    Thanks %and %you
How can I enclose every word prefixed with % in [] to obtain:
    Hi how are you?
    I [am] [fine].
    Thanks [and] [you]
I tried to filter the tokens out first and then replace them, but maybe there is a more pythonic way:
    with open('../file') as f:
        s = str(f.readlines())
    a_list = re.sub(r'(?<=\W)[$]\S*', s.replace('.',''))
    a_list = set(a_list)
    print(list(a_list))
Given a list of strings, say:
    a = ['hey', 'hey how are you', 'good how are you', 'I am', 'I am fine 8998', '9809 908']
how can I remove the strings that have fewer than three tokens, to get:
    a = ['hey how are you', 'good how are you', 'I am fine 8998']
I tried:
    ' '.join(a.split(' ')[3:])
However, it does not work, since split is called on the list a rather than on each string. Any idea how to remove all strings with fewer than three tokens?
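A minimal sketch using a list comprehension: split each string on whitespace and keep it only if it has at least three tokens.

    # a minimal sketch: split() must be called on each string, not on
    # the list itself; keep only strings with three or more tokens
    a = [s for s in a if len(s.split()) >= 3]
    print(a)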
I return several values from a function:
    def count_chars(e):
        return len(e), 'bar'
and call it like this:
    for d in lst:
        newlst = []
        for x in d["data"]:
            newlst.extend([x, count_chars(x)])
        d["data"] = newlst
    pprint(lst)
However, the returned values end up inside a tuple:
    {'data': ['YES', (9, 'bar')], 'info': 'AKP'}
How can I get rid of the tuple, to obtain:
    {'data': ['YES', 9, 'bar'], 'info': 'AKP'}
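A minimal sketch of one way to flatten it: unpack the returned tuple with * so its items are spliced into the list instead of being nested.

    # a minimal sketch: *count_chars(x) splices the tuple's items into
    # the list, so d["data"] becomes ['YES', 9, 'bar'], not a tuple
    for d in lst:
        newlst = []
        for x in d["data"]:
            newlst.extend([x, *count_chars(x)])
        d["data"] = newlst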
I have the following list of strings:
    content = [['a list with a lot of strings and chars 1'],
               ['a list with a lot of strings and chars 2'],
               ['a list with a lot of strings and chars 3'],
               ['a list with a lot of strings and chars 4']]
    labels = ['label_1', 'label_2', 'label_3', 'label_4']
How can I create a dictionary from them:
    {
     'label_1': ['a list with a lot of strings and chars 1'],
     'label_2': ['a list with a lot of strings and chars 2'],
     'label_3': ['a list with a lot of strings and chars …
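A minimal sketch: since labels and content line up positionally, zip pairs them and dict builds the mapping.

    # a minimal sketch: labels and content line up positionally, so
    # zipping them gives the key/value pairs of the desired dictionary
    result = dict(zip(labels, content))
    print(result)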
After reading the tutorial in the gensim documentation, I do not understand what the correct way is to generate new embeddings from a trained model. So far, I have trained gensim's fastText embeddings like this:

    from gensim.models.fasttext import FastText as FT_gensim
    model_gensim = FT_gensim(size=100)

    # build the vocabulary
    model_gensim.build_vocab(corpus_file=corpus_file)

    # train the model
    model_gensim.train(
        corpus_file=corpus_file, epochs=model_gensim.epochs,
        total_examples=model_gensim.corpus_count, total_words=model_gensim.corpus_total_words
    )
Then, suppose I want to get the embedding vectors associated with these sentences:
    sentence_obama = 'Obama speaks to the media in Illinois'.lower().split()
    sentence_president = 'The president greets the press in Chicago'.lower().split()
How can I get them from the model_gensim that I trained before?
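A minimal sketch, under the assumption that a simple average of word vectors is an acceptable sentence embedding: a trained fastText model composes a vector for any word from its character n-grams, so model_gensim.wv can be indexed directly with each word.

    import numpy as np

    # a minimal sketch: look up each word in the trained model's wv
    # (fastText can compose vectors even for out-of-vocabulary words
    # from character n-grams) and average them into one sentence vector
    vec_obama = np.mean([model_gensim.wv[w] for w in sentence_obama], axis=0)
    vec_president = np.mean([model_gensim.wv[w] for w in sentence_president], axis=0)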