很抱歉,我无法用更简单的示例重现错误,而且我的代码太复杂而无法发布.如果我在IPython shell而不是常规Python中运行程序,那么事情就会很顺利.
我查看了之前关于这个问题的一些注意事项.它们都是由在类函数中定义的pool to call函数引起的.但对我来说情况并非如此.
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 313, in _handle_tasks
put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
Run Code Online (Sandbox Code Playgroud)
我将不胜感激任何帮助.
更新:我挑选的功能是在模块的顶层定义的.虽然它调用包含嵌套函数的函数.即f()要求g()调用h()具有嵌套函数i(),和我打电话pool.apply_async(f).f(),g(),h()都在顶层定义.我用这个模式尝试了更简单的例子,但它确实有效.
我有以下问题
我有一个包含句子的数据框主文件,例如
master
Out[8]:
original
0 this is a nice sentence
1 this is another one
2 stackoverflow is nice
Run Code Online (Sandbox Code Playgroud)
对于Master中的每一行,我使用查找到另一个Dataframe 从站以获得最佳匹配fuzzywuzzy.我使用fuzzywuzzy,因为两个数据帧之间的匹配句子可能有点不同(额外的字符等).
例如,奴隶可能是
slave
Out[10]:
my_value name
0 2 hello world
1 1 congratulations
2 2 this is a nice sentence
3 3 this is another one
4 1 stackoverflow is nice
Run Code Online (Sandbox Code Playgroud)
这是一个功能齐全,精彩,紧凑的工作示例:)
from fuzzywuzzy import fuzz
import pandas as pd
import numpy as np
import difflib
master= pd.DataFrame({'original':['this is a nice sentence', …Run Code Online (Sandbox Code Playgroud) 我想基于a中的值创建一个类的实例pandas.DataFrame.我已经失败了.
import itertools
import multiprocessing as mp
import pandas as pd
class Toy:
id_iter = itertools.count(1)
def __init__(self, row):
self.id = self.id_iter.next()
self.type = row['type']
if __name__ == "__main__":
table = pd.DataFrame({
'type': ['a', 'b', 'c'],
'number': [5000, 4000, 30000]
})
for index, row in table.iterrows():
[Toy(row) for _ in range(row['number'])]
Run Code Online (Sandbox Code Playgroud)
我已经能够通过添加以下内容来并行化这种(某种程度):
pool = mp.Pool(processes=mp.cpu_count())
m = mp.Manager()
q = m.Queue()
for index, row in table.iterrows():
pool.apply_async([Toy(row) for _ in range(row['number'])])
Run Code Online (Sandbox Code Playgroud)
如果数字row['number']显着长于长度,这似乎会更快table.但在我的实际情况中,table …
这个问题与我之前提出的问题有关,这似乎是一个简单的问题,但是我很难找到有关多处理主题的有用信息或教程。
我的问题是我想将产生的数据合并为一个大数组,然后将其存储在我的hdf文件中。
def Simulation(i, output):
# make a simulation which outputs it resutlts in A. with shape 4000,3
A = np.array([4000,3])
output.put(A)
def handle_output(output):
hdf = pt.openFile('simulation.h5',mode='w')
hdf.createGroup('/','data')
# Here the output should be joined somehow.
# I would like to get it in the shape [4000,3,10]
output.get()
hdf.createArray('/data','array',A)
hdf.close()
if __name__ == '__main__':
output = mp.Queue()
jobs = []
proc = mp.Process(target=handle_output, args=(output, ))
proc.start()
for i in range(10):
p = mp.Process(target=Simulation, args=(i, output))
jobs.append(p)
p.start()
for …Run Code Online (Sandbox Code Playgroud)