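I have some Python code that passes a data frame to R via rpy2. R processes it, and I then pull the resulting data.frame back into Python as a pandas data frame using com.load_data.
The problem is that the call to com.load_data works fine in a single Python process, but it crashes when the same code is run concurrently in several multiprocessing.Process processes. I get the following error message from Python: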
File "C:\\Python27\\lib\\site-packages\\pandas\\rpy\\common.py", line 29, in load_data
    r.data(name)
TypeError: 'DataFrame' object is not callable
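So my question is: is rpy2 simply not designed to be run in parallel, or is this just a bug in the load_data function? I had assumed that each Python process would get its own independent R session. As far as I can tell, the only workaround is to have R write its output to a text file, which the corresponding Python process can then open and continue processing. But that is quite clunky.
Update, with some code: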
from rpy2.robjects.packages import importr
import rpy2.robjects as ro
import pandas as pd
import pandas.rpy.common as com
# Load C50 library into R environment
C50 = importr('C50')
...
# PANDAS data frame containing test dataset
testing = pd.DataFrame(testing)
# Pass testing dataset to R
rtesting = com.convert_to_r_dataframe(testing)
ro.globalenv['test'] = rtesting
# Strip "AsIs" from each column in the R data frame
# so that predict.C5.0 will work
for c in range(len(testing.columns)):
    ro.r('''class(test[,{0}])=class(test[,{0}])[-match("AsIs", class(test[,{0}]))]'''.format(c+1))
# Make predictions on test dataset (res is pre-existing C5.0 tree)
ro.r('''preds=predict.C5.0(res, newdata=test)''')
ro.r('''preds=as.data.frame(preds)''')
# Get the predictions from R
preds = com.load_data('preds') ### Crashes here when code is run on several processes concurrently
#Further processing as necessary
...
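rpy works by running a Python process and an R process side by side and exchanging information between them. It does not account for R calls being issued in parallel through multiprocessing, so in practice each of the Python processes connects to the same R process. This is probably what causes the problem you are seeing.
One way around this is to implement the parallel processing in R rather than in Python. You send everything to R in one go, R processes it in parallel, and the result is sent back to Python.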
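As a rough illustration of that suggestion, here is a minimal sketch that hands the whole prediction step to R's `parallel` package from a single Python process. It assumes, as in the question, that `res` (the fitted C5.0 tree) and `test` (the data frame to score) already exist in R's global environment. The names `R_PARALLEL_PREDICT`, `predict_in_r` and `n_workers` are made up for this example, and I have not run it against the asker's data.

```python
# Sketch only: assumes the R objects `res` (a fitted C5.0 tree) and `test`
# (the data frame to score) already exist in R's global environment.

# R code that splits `test` into contiguous chunks, scores each chunk on a
# separate R worker, and stacks the per-chunk results back into `preds`.
R_PARALLEL_PREDICT = """
library(parallel)
cl <- makeCluster(n_workers)
clusterEvalQ(cl, library(C50))   # each worker needs C50 loaded
clusterExport(cl, "res")         # copy the fitted tree to the workers
groups <- sort(rep_len(seq_len(n_workers), nrow(test)))
chunks <- split(test, groups)
preds_list <- parLapply(cl, chunks, function(chunk) {
    data.frame(preds = predict(res, newdata = chunk))
})
stopCluster(cl)
preds <- do.call(rbind, preds_list)
"""


def predict_in_r(n_workers=2):
    """Run the parallel prediction inside R and return the R data.frame."""
    # Imported here so that this module can be loaded without R installed.
    import rpy2.robjects as ro

    # Hypothetical parameter: the number of R worker processes to start.
    ro.globalenv['n_workers'] = ro.IntegerVector([int(n_workers)])
    ro.r(R_PARALLEL_PREDICT)
    # Returned as an rpy2 object; converting it to pandas is left to the caller.
    return ro.globalenv['preds']
```

Python then makes a single call such as `predict_in_r(4)` instead of starting several `multiprocessing.Process` workers that all talk to R, and the returned object can be converted to a pandas data frame in whatever way the rest of the code already does.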