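I am trying to get familiar with the SCOOP library (documentation here: https://media.readthedocs.org/pdf/scoop/0.7/scoop.pdf) to learn how to run statistical computations in parallel, in particular with the futures.map function.
So, as a first step, I want to run a simple linear regression and measure the performance difference between serial and parallel computation, using 10,000,000 data points (4 features, 1 target variable) generated randomly from a normal distribution.
Here is my code: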
import pandas as pd
import numpy as np
import random
from scoop import futures
import statsmodels.api as sm
from time import time

def linreg(vals):
    global model
    model = sm.OLS(y_vals,X_vals).fit()
    return model
    print(model.summary())

if __name__ == '__main__':
    random.seed(42)
    vals = pd.DataFrame(np.random.normal(loc = 3, scale = 100, size =(10000000,5)))
    vals.columns = ['dep', 'ind1', 'ind2', 'ind3', 'ind4']
    y_vals = vals['dep']
    X_vals = vals[['ind1', 'ind2', 'ind3', 'ind4']]
    bt = time()
    model_vals …
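
For comparison, here is a minimal sketch of one way the serial-versus-parallel timing could be set up so that futures.map has independent tasks to distribute. It is not the statsmodels fit from the code above: it swaps in chunked normal equations computed with NumPy, where each task returns X'X and X'y for one slice of the rows and the sums are solved once at the end. The names partial_sums, chunked_ols and parallel_map are hypothetical, the data size is reduced to keep the run short, and the fallback to the built-in map when SCOOP cannot be imported is an assumption made only so that the sketch runs anywhere.

```python
from time import time

import numpy as np

try:
    from scoop import futures
    parallel_map = futures.map
except ImportError:
    # Assumption: fall back to the serial built-in map when SCOOP is missing.
    parallel_map = map


def partial_sums(chunk):
    """Return (X'X, X'y) for one block of rows; column 0 is the target."""
    y = chunk[:, 0]
    # Prepend a column of ones so the fit includes an intercept.
    X = np.column_stack([np.ones(len(chunk)), chunk[:, 1:]])
    return X.T @ X, X.T @ y


def chunked_ols(data, n_chunks, mapper=map):
    """Least-squares coefficients from per-chunk sums (hypothetical helper)."""
    chunks = np.array_split(data, n_chunks)
    results = list(mapper(partial_sums, chunks))
    xtx = sum(r[0] for r in results)
    xty = sum(r[1] for r in results)
    # Solve the normal equations (X'X) beta = X'y.
    return np.linalg.solve(xtx, xty)


if __name__ == '__main__':
    rng = np.random.default_rng(42)
    data = rng.normal(loc=3, scale=100, size=(100_000, 5))

    bt = time()
    beta_serial = chunked_ols(data, 8, mapper=map)
    print('serial:', time() - bt)

    bt = time()
    beta_parallel = chunked_ols(data, 8, mapper=parallel_map)
    print('mapped:', time() - bt)

    print(np.allclose(beta_serial, beta_parallel))
```

SCOOP programs are normally launched with python -m scoop followed by the script name. Whether the mapped version is actually faster will depend on how the cost of sending each chunk to a worker compares with the cost of the computation itself.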