Jak*_*aur 5 python pandas scikit-learn joblib dask
I want to run a Machine Learning algorithm from the Sklearn library on all cores, using the Dask and joblib libraries.
My code for joblib.parallel_backend with Dask:
# Fire up the joblib backend with Dask:
with joblib.parallel_backend('dask'):
    model_RFE = RFE(estimator=DecisionTreeClassifier(), n_features_to_select=5)
    fit_RFE = model_RFE.fit(X_values, Y_values)
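Unfortunately, when I look at Task Manager, I can see that all the workers sit idle and only one new Python process is doing all the work:

Even in the Dask visualization on the client, I see the workers doing nothing:
I would welcome any other ideas. My full code, which tries to follow the tutorial from the documentation: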
import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client
import sklearn
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier
import joblib
# Create a cluster on the local PC
client = Client(n_workers=4, threads_per_worker=1, memory_limit='4GB')
client
# Read data from .csv
dataframe_lazy = dd.read_csv(path, engine='c', low_memory=False)
dataframe = dataframe_lazy.compute()
# Get my X and Y values and release the original DataFrame from memory
X_values = dataframe.drop(columns=['Id', 'Target'])
Y_values = dataframe['Target']
del dataframe
# Prepare data
X_values.fillna(0, inplace=True)
# Fire up the joblib backend with Dask:
with joblib.parallel_backend('dask'):
    model_RFE = RFE(estimator=DecisionTreeClassifier(), n_features_to_select=5)
    fit_RFE = model_RFE.fit(X_values, Y_values)
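One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: joblib.parallel_backend only redirects work that an estimator already hands to joblib internally, and scikit-learn estimators that do so usually expose an n_jobs parameter. The sketch below uses a hypothetical helper, supports_n_jobs (not part of scikit-learn), to see which estimators have that parameter. It is only a heuristic, since having n_jobs does not guarantee that every step of a fit is parallel.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, RFECV
from sklearn.tree import DecisionTreeClassifier


def supports_n_jobs(estimator):
    """Hypothetical helper: True if the estimator itself has an n_jobs parameter."""
    # deep=False inspects only the estimator's own parameters,
    # not those of any wrapped sub-estimator.
    return "n_jobs" in estimator.get_params(deep=False)


candidates = {
    "RFE": RFE(estimator=DecisionTreeClassifier(), n_features_to_select=5),
    "DecisionTreeClassifier": DecisionTreeClassifier(),
    "RFECV": RFECV(estimator=DecisionTreeClassifier()),
    "RandomForestClassifier": RandomForestClassifier(),
}

for name, estimator in candidates.items():
    print(f"{name}: n_jobs available = {supports_n_jobs(estimator)}")
```

If neither RFE nor the wrapped DecisionTreeClassifier reports n_jobs, the fit above may simply have nothing to distribute, which would be consistent with a single busy Python process. In that case a variant such as RFECV, or an inner estimator that accepts n_jobs, might be a place to start experimenting.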