Predicting how long a scikit-learn classification will take to run

Question

nta*_*art 24 python classification machine-learning scikit-learn

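Is there a way to predict how long it will take to run a classifier from scikit-learn, based on the parameters and the dataset? I know, that would be very nice, right?

Some classifier/parameter combinations are quite fast, while others take so long that I eventually kill the process. I would like a way to estimate in advance how long a run will take.

Alternatively, I would accept some pointers on how to set common parameters to reduce the run time.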

Answer 1

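There are very specific classes of classifiers or regressors that directly report the remaining time or the progress of the algorithm (number of iterations, etc.). Most of this can be turned on by passing the verbose=2 option (any high number > 1) to the constructor of the individual model. Note: this behavior is as of sklearn-0.14. Earlier versions have somewhat different verbose output (though still useful).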

The best examples are ensemble.RandomForestClassifier and ensemble.GradientBoostingClassifier, which print the number of trees built so far and (for gradient boosting) the remaining time.

from sklearn import ensemble

# X, y: your training features and labels
clf = ensemble.GradientBoostingClassifier(verbose=3)
clf.fit(X, y)
Out:
   Iter       Train Loss   Remaining Time
     1           0.0769            0.10s
     ...

Or:

clf = ensemble.RandomForestClassifier(verbose=3)
clf.fit(X, y)
Out:
  building tree 1 of 100
  ...

This progress information is quite useful for estimating the total time.
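As a rough illustration of how that progress output can be turned into a time estimate, here is a minimal sketch. It assumes the "building tree k of n" line format shown above and a linear cost per tree; the helper name estimate_remaining_seconds and the sample log are hypothetical, not part of scikit-learn.

```python
import re

# Matches the random forest progress lines shown above, e.g. "building tree 3 of 100".
# The exact format is an assumption and may differ between scikit-learn versions.
TREE_LINE = re.compile(r"building tree (\d+) of (\d+)")


def estimate_remaining_seconds(log_text, elapsed_seconds):
    """Extrapolate remaining fit time from captured verbose output.

    Assumes every tree costs roughly the same. Returns None when no
    progress line is found.
    """
    matches = TREE_LINE.findall(log_text)
    if not matches:
        return None
    # With parallel workers the lines can arrive out of order, so use the
    # highest tree index seen. The line is printed when a tree starts, so
    # this slightly overstates the work already done.
    done = max(int(k) for k, _ in matches)
    total = int(matches[-1][1])
    return elapsed_seconds * (total - done) / done


# Hypothetical captured output after 5 seconds of fitting.
sample_log = (
    "building tree 1 of 100\n"
    "building tree 2 of 100\n"
    "building tree 25 of 100\n"
)
remaining = estimate_remaining_seconds(sample_log, elapsed_seconds=5.0)
print(f"estimated seconds remaining: {remaining:.1f}")
```

The linear assumption is crude: tree depth, and therefore cost per tree, can vary, so treat the result as an order-of-magnitude guide.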

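Then there are other models, such as SVMs, that print the number of optimization iterations completed but do not directly report the remaining time.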

from sklearn import svm

clf = svm.SVC(verbose=2)
clf.fit(X, y)
Out:
   *
    optimization finished, #iter = 1
    obj = -1.802585, rho = 0.000000
    nSV = 2, nBSV = 2
    ...

As far as I know, models such as linear models do not provide this kind of diagnostic information.
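For models without such output, one possible fallback (not something the answer itself proposes) is to time the fit on a few small subsamples and extrapolate. The sketch below assumes the fit time follows a power law in the number of samples, which is only an approximation; the helpers time_fit, fit_power_law and predict_seconds are hypothetical names, and the pilot timings are made-up numbers for illustration.

```python
import math
import time


def time_fit(estimator, X, y):
    """Return wall-clock seconds taken by estimator.fit(X, y)."""
    start = time.perf_counter()
    estimator.fit(X, y)
    return time.perf_counter() - start


def fit_power_law(sizes, seconds):
    """Least-squares fit of seconds ~ a * size**b in log-log space."""
    if len(sizes) < 2 or len(set(sizes)) < 2:
        raise ValueError("need timings for at least two distinct sizes")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    b = sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_seconds(a, b, size):
    """Predicted fit time for a dataset with `size` samples."""
    return a * size ** b


# Made-up pilot timings: seconds measured on 1000, 2000 and 4000 samples.
a, b = fit_power_law([1000, 2000, 4000], [0.5, 2.0, 8.0])
print(f"fitted exponent: {b:.2f}")
print(f"predicted seconds for 16000 samples: {predict_seconds(a, b, 16000):.1f}")
```

In practice the pilot timings would come from calls such as time_fit(clf, X[:n], y[:n]) for a few increasing values of n, where clf is any estimator with a fit method. The extrapolation can be far off when the algorithm's behavior changes with scale (for example, when the data stops fitting in cache or the optimizer needs more iterations).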

See this thread for more on what the verbosity levels mean: scikit-learn fit remaining time
