Dre*_*ana 7 python random scikit-learn random-seed
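How is it possible that running the same Python program twice, with exactly the same seeds and the same static input data, produces different results? Calling the function below repeatedly in a Jupyter Notebook gives the same result every time, but after I restart the kernel the result changes. The same thing happens when I run the code from the command line as a Python script. Is there anything else people do to make their code reproducible? Every resource I have found only talks about seeding. The randomness is introduced by ShapRFECV.
This code runs on the CPU only.
MWE (in this code I generate a dataset and use ShapRFECV to eliminate features, in case that matters):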
import os
import random

import numpy as np
import pandas as pd
from probatus.feature_elimination import ShapRFECV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

global_seed = 1234
os.environ['PYTHONHASHSEED'] = str(global_seed)
np.random.seed(global_seed)
random.seed(global_seed)

feature_names = ['f1', 'f2', 'f3_static', 'f4', 'f5', 'f6', 'f7',
                 'f8', 'f9', 'f10', 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17',
                 'f18', 'f19', 'f20']

# Code from tutorial on probatus documentation
X, y = make_classification(n_samples=100, class_sep=0.05, n_informative=6, n_features=20,
                           random_state=0, n_redundant=10, n_clusters_per_class=1)
X = pd.DataFrame(X, columns=feature_names)


def shap_feature_selection(X, y, seed: int) -> list[str]:
    random_forest = RandomForestClassifier(random_state=seed, n_estimators=70, max_features='log2',
                                           criterion='entropy', class_weight='balanced')

    # Set to run on one thread only
    shap_elimination = ShapRFECV(clf=random_forest, step=0.2, cv=5,
                                 scoring='f1_macro', n_jobs=1, random_state=seed)
    report = shap_elimination.fit_compute(X, y, check_additivity=True, seed=seed)

    # Return the set of features with the best validation accuracy
    return report.iloc[[report['val_metric_mean'].idxmax() - 1]]['features_set'].to_list()[0]
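One detail worth checking in the seeding code: CPython reads `PYTHONHASHSEED` only when the interpreter starts, so assigning `os.environ['PYTHONHASHSEED']` from inside an already running script (or notebook kernel) does not change how `str` values are hashed in that process; it only affects child processes launched afterwards. The sketch below is an illustration added here, not part of the original question: the helper name `child_hash` is hypothetical, and it simply starts two fresh interpreters with the variable set in their environment and compares `hash('f1')` between them.

```python
import os
import subprocess
import sys


def child_hash(value: str, hash_seed: str) -> int:
    """Return hash(value) as computed by a brand-new Python interpreter.

    PYTHONHASHSEED is placed in the child's environment, so it is read at
    interpreter startup, which is the only point where it takes effect.
    """
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({value!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Two separate interpreters started with the same fixed hash seed agree.
first = child_hash("f1", "1234")
second = child_hash("f1", "1234")
print(first == second)  # True
```

In practice this means the variable has to be exported in the shell (for example `PYTHONHASHSEED=1234 python script.py`) or set in the kernel's environment before it starts.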
Results:
# Results from the first run
shap_feature_selection(X, y, 0)
>>> ['f17', 'f15', 'f18', 'f8', 'f12', 'f1', 'f13']
# Running again in same session
shap_feature_selection(X, y, 0)
>>> ['f17', 'f15', 'f18', 'f8', 'f12', 'f1', 'f13']
# Restarting the kernel and running the exact same command
shap_feature_selection(X, y, 0)
>>> ['f8', 'f1', 'f17', 'f6', 'f18', 'f20', 'f12', 'f15', 'f7', 'f13', 'f11']
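The "restart the kernel" behaviour can be reproduced outside Jupyter by running the same snippet in several fresh interpreters and comparing the outputs. The sketch below uses a hypothetical helper, `run_fresh`, to do this for the iteration order of a `set` of the feature names, which depends on string hashing. Whether probatus or SHAP iterate over sets of feature names internally is an assumption that would need to be verified; the sketch only shows how hash randomization can make otherwise seeded code differ between interpreter runs.

```python
import os
import subprocess
import sys
from typing import Optional

# Snippet executed in each fresh interpreter: print the iteration order of a
# set of string feature names (this order depends on str hashing).
SNIPPET = (
    "names = {'f%d' % i for i in range(1, 21)}\n"
    "print(','.join(names))"
)


def run_fresh(snippet: str, hash_seed: Optional[str]) -> str:
    """Run `snippet` in a new interpreter and return its stdout.

    hash_seed=None removes PYTHONHASHSEED, so the child uses the default
    per-process randomized string hashing.
    """
    env = dict(os.environ)
    if hash_seed is None:
        env.pop("PYTHONHASHSEED", None)
    else:
        env["PYTHONHASHSEED"] = hash_seed
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# With a pinned hash seed, every fresh interpreter prints the same order.
pinned = {run_fresh(SNIPPET, "1234") for _ in range(3)}
print(len(pinned))  # 1

# Without it, the order can differ between runs (likely, but not guaranteed).
unpinned = {run_fresh(SNIPPET, None) for _ in range(3)}
print(len(unpinned))
```

If the output of a function like `shap_feature_selection` becomes stable across fresh interpreters once `PYTHONHASHSEED` is exported before startup, that would point to hash-dependent ordering rather than an unseeded random number generator.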