Why doesn't setting a random seed keep my results the same in Python?

Question

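I am using the code below. I expect the same random seed to give the same results. However, I use the same seed (1 in this case) and still get different results. Here is the code: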

import pandas as pd
import numpy as np
from random import seed
# Load scikit's random forest classifier library
from sklearn.ensemble import RandomForestClassifier

from sklearn.model_selection import train_test_split
seed(1) ### <-----

file_path = 'https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data'
dataset2 = pd.read_csv(file_path, header=None, sep=',')

from sklearn import preprocessing
le = preprocessing.LabelEncoder()

#Encoding
y = le.fit_transform(dataset2[60])
dataset2[60] = y
train, test = train_test_split(dataset2, test_size=0.1)
y = train[60] 
y_test = test[60] 
clf = RandomForestClassifier(n_jobs=100, random_state=0)
features = train.columns[0:60]  # columns 0-59 are the 60 features; column 60 is the label
clf.fit(train[features], y)

# Apply the Classifier we trained to the test data
y_pred = clf.predict(test[features])

# Decode 
y_test_label = le.inverse_transform(y_test)
y_pred_label = le.inverse_transform(y_pred)


from sklearn.metrics import accuracy_score
print(accuracy_score(y_test_label, y_pred_label))

# Results from two separate runs:
# 0.761904761905
# 0.90476190476

Answer 1

sas*_*cha 5

Your code:

import numpy as np
from random import seed
seed(1) ### <-----

only sets the seed of Python's built-in random module.
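A minimal sketch of that point follows. It is an illustration, not code from the question. Re-seeding the built-in random module makes random.random() repeat, but it does not touch numpy's global generator. Numpy's draws therefore keep advancing from wherever its own (unseeded) state happened to be.

```python
import random

import numpy as np

# First "run": seed only Python's built-in generator.
random.seed(1)
py_first = random.random()
np_first = np.random.rand()  # numpy's global state is NOT affected by random.seed

# Second "run": same seed again.
random.seed(1)
py_second = random.random()
np_second = np.random.rand()

print(py_first == py_second)  # True: random.random() is controlled by random.seed
print(np_first == np_second)  # numpy just drew its next number, so these differ
```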

But sklearn relies entirely on numpy's random number generation, as explained here:

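For testing and replicability, it is often important to have the entire execution controlled by a single seed for the pseudo-random number generator used in algorithms that have a randomized component. Scikit-learn does not use its own global random state; whenever a RandomState instance or an integer random seed is not provided as an argument, it relies on the numpy global random state, which can be set using numpy.random.seed. For example, to set an execution's numpy global random state to 42, one could execute the following in his or her script: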

import numpy as np

np.random.seed(42)

So, in general, you should do this instead:

np.random.seed(1)
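The sketch below checks this with a small toy array, which is an assumption standing in for the sonar data. train_test_split is called without random_state, so it falls back to numpy's global state. Re-seeding that state with np.random.seed before each call should then reproduce the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical stand-in for the sonar dataset): 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# No random_state argument, so sklearn uses numpy's global random state.
np.random.seed(1)
split_a = train_test_split(X, y, test_size=0.3)

np.random.seed(1)
split_b = train_test_split(X, y, test_size=0.3)

# Compare X_train, X_test, y_train, y_test pairwise.
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same)  # True
```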

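But that is only part of the story. If you are careful to call every sklearn component explicitly with a seed (its random_state argument), you usually don't need the global seed at all!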

As ShreyasG mentioned, this also applies to train_test_split, which in the question's code is called without a random_state.
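Here is a rough sketch of the explicit-seed approach. It assumes synthetic data from make_classification (208 samples, 60 features) in place of the UCI sonar file, and the helper name run_once is hypothetical. Both train_test_split and RandomForestClassifier get a fixed random_state, so repeated runs should give identical accuracy without any global seeding.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def run_once(split_seed=1):
    """Hypothetical helper: one full train/evaluate run with explicit seeds."""
    # Synthetic stand-in for the sonar data (assumption: 208 rows, 60 features, 2 classes).
    X, y = make_classification(n_samples=208, n_features=60, random_state=0)
    # Seed the split explicitly instead of relying on any global state.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.1, random_state=split_seed
    )
    # Seed the forest explicitly as well.
    clf = RandomForestClassifier(random_state=0)
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))


first = run_once()
second = run_once()
print(first, second)  # the two accuracies are identical
```

Passing seeds explicitly keeps each component's randomness local. Adding or reordering other random calls elsewhere in the script then cannot silently change the split or the forest.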

Archived:	8 years, 1 month ago
Views:	1402
Last activity:	8 years, 1 month ago