use*_*200 0 python machine-learning scikit-learn
我正在尝试使用随机森林方法找到最佳特征集,我需要将数据集拆分为测试和训练。这是我的代码
from sklearn.model_selection import train_test_split
def train_test_split(x,y):
# split data train 70 % and test 30 %
x_train, x_test, y_train, y_test = train_test_split(x, y,train_size=0.3,random_state=42)
#normalization
x_train_N = (x_train-x_train.mean())/(x_train.max()-x_train.min())
x_test_N = (x_test-x_test.mean())/(x_test.max()-x_test.min())
train_test_split(data,data_y)
Run Code Online (Sandbox Code Playgroud)
参数 data,data_y 解析正确。但我收到以下错误。我想不通这是为什么。
您在代码中使用的函数名称与 sklearn.preprocessing 中的函数名称相同,更改函数名称即可完成这项工作。像这样的东西,
from sklearn.model_selection import train_test_split
def my_train_test_split(x,y):
# split data train 70 % and test 30 %
x_train, x_test, y_train, y_test = train_test_split(x,y,train_size=0.3,random_state=42)
#normalization
x_train_N = (x_train-x_train.mean())/(x_train.max()-x_train.min())
x_test_N = (x_test-x_test.mean())/(x_test.max()-x_test.min())
my_train_test_split(data,data_y)
Run Code Online (Sandbox Code Playgroud)
说明:- 尽管在 python 中有方法重载(即根据参数类型选择同名函数),但在您的情况下,这两个函数都需要相同类型的参数,因此不同的命名是唯一可能的解决方案海事组织。