Using a model built on the training dataset to predict the test data?

Question

Soh*_*yed 6 python data-analysis scikit-learn data-science

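I'm new to data science and analytics. After studying a lot of kernels on Kaggle, I built a model that predicts property prices. I have tested the model on my training data, and now I want to run it on my test data. I have a test.csv file that I want to use. How do I do that? Here is what I did earlier with my training dataset: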

#loading my train dataset into python
train = pd.read_csv('/Users/sohaib/Downloads/test.csv')

#factors that will predict the price
train_pr = ['OverallQual','GrLivArea','GarageCars','TotalBsmtSF','FullBath','YearBuilt']

#set my model to DecisionTree
model = DecisionTreeRegressor()

#set prediction data to factors that will predict, and set target to SalePrice
prdata = train[train_pr]
target = train.SalePrice

#fitting model with prediction data and telling it my target
model.fit(prdata, target)

model.predict(prdata.head())


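What I tried next was to copy the whole code, change "train" to "test" and "prdata" to "testprdata". I thought that would work, but sadly it didn't. I know I'm doing something wrong, I just don't know what it is.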

Answer 1

The*_*ist 4

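As long as you process the training and test data in exactly the same way, the predict function will work on either dataset. So you'll want to load both the train and test sets, fit on the train set, and then predict on either just the test set or on both the train and test sets.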

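Also, note that the file you are reading is the test data. Assuming your files are named properly, you are currently training on your test data, even though you named the variable train.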

import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# load the train and test datasets, each from its own file
train = pd.read_csv('/Users/sohaib/Downloads/train.csv')
test = pd.read_csv('/Users/sohaib/Downloads/test.csv')

# factors that will predict the price
desired_factors = ['OverallQual','GrLivArea','GarageCars','TotalBsmtSF','FullBath','YearBuilt']

# set my model to DecisionTree
model = DecisionTreeRegressor()

# select the same predictor columns from both datasets; the target comes from train only
train_data = train[desired_factors]
test_data = test[desired_factors]
target = train.SalePrice

# fit the model on the training data only
model.fit(train_data, target)

# predict prices for the first five rows of the test data
model.predict(test_data.head())

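If you want to try the same fit-on-train, predict-on-test flow without the CSV files, here is a minimal sketch that runs on its own. The two DataFrames are synthetic stand-ins for train.csv and test.csv, the price rule is made up purely for illustration, and `make_frame` and the `predicted_SalePrice` column name are hypothetical names that are not part of the original code. It predicts for every test row rather than just `head()`.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-ins for train.csv and test.csv (made-up values, not Kaggle data)
rng = np.random.default_rng(0)
desired_factors = ['OverallQual', 'GrLivArea', 'GarageCars']

def make_frame(n_rows):
    """Hypothetical helper: build a frame with the predictor columns only."""
    return pd.DataFrame({
        'OverallQual': rng.integers(1, 11, n_rows),
        'GrLivArea': rng.integers(500, 4000, n_rows),
        'GarageCars': rng.integers(0, 4, n_rows),
    })

train = make_frame(200)
# Made-up price rule so the training frame has a target column
train['SalePrice'] = (20000 * train['OverallQual']
                      + 50 * train['GrLivArea']
                      + 5000 * train['GarageCars'])
test = make_frame(50)  # no SalePrice column, like a typical test file

# Fit on the training data only
model = DecisionTreeRegressor(random_state=0)
model.fit(train[desired_factors], train['SalePrice'])

# Predict for every test row, selecting the same columns in the same order
test_predictions = model.predict(test[desired_factors])

# Keep the predictions aligned with the test rows
predictions = pd.DataFrame({'predicted_SalePrice': test_predictions},
                           index=test.index)
print(predictions.head())
```

The key point is that `train[desired_factors]` and `test[desired_factors]` go through identical column selection, so the model sees the same features at fit time and at predict time.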

Archived:	8 years, 6 months ago
Views:	30895
Last recorded:	8 years, 6 months ago