Soh*_*yed 6 python data-analysis scikit-learn data-science
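I am new to data science and analysis. After going through a lot of kernels on Kaggle, I built a model that predicts the price of a property. I have tested this model on my training data, but now I want to run it on my test data. I have a test.csv file that I want to use. How do I do that? This is what I did previously with my training dataset: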
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
# loading my train dataset into python
train = pd.read_csv('/Users/sohaib/Downloads/test.csv')
# factors that will predict the price
train_pr = ['OverallQual','GrLivArea','GarageCars','TotalBsmtSF','FullBath','YearBuilt']
# set my model to DecisionTree
model = DecisionTreeRegressor()
# set prediction data to factors that will predict, and set target to SalePrice
prdata = train[train_pr]
target = train.SalePrice
# fitting model with prediction data and telling it my target
model.fit(prdata, target)
model.predict(prdata.head())
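What I tried next was copying the whole code, changing "train" to "test" and "prdata" to "testprdata". I thought that would work, but sadly it did not. I know I am doing something wrong, but I don't know what it is.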
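As long as you process the train and test data in exactly the same way, the predict function will work on either dataset. So you will want to load both the train and test sets, fit on the train set, and then predict on either just the test set or on both the train and test sets.
Also, note that the file you are reading is the test data. Assuming your files are named properly, you are currently training on your test data, even though you named the variable train.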
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
# load the train and test datasets
train = pd.read_csv('/Users/sohaib/Downloads/train.csv')
test = pd.read_csv('/Users/sohaib/Downloads/test.csv')
# factors that will predict the price
desired_factors = ['OverallQual','GrLivArea','GarageCars','TotalBsmtSF','FullBath','YearBuilt']
# set my model to DecisionTree
model = DecisionTreeRegressor()
# select the same feature columns from both sets, and set target to SalePrice
train_data = train[desired_factors]
test_data = test[desired_factors]
target = train.SalePrice
# fit the model on the training data only, then predict on the test data
model.fit(train_data, target)
model.predict(test_data.head())
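To make "process the train and test data in exactly the same way" concrete, here is a minimal sketch that routes both frames through one shared preparation step. The tiny in-memory frames are made-up stand-ins for train.csv and test.csv, the `prepare` helper is hypothetical, and filling missing values with training-set medians is only an assumption about what preprocessing might be needed, not something the question specifies.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Made-up frames standing in for train.csv and test.csv (illustrative values only).
train = pd.DataFrame({
    "OverallQual": [5, 6, 7, 8],
    "GrLivArea": [1000, 1500, 2000, 2500],
    "SalePrice": [100000, 150000, 200000, 250000],
})
test = pd.DataFrame({
    "OverallQual": [6, 8],
    "GrLivArea": [1500, None],  # a missing value, to exercise the shared fill step
})

desired_factors = ["OverallQual", "GrLivArea"]


def prepare(frame, fill_values):
    """Hypothetical helper: select the feature columns and fill gaps identically."""
    return frame[desired_factors].fillna(fill_values)


# Assumption: fill values come from the training set only and are reused for the test set.
fill_values = train[desired_factors].median()

train_data = prepare(train, fill_values)
test_data = prepare(test, fill_values)

# Fit on the training features, then predict on the identically prepared test features.
model = DecisionTreeRegressor(random_state=0)
model.fit(train_data, train["SalePrice"])
predictions = model.predict(test_data)
print(predictions)
```

Because both frames pass through the same function with the same column list, the model sees features in the same order and form at fit time and at predict time, which is the condition the answer describes.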