我正在编写代码来解决一个简单的问题,即预测库存中物品丢失的概率。
我正在使用XGBoost预测模型来做到这一点。
我将数据分成两个 .csv 文件,一个是训练数据,另一个是测试数据
这是代码:
import pandas as pd
import numpy as np
train = pd.read_csv('C:/Users/pedro/Documents/Pedro/UFMG/8o periodo/Python/Trabalho Final/train.csv', index_col='sku').fillna(-1)
test = pd.read_csv('C:/Users/pedro/Documents/Pedro/UFMG/8o periodo/Python/Trabalho Final/test.csv', index_col='sku').fillna(-1)
X_train, y_train = train.drop('isBackorder', axis=1), train['isBackorder']
import xgboost as xgb
xg_reg = xgb.XGBRegressor(objective ='reg:linear', colsample_bytree = 0.3, learning_rate = 0.1,
max_depth = 10, alpha = 10, n_estimators = 10)
xg_reg.fit(X_train,y_train)
y_pred = xg_reg.predict(test)
# Create file for the competition submission
test['isBackorder'] = y_pred
pred = test['isBackorder'].reset_index()
pred.to_csv('competitionsubmission.csv',index=False)
Run Code Online (Sandbox Code Playgroud)
这是我尝试测量问题准确性的函数(使用 RMSE …