XGBoost and sparse matrices

Question

PLV*_*PLV 7 python numpy scipy sparse-matrix xgboost

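I am trying to use xgboost (from Python) for a classification problem, where I have the data in a numpy matrix X (rows = observations, columns = features) and the labels in a numpy array y. Because my data is sparse, I would like to run it on a sparse version of X, but I seem to be missing something, because an error occurs.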

Here is what I did:

# Library import

import numpy as np
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from scipy.sparse import csr_matrix

# Converting to sparse data and running xgboost

X_csr = csr_matrix(X)
xgb1 = XGBClassifier()
xgtrain = xgb.DMatrix(X_csr, label = y )      #to work with the xgb format
xgtest = xgb.DMatrix(Xtest_csr)
xgb1.fit(xgtrain, y, eval_metric='auc')
dtrain_predictions = xgb1.predict(xgtest)

etc.

Now, when I try to fit the classifier, I get an error:

File ".../xgboost/python-package/xgboost/sklearn.py", line 432, in fit
self._features_count = X.shape[1]

AttributeError: 'DMatrix' object has no attribute 'shape'

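I looked at the source and believe it has something to do with the sparse format I want to use. But what exactly is wrong, and how to fix it, I have no idea.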

Any help or comments are welcome! Many thanks.

Answer 1

小智 8

You are using the xgboost scikit-learn API (http://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn), so you don't need to convert your data to a DMatrix to fit XGBClassifier(). Just delete the line

xgtrain = xgb.DMatrix(X_csr, label = y )

and it should work:

import xgboost as xgb

type(X_csr)  # scipy.sparse.csr.csr_matrix
type(y)      # numpy.ndarray
xgb1 = xgb.XGBClassifier()
xgb1.fit(X_csr, y)  # the sklearn wrapper takes the sparse matrix and the labels directly

Output:

XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
   gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3,
   min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
   objective='binary:logistic', reg_alpha=0, reg_lambda=1,
   scale_pos_weight=1, seed=0, silent=True, subsample=1)

Answer 2

hpa*_*ulj 0

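X_csr = csr_matrix(X) has many of the same attributes as X, including .shape. But it is not a subclass of ndarray, nor a drop-in replacement. Code has to be "sparse-aware". sklearn qualifies; in fact, it adds quite a few fast sparse utility functions of its own.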
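To illustrate the point about sparse awareness, a small scipy-only sketch. The 2x3 matrix and the column_sums helper are hypothetical, made up for this example: a csr_matrix shares .shape with the dense array but is not an np.ndarray, so code that wants to accept both has to check for sparse input explicitly, here with scipy.sparse.issparse.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

X = np.array([[0.0, 1.5, 0.0],
              [0.0, 0.0, 2.0]])
X_csr = csr_matrix(X)

same_shape = X_csr.shape == X.shape        # .shape is shared
is_ndarray = isinstance(X_csr, np.ndarray)  # but it is not an ndarray (sub)class
stored = X_csr.nnz                          # only the non-zero entries are stored

def column_sums(M):
    """Hypothetical sparse-aware helper: branch on issparse() instead of
    assuming ndarray behaviour."""
    if issparse(M):
        # csr_matrix.sum(axis=0) returns a 1 x n matrix; flatten it to 1-D
        return np.asarray(M.sum(axis=0)).ravel()
    return M.sum(axis=0)
```

Usage: column_sums(X) and column_sums(X_csr) give the same values, but only because the helper handles the sparse case itself.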

But I don't know how well xgb handles sparse matrices, or how it works with sklearn.

Assuming the problem is with xgtrain, you need to look at its type and attributes. How does it compare with one made with xgb.DMatrix(X, label = y )?

If you want help from people who don't use xgboost, you will have to give more information about the objects in your code.

Posted:	9 years, 2 months ago
Views:	13005
Last activity:	7 years, 3 months ago