How to predict on a single row of a DataFrame with an xgboost model

ahu*_*ura 3 python pandas xgboost

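I am fitting an xgboost model to data stored in a DataFrame. After fitting, I would like to run the classifier's/regressor's .predict method on a single row of the DataFrame.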

Below is a minimal example that predicts fine on the whole DataFrame but crashes when run on just the second row.

from sklearn.datasets import load_iris
import xgboost

# Load iris data such that X is a dataframe
X, y = load_iris(return_X_y=True, as_frame=True)

clf = xgboost.XGBClassifier()
clf.fit(X, y)

# Predict for all rows - works fine
y_pred = clf.predict(X)

# Predict for single row. Crashes.
# Error: '('Expecting 2 dimensional numpy.ndarray, got: ', (4,))'
secondrow = X.iloc[1]
secondpred = clf.predict(secondrow)

Error

('Expecting 2 dimensional numpy.ndarray, got: ', (4,))

Tre*_*ney 6

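  • predict expects an array of a specific shape, determined by the data the model was fit on.
  • The problem is that secondrow is a 1-dimensional pandas.Series, which does not match the 2-dimensional shape the model expects.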
X.iloc[1]

sepal length (cm)    4.9
sepal width (cm)     3.0
petal length (cm)    1.4
petal width (cm)     0.2
Name: 1, dtype: float64

# look at the array
X.iloc[1].values

array([4.9, 3. , 1.4, 0.2])  # note this is a 1-d array

# look at the shape
secondrow.values.shape

(4,)
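  • You can predict on a single row by passing the data in the correct shape, which is a 2-dimensional array.
  • Convert the selected Series to a DataFrame and transpose it to get the shape .predict expects.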
import pandas as pd

# wrap the Series in a DataFrame (one column), then transpose to one row
secondrow = pd.DataFrame(X.iloc[1]).T

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
1                4.9               3.0                1.4               0.2

# look at secondrow as an array
secondrow.values

array([[4.9, 3. , 1.4, 0.2]])  # note this is a 2-d array

# look at the shape
secondrow.values.shape

(1, 4)

# predict
secondpred = clf.predict(secondrow)

# result
array([0])
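As a side note that is not part of the answer above, pandas offers a few other ways to select one row while keeping it 2-dimensional. The sketch below uses a small hand-built frame as a stand-in for the iris features (only the first three rows), so it runs without fitting a model; the variable names are illustrative.

```python
import pandas as pd

# Small stand-in for the iris feature frame used above (first three rows only).
X = pd.DataFrame(
    {
        "sepal length (cm)": [5.1, 4.9, 4.7],
        "sepal width (cm)": [3.5, 3.0, 3.2],
        "petal length (cm)": [1.4, 1.4, 1.3],
        "petal width (cm)": [0.2, 0.2, 0.2],
    }
)

as_series = X.iloc[1]                  # scalar indexer -> 1-d Series
by_list = X.iloc[[1]]                  # list indexer -> one-row DataFrame
by_slice = X.iloc[1:2]                 # slice -> one-row DataFrame
by_to_frame = X.iloc[1].to_frame().T   # Series -> one-column frame -> transposed

print(as_series.shape, by_list.shape, by_slice.shape, by_to_frame.shape)
# (4,) (1, 4) (1, 4) (1, 4)
```

Any of the three one-row DataFrames has the same (1, 4) shape as the transposed frame in the answer, so it should be accepted by clf.predict in the same way. One difference worth knowing: X.iloc[[1]] and X.iloc[1:2] keep each column's original dtype, whereas going through a Series and transposing can turn the columns into object dtype when the frame mixes types.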