XGBoost in Python: ValueError('feature_names may not contain [, ] or <')

Question

sap*_*ico 4 python numpy pandas scikit-learn xgboost

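Python's XGBClassifier implementation does not accept the characters [, ] or < in feature names.

If a feature name contains one of them, the following is raised:

ValueError('feature_names may not contain [, ] or <')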

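The obvious solution seems to be passing the equivalent NumPy arrays and getting rid of the column names altogether, but if the developers did not do that themselves, there must be a reason.

What does XGBoost use feature names for, and what are the downsides of simply passing NumPy arrays instead of pandas DataFrames?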

Answer 1

Abh*_*mar 8

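I know this is late, but I am writing this answer for anyone else who runs into the problem. Here is what I found after hitting it myself: the error usually occurs when a column name contains one of the symbols [, ] or <. Here is an example: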

import pandas as pd
import numpy as np
from xgboost.sklearn import XGBRegressor

# test input data with string, int, and symbol-included columns 
df = pd.DataFrame({'0': np.random.randint(0, 2, size=100),
                   '[test1]': np.random.uniform(0, 1, size=100),
                   'test2': np.random.uniform(0, 1, size=100),
                  3: np.random.uniform(0, 1, size=100)})

target = df.iloc[:, 0]
predictors = df.iloc[:, 1:]

# basic xgb model ('reg:squarederror' replaces the deprecated 'reg:linear')
xgb0 = XGBRegressor(objective='reg:squarederror')
xgb0.fit(predictors, target)


The code above raises the error:

ValueError: feature_names may not contain [, ] or <


But if you remove the square brackets from '[test1]', it works fine. Below is a generic way to replace [, ] or < in column names with an underscore:

import re
import pandas as pd
import numpy as np
from xgboost.sklearn import XGBRegressor

# matches any one of the characters [, ] or < (no case flag is needed for symbols)
regex = re.compile(r"\[|\]|<")

# test input data with string, int, and symbol-included columns 
df = pd.DataFrame({'0': np.random.randint(0, 2, size=100),
                   '[test1]': np.random.uniform(0, 1, size=100),
                   'test2': np.random.uniform(0, 1, size=100),
                  3: np.random.uniform(0, 1, size=100)})

df.columns = [regex.sub("_", col) if any(x in str(col) for x in set(('[', ']', '<'))) else col for col in df.columns.values]

target = df.iloc[:, 0]
predictors = df.iloc[:, 1:]

# basic xgb model ('reg:squarederror' replaces the deprecated 'reg:linear')
xgb0 = XGBRegressor(objective='reg:squarederror')
xgb0.fit(predictors, target)
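
The renaming step above can also be packaged as a small helper. This is only a sketch using the standard library: the name sanitize_feature_names and its replacement parameter are hypothetical, not part of xgboost or pandas. Unlike the list comprehension above, it converts every name to a string (so the integer column 3 becomes '3') and appends a numeric suffix when two columns would otherwise end up with the same cleaned name.

```python
import re

# Characters named in the error message: '[', ']' and '<'.
_FORBIDDEN = re.compile(r"[\[\]<]")


def sanitize_feature_names(names, replacement="_"):
    """Hypothetical helper: return string names with '[', ']' and '<'
    replaced, keeping the resulting names unique."""
    seen = {}      # cleaned name -> number of suffixes handed out so far
    cleaned = []
    for name in names:
        base = _FORBIDDEN.sub(replacement, str(name))
        candidate = base
        # If two inputs collapse to the same name, add a numeric suffix.
        while candidate in seen:
            seen[base] += 1
            candidate = f"{base}_{seen[base]}"
        seen.setdefault(candidate, 0)
        cleaned.append(candidate)
    return cleaned
```

With the DataFrame from the example it could be applied as df.columns = sanitize_feature_names(df.columns) before splitting into target and predictors.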


For more detail, read the relevant lines in xgboost's core.py: xgboost/core.py. That is the check that fails and raises the error.
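
To make the behavior concrete, here is a rough stand-alone approximation of such a check. It is an assumption based on the error message, not a copy of the code in core.py, and the function name validate_feature_names is hypothetical. It can be run on column names before fitting to see which ones would be rejected.

```python
def validate_feature_names(feature_names):
    """Hypothetical approximation of the check: reject names that are not
    strings or that contain '[', ']' or '<'."""
    forbidden = ("[", "]", "<")
    for name in feature_names:
        if not isinstance(name, str) or any(ch in name for ch in forbidden):
            raise ValueError("feature_names may not contain [, ] or <")


# Usage: '[test1]' is rejected, while the cleaned names pass silently.
try:
    validate_feature_names(["[test1]", "test2"])
except ValueError as err:
    print(err)
validate_feature_names(["_test1_", "test2"])
```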

