shap.Explainer constructor error asking for undocumented positional argument

Question

Joe*_*Joe 5 python python-3.x scikit-learn shap

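I am using the Python package shap to better understand my machine learning models. (From the documentation: "SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model.") Here is a small reproducible example of the error I am getting: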

Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 23:11:46) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import shap
>>> shap.__version__
'0.37.0'
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import LogisticRegression
>>> 
>>> iris = shap.datasets.iris()
>>> X_train, X_test, y_train, y_test = train_test_split(*iris, random_state=1)
>>> model = LogisticRegression(penalty='none', max_iter = 1000, random_state=1)
>>> model.fit(X_train, y_train)
>>> 
>>> explainer = shap.Explainer(model, data=X_train, masker=shap.maskers.Impute(),
...                            feature_names=X_train.columns, algorithm="linear")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __init__() missing 1 required positional argument: 'data'

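According to the stack trace, the error seems to occur in the top-level function call, not in the call to Impute(). I also tried leaving out the data= part, which raises the same error. This seems strange to me, because neither the documentation nor the source code of the Explainer object mentions any data parameter (I verified that both come from the same package version I am using):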

__init__(model, masker=None, link=CPUDispatcher(<function identity>), algorithm='auto', output_names=None, feature_names=None, **kwargs)

Any ideas? Is this a bug, or am I missing something obvious?

Answer 1

Ser*_*nov 5

The init signature of Impute is:

def __init__(self, data, method="linear")

Hence your error: the missing 'data' argument belongs to Impute(), not to Explainer. So, instead of:

explainer = shap.Explainer(model, data=X_train, masker=shap.maskers.Impute(),
                           feature_names=X_train.columns, algorithm="linear")

you should feed X_train to the masker:

explainer = shap.Explainer(model, masker=shap.maskers.Impute(data=X_train),
                           feature_names=X_train.columns, algorithm="linear")

because it is the masker that takes care of the data in the new API.
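To see why the traceback points at the top-level statement even though the failing call is nested, here is a minimal sketch. Masker and Explainer below are hypothetical stand-ins, not the shap classes; the only thing borrowed from shap is the shape of the Impute signature quoted above. Python evaluates the arguments of a call before the call itself, so the inner constructor fails before the outer one ever runs.

```python
class Masker:
    """Hypothetical stand-in for a masker whose `data` argument is required."""

    def __init__(self, data, method="linear"):
        self.data = data
        self.method = method


class Explainer:
    """Hypothetical stand-in for the outer constructor; it has no `data` parameter."""

    calls = 0  # counts how many times Explainer.__init__ actually ran

    def __init__(self, model, masker=None, **kwargs):
        Explainer.calls += 1
        self.model = model
        self.masker = masker


# The argument expression Masker() is evaluated first and raises,
# so Explainer.__init__ is never reached.
try:
    Explainer("model", masker=Masker())
except TypeError as exc:
    message = str(exc)

calls_after_failure = Explainer.calls

# Supplying the data to the masker lets the outer call go through.
explainer = Explainer("model", masker=Masker(data=[[1.0, 2.0]]))
```

Here message mentions the missing 'data' argument, and calls_after_failure stays at 0, which shows the outer constructor was not the one complaining.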

Unfortunately, even passing the data to Impute won't work, because the Impute masker implies feature_perturbation = "correlation_dependent", and that does not seem to be ready yet.

The Independent masker, however, works just fine:

import shap
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

iris = shap.datasets.iris()
X_train, X_test, y_train, y_test = train_test_split(*iris, random_state=1)
model = LogisticRegression(penalty="none", max_iter=1000, random_state=1)
model.fit(X_train, y_train)

masker = shap.maskers.Independent(data=X_test)

explainer = shap.Explainer(
    model, masker=masker, feature_names=X_train.columns, algorithm="linear"
)

sv = explainer(X_test)
sv.base_values[0]

array([-5.0060995 , 13.03460398, -8.02850448])

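If you happen to have missing data in your dataset, you can impute it yourself with your preferred imputation strategy and supply the result to Independent.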
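As a rough sketch of that last suggestion, assuming median imputation is acceptable for your data: the helper names impute_median and build_independent_masker and the toy frame below are made up for illustration. The only shap call used is shap.maskers.Independent(data=...), as in the example above, and it is imported lazily so the imputation step can be tried without shap installed.

```python
import numpy as np
import pandas as pd


def impute_median(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with NaNs replaced by each column's median."""
    return frame.fillna(frame.median(numeric_only=True))


def build_independent_masker(background: pd.DataFrame):
    """Wrap already-imputed background data in shap's Independent masker."""
    import shap  # lazy import: only needed once the data has been imputed

    return shap.maskers.Independent(data=background)


# Hypothetical background data with two missing entries.
raw = pd.DataFrame(
    {
        "sepal length (cm)": [5.1, np.nan, 6.3, 5.8],
        "petal width (cm)": [0.2, 1.3, np.nan, 1.9],
    }
)

# Fill the gaps before the data ever reaches the masker.
background = impute_median(raw)
```

The result of build_independent_masker(background) could then be passed as masker= to shap.Explainer in the same way as in the working example. Any other strategy (mean, model-based, and so on) can be swapped in for the median, as long as the masker receives data without gaps.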
