sem*_*ing 3 python plot matplotlib jupyter-notebook shap
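Update: I found the color_bar and color_bar_label parameters, but they have no effect on this. I also found that if I display 26 or more features, the bar does appear, but it is small and thin, just like in the LoL example below. I also tried changing the plot size and the spacing between feature names, without success.
I'm trying to create a SHAP summary plot, and when the plot is drawn, the vertical "Feature value" color bar along the y-axis does not appear at all.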
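Force plots and decision plots both work fine. I tried changing the maximum number of displayed features to see whether the axes just needed to expand, but that didn't fix anything. I'm using Python 3.9.7 in a Jupyter notebook (because of issues with 3.10 and, I think, some arches packages) and SHAP 0.39.0. I tried updating/uninstalling/reinstalling SHAP through conda (4.10.3). I even went through the SHAP walkthrough here and followed it exactly; a vertical feature value bar did appear, but it looks extremely small.
[Image: SHAP test plot]
For reference, this is what the walkthrough says it should look like.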
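Since the symptom might depend on the exact combination of installed libraries (an assumption, nothing here confirms it), a small sketch for recording the versions in the notebook's environment could look like this. The helper name `installed_versions` and the package list are only illustrative; it uses the standard-library `importlib.metadata` module, available from Python 3.8.

```python
from importlib import metadata
import platform


def installed_versions(packages=("shap", "matplotlib", "xgboost", "numpy", "pandas")):
    """Map each package name to its installed version, or None if it is not installed.

    The default package list is an assumption about what is relevant here.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", platform.python_version())
    for name, version in installed_versions().items():
        print(name, version or "not installed")
```

Running this in the same kernel that draws the plot (rather than in a separate terminal) makes sure the reported versions are the ones the notebook actually uses.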
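I don't know what the bar itself is called, or what to change to try to make it appear. There is no error message or warning; it simply doesn't show up at all in my actual use case, or shows up very small in the example code, and I'm not sure which settings to adjust to change it.
The dataset for the walkthrough is from Kaggle, here, and the walkthrough code that generates the example plot is here: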
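To make it clearer which object is meant, here is a rough matplotlib-only sketch of how one might locate a color bar in a figure and change its proportions. It does not use SHAP at all: the scatter plot is a stand-in, the function names are made up for illustration, and the idea that the color bar is the last axes in the figure is an assumption that would need checking against what `shap.summary_plot` actually produces. It also assumes matplotlib 3.3 or newer, where `Axes.set_box_aspect` is available.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np


def build_plot_with_colorbar():
    """Stand-in for a summary plot: a scatter colored by value plus a color bar."""
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = rng.integers(0, 5, size=200)
    fig, ax = plt.subplots(figsize=(8, 4))
    points = ax.scatter(x, y, c=x, cmap="coolwarm", s=10)
    # aspect=1000 deliberately makes the bar extremely thin
    cbar = fig.colorbar(points, ax=ax, aspect=1000)
    cbar.set_label("Feature value")
    return fig


def describe_axes(fig):
    """Return (index, y-label, width, height) for every axes in the figure."""
    fig.canvas.draw()  # make sure positions reflect the final layout
    rows = []
    for i, a in enumerate(fig.axes):
        box = a.get_position()
        rows.append((i, a.get_ylabel(), box.width, box.height))
    return rows


def widen_colorbar(fig, box_aspect=20):
    """Give the last axes (assumed to be the color bar) a height/width ratio."""
    cbar_ax = fig.axes[-1]
    cbar_ax.set_aspect("auto")          # drop any fixed data aspect
    cbar_ax.set_box_aspect(box_aspect)  # box height divided by box width
    return cbar_ax


if __name__ == "__main__":
    fig = build_plot_with_colorbar()
    for row in describe_axes(fig):
        print(row)
    widen_colorbar(fig)
    for row in describe_axes(fig):
        print(row)
```

Listing the axes first shows whether a color bar axes exists but is too thin to see, or is missing altogether, which are two different problems.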
import pandas as pd
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
import shap
import matplotlib.pyplot as pl
shap.initjs()
# read in the data
prefix = "local_scratch/data/league-of-legends-ranked-matches/"
matches = pd.read_csv(prefix+"matches.csv")
participants = pd.read_csv(prefix+"participants.csv")
stats1 = pd.read_csv(prefix+"stats1.csv", low_memory=False)
stats2 = pd.read_csv(prefix+"stats2.csv", low_memory=False)
stats = pd.concat([stats1,stats2])
# merge into a single DataFrame
a = pd.merge(participants, matches, left_on="matchid", right_on="id")
allstats_orig = pd.merge(a, stats, left_on="matchid", right_on="id")
allstats = allstats_orig.copy()
# drop games that lasted less than 10 minutes
allstats = allstats.loc[allstats["duration"] >= 10*60,:]
# Convert string-based categories to numeric values
cat_cols = ["role", "position", "version", "platformid"]
for c in cat_cols:
    allstats[c] = allstats[c].astype('category')
    allstats[c] = allstats[c].cat.codes
allstats["wardsbought"] = allstats["wardsbought"].astype(np.int32)
X = allstats.drop(["win"], axis=1)
y = allstats["win"]
# convert all features we want to consider as rates
rate_features = [
    "kills", "deaths", "assists", "killingsprees", "doublekills",
    "triplekills", "quadrakills", "pentakills", "legendarykills",
    "totdmgdealt", "magicdmgdealt", "physicaldmgdealt", "truedmgdealt",
    "totdmgtochamp", "magicdmgtochamp", "physdmgtochamp", "truedmgtochamp",
    "totheal", "totunitshealed", "dmgtoobj", "timecc", "totdmgtaken",
    "magicdmgtaken", "physdmgtaken", "truedmgtaken", "goldearned", "goldspent",
    "totminionskilled", "neutralminionskilled", "ownjunglekills",
    "enemyjunglekills", "totcctimedealt", "pinksbought", "wardsbought",
    "wardsplaced", "wardskilled"
]
for feature_name in rate_features:
    X[feature_name] /= X["duration"] / 60 # per minute rate
# convert to fraction of game
X["longesttimespentliving"] /= X["duration"]
# define friendly names for the features
full_names = {
    "kills": "Kills per min.",
    "deaths": "Deaths per min.",
    "assists": "Assists per min.",
    "killingsprees": "Killing sprees per min.",
    "longesttimespentliving": "Longest time living as % of game",
    "doublekills": "Double kills per min.",
    "triplekills": "Triple kills per min.",
    "quadrakills": "Quadra kills per min.",
    "pentakills": "Penta kills per min.",
    "legendarykills": "Legendary kills per min.",
    "totdmgdealt": "Total damage dealt per min.",
    "magicdmgdealt": "Magic damage dealt per min.",
    "physicaldmgdealt": "Physical damage dealt per min.",
    "truedmgdealt": "True damage dealt per min.",
    "totdmgtochamp": "Total damage to champions per min.",
    "magicdmgtochamp": "Magic damage to champions per min.",
    "physdmgtochamp": "Physical damage to champions per min.",
    "truedmgtochamp": "True damage to champions per min.",
    "totheal": "Total healing per min.",
    "totunitshealed": "Total units healed per min.",
    "dmgtoobj": "Damage to objects per min.",
    "timecc": "Time spent with crowd control per min.",
    "totdmgtaken": "Total damage taken per min.",
    "magicdmgtaken": "Magic damage taken per min.",
    "physdmgtaken": "Physical damage taken per min.",
    "truedmgtaken": "True damage taken per min.",
    "goldearned": "Gold earned per min.",
    "goldspent": "Gold spent per min.",
    "totminionskilled": "Total minions killed per min.",
    "neutralminionskilled": "Neutral minions killed per min.",
    "ownjunglekills": "Own jungle kills per min.",
    "enemyjunglekills": "Enemy jungle kills per min.",
    "totcctimedealt": "Total crowd control time dealt per min.",
    "pinksbought": "Pink wards bought per min.",
    "wardsbought": "Wards bought per min.",
    "wardsplaced": "Wards placed per min.",
    "turretkills": "# of turret kills",
    "inhibkills": "# of inhibitor kills",
    "dmgtoturrets": "Damage to turrets"
}
feature_names = [full_names.get(n, n) for n in X.columns]
X.columns = feature_names
# create train/validation split
Xt, Xv, yt, yv = train_test_split(X,y, test_size=0.2, random_state=10)
dt = xgb.DMatrix(Xt, label=yt.values)
dv = xgb.DMatrix(Xv, label=yv.values)
params = {
    "eta": 0.5,
    "max_depth": 4,
    "objective": "binary:logistic",
    "silent": 1,
    "base_score": np.mean(yt),
    "eval_metric": "logloss"
}
model = xgb.train(params, dt, 300, [(dt, "train"),(dv, "valid")], early_stopping_rounds=5, verbose_eval=25)
# compute the SHAP values for every prediction in the validation dataset
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(Xv)
shap.summary_plot(shap_values, Xv)