对于下面给出的代码,如果我只使用命令,shap.plots.waterfall(shap_values[6])我会收到错误
“numpy.ndarray”对象没有属性“base_values”
我必须首先运行这两个命令:
explainer2 = shap.Explainer(clf.best_estimator_.predict, X_train)
shap_values = explainer2(X_train)
Run Code Online (Sandbox Code Playgroud)
然后运行waterfall命令以获得正确的绘图。下面是发生错误的示例:
from sklearn.datasets import make_classification
import seaborn as sns
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import pickle
import joblib
import warnings
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
f, (ax1,ax2) = plt.subplots(nrows=1, ncols=2,figsize=(20,8))
# Generate noisy Data
X_train,y_train = make_classification(n_samples=1000,
n_features=50,
n_informative=9,
n_redundant=0,
n_repeated=0,
n_classes=10,
n_clusters_per_class=1,
class_sep=9,
flip_y=0.2,
#weights=[0.5,0.5],
random_state=17)
X_test,y_test = make_classification(n_samples=500,
n_features=50, …Run Code Online (Sandbox Code Playgroud) 对于下面给出的代码,我得到了不同的值条形图shap。
在此示例中,我有一个train包含 1000 个样本、9 个类别和 500 个test样本的数据集。然后,我使用随机森林作为分类器并生成模型。当我开始生成条形图时,shap我在这两种情况下得到不同的结果:
shap_values_Tree_tr = shap.TreeExplainer(clf.best_estimator_).shap_values(X_train)
shap.summary_plot(shap_values_Tree_tr, X_train)
Run Code Online (Sandbox Code Playgroud)
进而:
explainer2 = shap.Explainer(clf.best_estimator_.predict, X_test)
shap_values = explainer2(X_test)
Run Code Online (Sandbox Code Playgroud)
您能解释一下这两个图之间有什么区别以及使用哪一个吗feature importance?
这是我的代码:
from sklearn.datasets import make_classification
import seaborn as sns
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import pickle
import joblib
import warnings
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
f, (ax1,ax2) = plt.subplots(nrows=1, ncols=2,figsize=(20,8))
# Generate noisy …Run Code Online (Sandbox Code Playgroud) 我通过 python 代码生成了多类混淆矩阵:
import seaborn as sns
import matplotlib.pyplot as plt
### Confusion Matrix
from sklearn.metrics import confusion_matrix
predictions = model.predict(x_test, steps=len(x_test), verbose=0)
#y_pred=model.predict(x_test)
#y_pred = np.round(y_pred)
y_pred = np.argmax(predictions, axis=-1)
y_true=np.argmax(y_test, axis=-1)
cm = confusion_matrix(y_true, y_pred)
## Get Class Labels
labels = le.classes_
class_names = labels
# Plot confusion matrix in a beautiful manner
fig = plt.figure(figsize=(16, 14))
ax= plt.subplot()
sns.heatmap(cm, annot=True, ax = ax, fmt = 'g'); #annot=True to annotate cells
# labels, title and ticks
ax.set_xlabel('Predicted', fontsize=20) …Run Code Online (Sandbox Code Playgroud) 我有这些行列表:
lines = (
f"Begin Test 1",
f"Begin Test 2",
f"Begin Test 3",
f"Begin Test 4",
f"Begin Test 5",
f"Begin Test 6",
)
Run Code Online (Sandbox Code Playgroud)
我只想从包含上面显示的行的第一个文件中获取前两行,并将其连接到另一组行,然后保存新的文本文件。
下面给出的代码使用文本编辑器查看时保存了带有双引号的前两行和不带任何引号的其余行。如何去掉前两行中的双引号并正确保存数据?
这是我当前的代码:
代码
import pandas as pd
import os
import contextlib
#
#%%####################### SAVE SCORES TO CSV FILE #####################
lines = (
f"Begin Test 1",
f"Begin Test 2",
f"Begin Test 3",
f"Begin Test 4",
f"Begin Test 5",
f"Begin Test 6",
)
df = pd.DataFrame(lines)
txtfilename = f'Check1.txt'
# Remove csv file if it exists
with contextlib.suppress(FileNotFoundError): …Run Code Online (Sandbox Code Playgroud) 在此数据框中:
Feat1 Feat2 Feat3 Feat4 Labels
-46.220314 22.862856 -6.1573067 5.6060414 2
-23.80669 20.536781 -5.015675 4.2216353 2
-42.092365 25.680704 -5.0092897 5.665794 2
-35.29639 21.709473 -4.160352 5.578346 2
-37.075096 22.347767 -3.860426 5.6953945 2
-42.8849 28.03802 -7.8572545 3.3361 2
-32.3057 26.568039 -9.47018 3.4532788 2
-24.469942 27.005375 -9.301921 4.3995037 2
-97.89892 -0.38156664 6.4163384 7.234347 1
-81.96325 0.1821717 -1.2870358 4.703838 1
-78.41986 -6.766374 0.8001185 0.83444935 1
-100.68544 -4.5810957 1.6977689 1.8801615 1
-87.05412 -2.9231584 6.817379 5.4460077 1
-64.121056 -3.7892206 -0.283514 6.3084154 1
-94.504845 -0.9999217 3.2884297 6.881124 …Run Code Online (Sandbox Code Playgroud) 在下面的代码中,我正在对单元格进行数值计算.但有时,其中一个单元格(例如"LosLimit")可能包含非数字值,如"---".如何检查单元格是否为数字.
如果是数字,则进行计算,否则从"MeasValue"返回值
Sub ReturnMarginal()
'UpdatebySUPERtoolsforExcel2016
Dim xOut As Worksheet
Dim xWb As Workbook
Dim xWks As Worksheet
Dim InterSectRange As Range
Dim lowLimCol As Integer
Dim hiLimCol As Integer
Dim measCol As Integer
Application.ScreenUpdating = False
Set xWb = ActiveWorkbook
For Each xWks In xWb.Sheets
xRow = 1
With xWks
FindString = "LowLimit"
If Not xWks.Rows(1).Find(FindString) Is Nothing Then
.Cells(xRow, 16) = "Meas-LO"
.Cells(xRow, 17) = "Meas-Hi"
.Cells(xRow, 18) = "Min Value"
.Cells(xRow, 19) = "Marginal"
LastRow = .UsedRange.Rows.Count
lowLimCol …Run Code Online (Sandbox Code Playgroud)