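After looking at the precision_recall_curve, suppose I want to set the threshold to 0.4. How do I apply 0.4 to my random forest model (binary classification), so that any probability < 0.4 is labeled 0 and any probability >= 0.4 is labeled 1?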
from sklearn.ensemble import RandomForestClassifier
random_forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=12)
random_forest.fit(X_train, y_train)
from sklearn.metrics import accuracy_score
predicted = random_forest.predict(X_test)
accuracy = accuracy_score(y_test, predicted)
Documentation: Precision-Recall
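One way to apply a custom cutoff is to skip `predict` (which effectively uses 0.5 for a binary problem) and threshold the output of `predict_proba` directly. The sketch below is only an illustration: the data comes from `make_classification` as a stand-in for the asker's `X_train`/`X_test`, and it assumes the labels are 0 and 1 so that column 1 of `predict_proba` is the probability of class 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: the real X/y are not shown in the question).
X, y = make_classification(n_samples=500, n_features=10, random_state=12)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=12
)

random_forest = RandomForestClassifier(
    n_estimators=100, oob_score=True, random_state=12
)
random_forest.fit(X_train, y_train)

threshold = 0.4

# Column order follows random_forest.classes_; with labels {0, 1},
# column 1 holds the predicted probability of class 1.
proba_pos = random_forest.predict_proba(X_test)[:, 1]

# Probability < 0.4 -> 0, probability >= 0.4 -> 1.
predicted = (proba_pos >= threshold).astype(int)
```

The resulting `predicted` array can be passed to `accuracy_score(y_test, predicted)` exactly like the output of `predict`.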
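Can anyone explain the math behind the scenes? Why do Python and R give me different results? Which one should I use in a real business scenario?
Original data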
id cost sales item
1 300 50 pen
2 3 88 wf
3 1 70 gher
4 5 80 dger
5 2 999 ww
Python code:
import pandas as pd
from sklearn.preprocessing import StandardScaler
df = pd.read_csv('Scale.csv')
df[['cost', 'sales']] = StandardScaler().fit_transform(df[['cost', 'sales']])
df
Python standardized result
id cost sales item
0 1 1.999876 -0.559003 pen
1 2 -0.497867 -0.456582 wf
2 3 -0.514686 -0.505097 gher
3 4 -0.481047 -0.478144 dger
4 5 -0.506276 1.998826 ww
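A likely explanation for a Python/R gap is the standard deviation convention. `StandardScaler` divides by the population standard deviation (ddof=0), whereas R's `scale()` divides by the sample standard deviation (n - 1 in the denominator). The R code in the question is truncated, so whether it actually calls `scale()` is an assumption here. The sketch below rebuilds the table by hand instead of reading `Scale.csv`, and compares the two conventions:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Data copied from the question (in place of reading Scale.csv).
df = pd.DataFrame({"cost": [300, 3, 1, 5, 2], "sales": [50, 88, 70, 80, 999]})
cols = ["cost", "sales"]

# What scikit-learn computes.
sk = StandardScaler().fit_transform(df[cols])

# Population standard deviation (divide by n): matches StandardScaler.
pop = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)

# Sample standard deviation (divide by n - 1): the convention of R's scale().
samp = (df[cols] - df[cols].mean()) / df[cols].std(ddof=1)

print(pop)
print(samp)
```

With n = 5 the two results differ only by a constant factor of sqrt(4/5), so the ordering and relative spacing of the values are identical. On large datasets the difference becomes negligible; what usually matters in practice is applying one convention consistently.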
And the R code:
library(readr)
library(dplyr)
df <- read_csv("C:/Users/Ho/Desktop/Scale.csv") …

The Python code below only returns an array, but I want the scaled values to replace the original columns in my data.
from sklearn.preprocessing import StandardScaler
df = StandardScaler().fit_transform(df[['cost', 'sales']])
df
Output
array([[ 1.99987622, -0.55900276],
[-0.49786658, -0.45658181],
[-0.5146864 , -0.505097 ],
[-0.48104676, -0.47814412],
[-0.50627649, 1.9988257 ]])
Original data
id cost sales item
1 300 50 pen
2 3 88 bottle
3 1 70 drink
4 5 80 cup
5 2 999 ink
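A minimal sketch of one way to keep the DataFrame: assign the array returned by `fit_transform` back to the same columns rather than overwriting `df` itself. The table is rebuilt by hand from the question, and the name `scaled` is introduced here only for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "cost": [300, 3, 1, 5, 2],
    "sales": [50, 88, 70, 80, 999],
    "item": ["pen", "bottle", "drink", "cup", "ink"],
})
cols = ["cost", "sales"]

# Work on a copy so the raw values stay available in df.
scaled = df.copy()

# fit_transform returns a NumPy array; assigning it to the selected
# columns replaces only those columns and leaves id and item untouched.
scaled[cols] = StandardScaler().fit_transform(df[cols])

print(scaled)
```

If the raw values are no longer needed, the same assignment can be done on `df` directly, without the copy.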