Posts by Big*_*ata

How to set a threshold for a scikit-learn random forest model

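After looking at the precision_recall_curve, I want to set threshold = 0.4. How do I apply 0.4 to my random forest model (binary classification) so that any probability < 0.4 is labeled 0 and any probability >= 0.4 is labeled 1?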

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# X_train, y_train, X_test, y_test are assumed to be defined already
random_forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=12)
random_forest.fit(X_train, y_train)
predicted = random_forest.predict(X_test)
accuracy = accuracy_score(y_test, predicted)
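A minimal sketch of applying a custom cutoff, assuming binary labels 0/1 and using synthetic data from `make_classification` in place of the question's `X_train`/`X_test` (which are not shown). `predict()` picks the class with the highest mean probability across the trees, which for two classes amounts to a 0.5 cutoff; to use 0.4, threshold the class-1 column of `predict_proba()` directly:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary data stands in for the question's X/y (assumption).
X, y = make_classification(n_samples=1000, n_features=10, random_state=12)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=12)

random_forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=12)
random_forest.fit(X_train, y_train)

THRESHOLD = 0.4

# predict_proba columns follow random_forest.classes_; pick the column for label 1.
positive_col = list(random_forest.classes_).index(1)
proba = random_forest.predict_proba(X_test)[:, positive_col]

# Probability >= 0.4 -> label 1, probability < 0.4 -> label 0.
predicted = (proba >= THRESHOLD).astype(int)

accuracy = accuracy_score(y_test, predicted)
precision = precision_score(y_test, predicted)
recall = recall_score(y_test, predicted)
print(f"threshold={THRESHOLD}: accuracy={accuracy:.3f} "
      f"precision={precision:.3f} recall={recall:.3f}")
```

Lowering the cutoff from 0.5 to 0.4 can only turn predicted 0s into 1s, so recall for class 1 stays the same or rises, while precision may fall.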


Documentation: precision-recall

python scikit-learn

Big*_*ata, 2018-04-13. Score 9, 1 answer, 20k views.

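Why do Python and R give different standardization results?

Can anyone explain the math behind the scenes? Why do Python and R give me different results? Which one should I use in a real business scenario?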

Original data

id  cost    sales   item
1   300      50     pen
2   3        88     wf
3   1        70     gher
4   5        80     dger
5   2        999    ww


Python code:

import pandas as pd
from sklearn.preprocessing import StandardScaler
df = pd.read_csv('Scale.csv')
df[['cost', 'sales']] = StandardScaler().fit_transform(df[['cost', 'sales']])
df


Python standardization result

    id     cost        sales    item
0   1   1.999876    -0.559003   pen
1   2   -0.497867   -0.456582   wf
2   3   -0.514686   -0.505097   gher
3   4   -0.481047   -0.478144   dger
4   5   -0.506276   1.998826    ww
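The gap comes from the denominator. `StandardScaler` divides by the population standard deviation (documented as equivalent to `numpy.std(x, ddof=0)`), while base R's `scale()` divides by the sample standard deviation, like `sd()`. Assuming the truncated R code uses `scale()`, the two formulas and a check against the `cost` value for id 1 are:

```latex
\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j,\qquad
s_{\text{pop}} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}(x_j-\bar{x})^2},\qquad
s_{\text{sample}} = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}(x_j-\bar{x})^2}

z_i^{\text{Python}} = \frac{x_i-\bar{x}}{s_{\text{pop}}},\qquad
z_i^{\text{R}} = \frac{x_i-\bar{x}}{s_{\text{sample}}} = z_i^{\text{Python}}\sqrt{\frac{n-1}{n}}

\text{cost, id 1 } (n=5):\ \bar{x}=62.2,\ s_{\text{pop}}\approx 118.907,\ s_{\text{sample}}\approx 132.942,\
z^{\text{Python}}\approx 1.99988,\ z^{\text{R}}\approx 1.78874
```

Because the factor $\sqrt{(n-1)/n}$ is the same for every value in a column, both versions order and space the data identically; the difference shrinks as $n$ grows.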


And the R code:

library(readr)
library(dplyr)
df <- read_csv("C:/Users/Ho/Desktop/Scale.csv") …
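A sketch that reproduces both conventions in pandas, with the question's table typed in directly instead of reading `Scale.csv`. The `ddof=1` result is what base R's `sd()`/`scale()` would produce, on the assumption that the truncated R code standardizes with `scale()`:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# The question's data, typed in instead of read from Scale.csv.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "cost": [300, 3, 1, 5, 2],
    "sales": [50, 88, 70, 80, 999],
    "item": ["pen", "wf", "gher", "dger", "ww"],
})
cols = ["cost", "sales"]
n = len(df)

# scikit-learn: divides by the population standard deviation (ddof=0).
sk = StandardScaler().fit_transform(df[cols])

# Manual versions of both conventions.
centered = df[cols] - df[cols].mean()
pop_z = centered / df[cols].std(ddof=0)     # matches StandardScaler
sample_z = centered / df[cols].std(ddof=1)  # matches R's sd()-based scaling

print(pop_z.round(6))
print(sample_z.round(6))
```

Note that pandas' own `DataFrame.std()` defaults to `ddof=1`, so it follows the R convention unless told otherwise.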


python r normalization standardized

Big*_*ata, 2018-04-05. Score 4, 1 answer, 146 views.

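How to standardize some columns in a Python pandas DataFrame?

The Python code below only returns an array, but I want the scaled data to replace the original data in the DataFrame.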

from sklearn.preprocessing import StandardScaler
df = StandardScaler().fit_transform(df[['cost', 'sales']])
df


Output

array([[ 1.99987622, -0.55900276],
       [-0.49786658, -0.45658181],
       [-0.5146864 , -0.505097  ],
       [-0.48104676, -0.47814412],
       [-0.50627649,  1.9988257 ]])


Original data

id  cost    sales   item
1   300       50    pen
2   3         88    bottle
3   1         70    drink
4   5         80    cup
5   2        999    ink
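`fit_transform` returns a plain NumPy array, so `df = ...` replaces the whole DataFrame with it. A minimal sketch, with the table typed in directly: assign the array back to the same column selection (the pattern the `Scale.csv` example's Python code uses), and the other columns stay untouched:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# The question's data, typed in directly.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "cost": [300, 3, 1, 5, 2],
    "sales": [50, 88, 70, 80, 999],
    "item": ["pen", "bottle", "drink", "cup", "ink"],
})
cols = ["cost", "sales"]

# Write the scaled array back into just those columns; id and item are kept.
df[cols] = StandardScaler().fit_transform(df[cols])
print(df)
```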


python standardized pandas sklearn-pandas

Big*_*ata, lucky-day. Score 2, 1 answer, 5458 views.

Tag statistics

python ×3

standardized ×2

normalization ×1

pandas ×1

r ×1

scikit-learn ×1

sklearn-pandas ×1
