Posts by nir*_*jan

Python scikit-learn: performance metrics for multiclass-multilabel output?

I ran a random forest classifier on my multiclass-multilabel output variable and got the following output.

My y_test values


     Degree  Nature
762721       1       7                              
548912       0       6
727126       1      12
14880        1      12
189505       1      12
657486       1      12
461004       1       0
31548        0       6
296674       1       7
121330       0      17


Predicted output:

[[  1.   7.]
 [  0.   6.]
 [  1.  12.]
 [  1.  12.]
 [  1.  12.]
 [  1.  12.]
 [  1.   0.]
 [  0.   6.]
 [  1.   7.]
 [  0.  17.]]

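Now I want to check the performance of the classifier. I found that for multiclass-multilabel problems, Hamming loss or jaccard_similarity_score are good metrics. I tried to calculate them, but I got a ValueError.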

Error:
ValueError: multiclass-multioutput is not supported

I tried the line below:

print hamming_loss(y_test, …
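Since the error above says the multiclass-multioutput target type is not supported, one possible workaround is to compute the mismatch fraction directly with NumPy, either per output column or over all cells. The following is only a minimal sketch under that assumption; the names y_true, y_pred, per_output_error, hamming and exact_match are illustrative, and the arrays are copied from the values shown in the question.

```python
import numpy as np

# Values copied from the question: columns are (Degree, Nature).
y_true = np.array([[1, 7], [0, 6], [1, 12], [1, 12], [1, 12],
                   [1, 12], [1, 0], [0, 6], [1, 7], [0, 17]])
y_pred = np.array([[1., 7.], [0., 6.], [1., 12.], [1., 12.], [1., 12.],
                   [1., 12.], [1., 0.], [0., 6.], [1., 7.], [0., 17.]])

# Element-wise mismatches between true and predicted labels.
mismatch = y_true != y_pred

# Fraction of wrong labels for each output column separately.
per_output_error = mismatch.mean(axis=0)

# Hamming-style loss: fraction of wrong (sample, output) cells overall.
hamming = mismatch.mean()

# Exact-match ratio: a row counts only if every output is correct.
exact_match = (~mismatch).all(axis=1).mean()

print(per_output_error, hamming, exact_match)
```

Each column on its own is an ordinary multiclass target, so it should also be possible to pass y_true[:, 0] and y_pred[:, 0] to single-output metrics such as accuracy_score.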

python precision machine-learning scikit-learn multilabel-classification

nir*_*jan

lucky-day

6 votes

1 answer

7615 views

Python scikit-learn: how to export the classification report and confusion matrix results to Excel?

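How can I export the results to an Excel file? I tried the script below, but it does not give the right output. Classes that are present in the test dataset but never appear in the predicted column do not show up in the output.

Is there any other way to achieve this? I want to present the model results in Excel format.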

import pandas as pd
expected = y_test
y_actu = pd.Series(expected, name='Actual')
y_pred = pd.Series(predicted, name='Predicted')
df_confusion = pd.crosstab(y_actu, y_pred,y_test.unique())

df_confusion


df_confusion.to_csv('SVM_Confusion_Matrix.csv')

from pandas import ExcelWriter
writer = ExcelWriter('SVM_Confusion_Matrix.xlsx')
df_confusion.to_excel(writer,'Sheet1')
writer.save()
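One way the missing classes might be brought back is to build the crosstab from the two series only and then reindex it against the full label set, so that classes which were never predicted still get a zero-filled column. This is a sketch, not a confirmed fix for the asker's data: the y_test and predicted values below are hypothetical stand-ins, and export_confusion is an illustrative helper name.

```python
import pandas as pd

# Hypothetical stand-ins for the question's y_test and predicted.
y_test = pd.Series([0, 1, 2, 2, 1], index=[10, 11, 12, 13, 14])
predicted = [0, 1, 1, 1, 1]  # class 2 is never predicted

# Drop y_test's index so both series align by position.
y_actu = pd.Series(y_test.to_numpy(), name="Actual")
y_pred = pd.Series(predicted, name="Predicted")

# Every class seen in either the actual or the predicted values.
labels = sorted(set(y_actu) | set(y_pred))

# Cross-tabulate, then add zero-filled rows/columns for absent classes.
df_confusion = (
    pd.crosstab(y_actu, y_pred)
    .reindex(index=labels, columns=labels, fill_value=0)
)

def export_confusion(df, path="SVM_Confusion_Matrix.xlsx"):
    """Write the matrix to an Excel sheet; the workbook is saved on exit."""
    with pd.ExcelWriter(path) as writer:
        df.to_excel(writer, sheet_name="Sheet1")

print(df_confusion)
```

Calling export_confusion(df_confusion) would write the workbook; it needs an Excel engine such as openpyxl to be installed.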

machine-learning python-2.7 pandas confusion-matrix scikit-learn

nir*_*jan

2016 08-18

5 votes

1 answer

2949 views

How to read a text file line by line with Scala (Spark), split on a delimiter, and store the values in their respective columns?

I am new to Scala.

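My requirement is to read the file line by line, split each line on specific delimiters, and extract the values so they can be placed in their respective columns in a different file.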

Below is my sample input data:

ABC Log

Aug 10 14:36:52 127.0.0.1 CEF:0|McAfee|ePolicy Orchestrator|IFSSLCRT0.5.0.5/epo4.0|2410|DeploymentTask|High  eventId=34 externalId=23
Aug 10 15:45:56 127.0.0.1 CEF:0|McAfee|ePolicy Orchestrator|IFSSLCRT0.5.0.5/epo4.0|2890|DeploymentTask|Medium eventId=888 externalId=7788
Aug 10 16:40:59 127.0.0.1 CEF:0|NV|ePolicy Orchestrator|IFSSLCRT0.5.0.5/epo4.0|2990|DeploymentTask|Low eventId=989 externalId=0004


XYZ Log

Aug 15 14:32:15 142.101.36.118 cef[10612]: CEF:0|fire|cc|3.5.1|FireEye Acquisition Started
Aug 16 16:45:10 142.101.36.189 cef[10612]: CEF:0|cold|dd|3.5.4|FireEye Acquisition Started
Aug 18 19:50:20 142.101.36.190 cef[10612]: CEF:0|fire|ee|3.5.6|FireEye Acquisition Started

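In the data above, I need to read the first section under the "ABC Log" heading, extract the values from each line, and put them under the respective columns. The first few column names are hard-coded, and the last columns need to be extracted by splitting on "=", i.e. eventId=34 externalId=23 => col = eventId, value = 34 and col = externalId, value = 23.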

Column names 

date time ip_address col1 col2 col3 col4 …
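The splitting described above can be sketched in a few lines. The question asks for Scala on Spark; this sketch uses plain Python only to show the parsing logic for one "ABC Log" line, which would still need to be ported. The function name parse_abc_line is illustrative, and the assumption that the first four whitespace-separated tokens are month, day, time and IP address is read off the sample data, not confirmed by the asker.

```python
# Sample line copied from the "ABC Log" section of the question.
line = ("Aug 10 14:36:52 127.0.0.1 CEF:0|McAfee|ePolicy Orchestrator|"
        "IFSSLCRT0.5.0.5/epo4.0|2410|DeploymentTask|High  eventId=34 externalId=23")

def parse_abc_line(line):
    """Split one ABC-log line into a dict of column name -> value."""
    # First four whitespace-separated tokens: month, day, time, IP address.
    month, day, time, ip, rest = line.split(None, 4)
    # The remainder is pipe-delimited; the last field also holds key=value pairs.
    *fields, tail = rest.split("|")
    tail_tokens = tail.split()

    row = {"date": f"{month} {day}", "time": time, "ip_address": ip}

    # Positional values get the placeholder names col1, col2, ... from the question.
    fixed = fields + [t for t in tail_tokens if "=" not in t]
    for i, value in enumerate(fixed, start=1):
        row[f"col{i}"] = value

    # key=value tokens become their own columns, named by the key.
    for token in tail_tokens:
        if "=" in token:
            key, value = token.split("=", 1)
            row[key] = value
    return row

row = parse_abc_line(line)
print(row)
```

Splitting the leading tokens with a maximum split count keeps the space inside "ePolicy Orchestrator" intact, since that field is only separated later by the pipe character.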

scala apache-spark

nir*_*jan

2017 09-26

4 votes

1 answer

30k views

Tag statistics

machine-learning ×2

scikit-learn ×2

apache-spark ×1

confusion-matrix ×1

multilabel-classification ×1

pandas ×1

precision ×1

python ×1

python-2.7 ×1

scala ×1
