Posts by new*_*hon

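How to compute precision, recall, accuracy and f1-score for the multiclass case with scikit learn?

I'm working on a sentiment analysis problem. The data looks like this: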

label instances
    5    1190
    4     838
    3     239
    1     204
    2     127
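The counts above are heavily skewed toward label 5. As a rough sketch of how skewed, the snippet below applies the heuristic that scikit-learn documents for `class_weight='balanced'`, namely `n_samples / (n_classes * count)`, to the counts in the table. The variable names are illustrative only and not part of the original post.

```python
# Class counts copied from the table above (label -> number of instances).
counts = {5: 1190, 4: 838, 3: 239, 1: 204, 2: 127}

n_samples = sum(counts.values())
n_classes = len(counts)

# "balanced" heuristic: rarer labels get proportionally larger weights.
balanced_weights = {
    label: n_samples / (n_classes * count)
    for label, count in counts.items()
}

for label in sorted(balanced_weights):
    print(label, round(balanced_weights[label], 3))
```

Under this heuristic the rarest label (2) receives the largest weight and the most frequent label (5) receives a weight below 1.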


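So my data is unbalanced, since 1190 instances are labeled with 5. For the classification I'm using scikit's SVC. The problem is that I don't know how to balance my data in the right way in order to accurately compute the precision, recall, accuracy and f1-score for the multiclass case. So I tried the following approaches:

First: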

from sklearn.svm import SVC
from sklearn.metrics import (accuracy_score, f1_score, recall_score,
                             precision_score, classification_report,
                             confusion_matrix)

# Up-weight class 1 by a factor of 10; all other classes keep weight 1.
wclf = SVC(kernel='linear', C=1, class_weight={1: 10})
wclf.fit(X, y)
weighted_prediction = wclf.predict(X_test)

print('Accuracy:', accuracy_score(y_test, weighted_prediction))
print('F1 score:', f1_score(y_test, weighted_prediction, average='weighted'))
print('Recall:', recall_score(y_test, weighted_prediction, average='weighted'))
print('Precision:', precision_score(y_test, weighted_prediction, average='weighted'))
print('\n classification report:\n', classification_report(y_test, weighted_prediction))
print('\n confusion matrix:\n', confusion_matrix(y_test, weighted_prediction))

Second:

# class_weight='auto' is the spelling from older scikit-learn releases; newer ones use 'balanced'.
auto_wclf = SVC(kernel='linear', C=1, class_weight='auto')
auto_wclf.fit(X, y)
auto_weighted_prediction = auto_wclf.predict(X_test)

print('Accuracy:', accuracy_score(y_test, auto_weighted_prediction))

print …
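To make the difference between the averaging modes concrete, here is a minimal pure-Python sketch of what `average='macro'` and `average='weighted'` combine: per-class precision, recall and F1, averaged either equally or by class support. The toy labels and the helper names (`per_class_scores`, `macro_f1`, `weighted_f1`) are hypothetical and are not from the original post; in practice the scikit-learn functions used above do this work.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every observed label."""
    labels = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)   # support: how often each label really occurs
    pred_count = Counter(y_pred)   # how often each label was predicted
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    scores = {}
    for label in labels:
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label] if true_count[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, true_count[label])
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(f1 for _, _, f1, _ in scores.values()) / len(scores)


def weighted_f1(scores):
    """Mean of per-class F1 weighted by support: frequent classes dominate."""
    total = sum(support for _, _, _, support in scores.values())
    return sum(f1 * support for _, _, f1, support in scores.values()) / total


# Hypothetical toy data, skewed toward label 5 like the real dataset.
y_true = [5, 5, 5, 5, 4, 4, 1, 2]
y_pred = [5, 5, 5, 5, 4, 5, 5, 2]

scores = per_class_scores(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print('accuracy:', accuracy)
print('macro F1:', macro_f1(scores))
print('weighted F1:', weighted_f1(scores))
```

On skewed data the weighted average tends to look better than the macro average, because a class the model never predicts correctly (label 1 in the toy data) contributes little when its support is small.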


python nlp artificial-intelligence machine-learning scikit-learn

new*_*hon

2017-03-16

99 votes, 4 answers, about 130,000 views

How to efficiently extract <![CDATA[]> content from xml with python?

I have the following xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?><author id="user23">
    <document><![CDATA["@username: That boner came at the wrong time ???? http://t.co/5X34233gDyCaCjR" HELP I'M DYING       ]]></document>
    <document><![CDATA[Ugh      ]]></document>
    <document><![CDATA[YES !!!! WE GO FOR IT. http://t.co/fiI23324E83b0Rt       ]]></document>
    <document><![CDATA[@username Shout out to me????        ]]></document>
</author>

What is the most efficient way to parse this and extract the content between <![CDATA[ and ]]> into a list? For example:

[@username: That boner came at the wrong time ???? http://t.co/5X34233gDyCaCjR" HELP I'M DYING      Ugh     YES !!!! WE GO FOR IT. http://t.co/fiI23324E83b0Rt      @username Shout out to me????       ]

This is what I tried:

from bs4 import BeautifulSoup
x='/Users/user/PycharmProjects/TratandoDeMejorarPAN/test.xml'
y = BeautifulSoup(open(x), 'xml') …
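As one possible approach, sketched here with the standard library rather than BeautifulSoup: `xml.etree.ElementTree` exposes CDATA sections as ordinary element text, so collecting the text of each `<document>` element yields the list. The inline sample is a shortened copy of the XML above, and stripping the trailing whitespace is an assumption about the desired output.

```python
import xml.etree.ElementTree as ET

# Shortened sample of the XML shown above, kept as bytes so the
# encoding declaration is honoured by the parser.
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?><author id="user23">
    <document><![CDATA[Ugh      ]]></document>
    <document><![CDATA[YES !!!! WE GO FOR IT. http://t.co/fiI23324E83b0Rt       ]]></document>
    <document><![CDATA[@username Shout out to me????        ]]></document>
</author>"""

root = ET.fromstring(xml_bytes)

# CDATA content arrives as the element's .text; strip the padding
# (assumption: the trailing spaces are not wanted in the result).
documents = [(doc.text or "").strip() for doc in root.iter("document")]

for text in documents:
    print(repr(text))
```

For a file on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring(xml_bytes)`, with `path` pointing at the XML file.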


python xml lxml python-2.7 pandas

new*_*hon

2015-06-23

3 votes, 1 answer, 6438 views

Tag statistics

python ×2

artificial-intelligence ×1

lxml ×1

machine-learning ×1

nlp ×1

pandas ×1

python-2.7 ×1

scikit-learn ×1

xml ×1
