我正在处理情绪分析问题,数据看起来像这样:
label instances
5 1190
4 838
3 239
1 204
2 127
Run Code Online (Sandbox Code Playgroud)
所以我的数据是不平衡的,因为1190 instances标有5.对于分类我使用scikit的SVC.问题是我不知道如何以正确的方式平衡我的数据,以便准确计算多类案例的精确度,召回率,准确度和f1分数.所以我尝试了以下方法:
第一:
wclf = SVC(kernel='linear', C= 1, class_weight={1: 10})
wclf.fit(X, y)
weighted_prediction = wclf.predict(X_test)
print 'Accuracy:', accuracy_score(y_test, weighted_prediction)
print 'F1 score:', f1_score(y_test, weighted_prediction,average='weighted')
print 'Recall:', recall_score(y_test, weighted_prediction,
average='weighted')
print 'Precision:', precision_score(y_test, weighted_prediction,
average='weighted')
print '\n clasification report:\n', classification_report(y_test, weighted_prediction)
print '\n confussion matrix:\n',confusion_matrix(y_test, weighted_prediction)
Run Code Online (Sandbox Code Playgroud)
第二:
auto_wclf = SVC(kernel='linear', C= 1, class_weight='auto')
auto_wclf.fit(X, y)
auto_weighted_prediction = auto_wclf.predict(X_test)
print 'Accuracy:', accuracy_score(y_test, auto_weighted_prediction)
print …Run Code Online (Sandbox Code Playgroud) python nlp artificial-intelligence machine-learning scikit-learn
我有以下 xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?><author id="user23">
<document><![CDATA["@username: That boner came at the wrong time ???? http://t.co/5X34233gDyCaCjR" HELP I'M DYING ]]></document>
<document><![CDATA[Ugh ]]></document>
<document><![CDATA[YES !!!! WE GO FOR IT. http://t.co/fiI23324E83b0Rt ]]></document>
<document><![CDATA[@username Shout out to me???? ]]></document>
</author>
Run Code Online (Sandbox Code Playgroud)
解析内容并将其提取<![CDATA[到]]>列表中的最有效方法是什么。比方说:
[@username: That boner came at the wrong time ???? http://t.co/5X34233gDyCaCjR" HELP I'M DYING Ugh YES !!!! WE GO FOR IT. http://t.co/fiI23324E83b0Rt @username Shout out to me???? ]
Run Code Online (Sandbox Code Playgroud)
这是我尝试过的:
from bs4 import BeautifulSoup
x='/Users/user/PycharmProjects/TratandoDeMejorarPAN/test.xml'
y = BeautifulSoup(open(x), 'xml') …Run Code Online (Sandbox Code Playgroud)