I am using scikit-learn to run logistic regression on spam/ham data. X_train is my training data and y_train holds the labels ('spam' or 'ham'). I train my LogisticRegression like this:
from sklearn.linear_model import LogisticRegression

classifier = LogisticRegression()
classifier.fit(X_train, y_train)
If I want the 10-fold cross-validated accuracy, I simply write:
accuracy = cross_val_score(classifier, X_train, y_train, cv=10)
I assumed precision and recall could be computed the same way, just by adding one parameter:
precision = cross_val_score(classifier, X_train, y_train, cv=10, scoring='precision')
recall = cross_val_score(classifier, X_train, y_train, cv=10, scoring='recall')
But this raises a ValueError:
ValueError: pos_label=1 is not a valid label: array(['ham', 'spam'], dtype='|S4')
Is this a problem with the data (should I binarize the labels?), or do I need to change how I call the cross_val_score function?
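For illustration, here is a minimal sketch of the two routes the question hints at: passing a scorer that names 'spam' as the positive label, or binarizing the labels so that the default pos_label=1 is valid. It assumes a recent scikit-learn where cross_val_score lives in sklearn.model_selection. The synthetic data from make_classification and the max_iter setting are stand-ins that are not part of the original setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real spam/ham features (an assumption for this sketch).
X_train, y_numeric = make_classification(n_samples=200, n_features=10, random_state=0)
y_train = np.where(y_numeric == 1, "spam", "ham")

classifier = LogisticRegression(max_iter=1000)

# Option 1: keep the string labels and tell each scorer which label is positive.
precision_scorer = make_scorer(precision_score, pos_label="spam")
recall_scorer = make_scorer(recall_score, pos_label="spam")
precision = cross_val_score(classifier, X_train, y_train, cv=10, scoring=precision_scorer)
recall = cross_val_score(classifier, X_train, y_train, cv=10, scoring=recall_scorer)

# Option 2: binarize the labels (spam -> 1, ham -> 0) and use the built-in scorer names.
y_binary = (y_train == "spam").astype(int)
precision_bin = cross_val_score(classifier, X_train, y_binary, cv=10, scoring="precision")
recall_bin = cross_val_score(classifier, X_train, y_binary, cv=10, scoring="recall")

# Each result is an array with one score per fold; report the mean across folds.
print("precision:", precision.mean(), "recall:", recall.mean())
```

Either route should avoid the pos_label error. The first leaves y_train untouched, while the second changes the labels once and then works with the plain string scorer names.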
Thank you in advance!
python precision machine-learning scikit-learn logistic-regression
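I am trying out Scrapy, and right now I am trying to extract information from an etymology website: http://www.etymonline.com. For now, I only want to get each word and its original description. This is how a typical block of HTML is laid out on etymonline: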
<dt>
<a href="/index.php?term=address&allowed_in_frame=0">address (n.)</a>
<a href="http://dictionary.reference.com/search?q=address" class="dictionary" title="Look up address at Dictionary.com">
<img src="graphics/dictionary.gif" width="16" height="16" alt="Look up address at Dictionary.com" title="Look up address at Dictionary.com"/>
</a>
</dt>
<dd>
1530s, "dutiful or courteous approach," from <a href="/index.php?term=address&allowed_in_frame=0" class="crossreference">address</a> (v.) and from French <span class="foreign">adresse</span>. Sense of "formal speech" is from 1751. Sense of "superscription of a letter" is from 1712 and led to the meaning "place of residence" (1888).
</dd> …