Posts by use*_*273

Custom "Precision at k" scoring object for GridSearchCV in sklearn

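I am trying to tune hyperparameters with GridSearchCV in scikit-learn using a "precision at k" scoring metric, which gives me the precision when I classify the top k percentile of my classifier's scores as the positive class. I know that a custom scorer can be built by writing a scoring function and wrapping it with make_scorer. This is what I have so far: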

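import numpy as np
import pandas as pd
from sklearn import metrics
# sklearn.grid_search was removed in scikit-learn 0.20; GridSearchCV now lives in model_selection.
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

def precision_at_k(y_true, y_score, k):
    # Sort by score, highest first (DataFrame.sort was replaced by sort_values).
    df = pd.DataFrame({'true': y_true, 'score': y_score}).sort_values('score', ascending=False)
    # Score of the last sample that falls inside the top k fraction.
    threshold = df['score'].iloc[max(int(k * len(df)) - 1, 0)]
    y_pred = (df['score'] >= threshold).astype(int)
    # Compare against the labels in the same sorted order as the predictions.
    return metrics.precision_score(df['true'], y_pred)

# needs_proba=True passes predict_proba output as y_score; newer scikit-learn
# releases spell this response_method="predict_proba".
custom_scorer = metrics.make_scorer(precision_at_k, needs_proba=True, k=0.1)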

X = np.random.randn(100, 10)
Y = np.random.binomial(1, 0.3, 100)

train_index = range(0, 70)
test_index = range(70, 100)
train_x = X[train_index]
train_Y = Y[train_index]
test_x …
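
As a point of comparison, here is a minimal sketch of the metric described above that uses only NumPy. It is not the code from the post: instead of thresholding on a score, it picks exactly the top `k` fraction of samples by rank, and it assumes the positive class is labelled `1`. The sample labels and scores are made up for illustration.

```python
import numpy as np

def precision_at_k(y_true, y_score, k):
    """Precision when the top k fraction of scores is labelled positive."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    # Number of samples to label positive (at least one).
    n_top = max(int(round(k * len(y_score))), 1)
    # Indices of the n_top highest scores; ties are broken by position.
    top = np.argsort(-y_score, kind="stable")[:n_top]
    # Fraction of the selected samples whose true label is 1.
    return float(np.mean(y_true[top] == 1))

# Made-up data: ten samples, already listed from highest to lowest score.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(precision_at_k(y_true, y_score, k=0.3))
```

With `k=0.3` the three highest-scored samples are selected and two of them are positive, so the result is 2/3. The function keeps the `(y_true, y_score, k)` signature used in the post, so it could presumably be wrapped with make_scorer in the same way.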


python scikit-learn cross-validation grid-search

use*_*273

lucky-day

5
Score

1
Answers

4427
Views

read.table() error even though all elements are present

I am getting an error from read.table():

data <- read.table(file, header=T, stringsAsFactors=F, sep="@")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 160 did not have 28 elements


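I checked line 160, and it does have 28 elements (it contains 27 @ symbols).

I also checked the whole file: its 30242 lines contain 816534 @ symbols in total, which is 27 per line, so I am fairly sure every line has 28 elements. I further confirmed that @ does not appear anywhere in the file except as a separator.

Does anyone know what is going on here?

Edit: line 160 of the file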

158 @Mental state: 1. Overall clinical symptoms@ MD @ S @ 2002 @ CMP-005 @ 02 @ 20.67 @ 23.58 @Clozapine versus typical neuroleptic medication for schizophrenia@ IV @ 4.47 @ 02 @SENSITIVITY ANALYSIS - CHINESE TRIALS @ CD000059 @ 6.94 @Fixed@ 16 @ 5 @ 2 @ 45 @Chinese trials@ YES @ Xia 2002(CPZ)@ STD-Xia-2002-_000028_CPZ_x0029_ @ 579 @ 566 @ …
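
The per-line separator count described above can be sketched roughly as follows, in Python rather than R. The function name `separator_report` and the three-line sample are hypothetical. The sketch also flags lines containing `'`, `"` or `#`, because read.table's defaults treat the first two as quote characters and the last as a comment marker. Whether that is relevant to this particular file is an assumption; the post does not confirm it.

```python
import io
from collections import Counter

def separator_report(lines, sep="@", suspects="'\"#"):
    """Count separators per line and flag characters read.table treats specially."""
    counts = Counter()
    flagged = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Tally how many lines have each separator count.
        counts[line.count(sep)] += 1
        # By default read.table treats ' and " as quotes and # as a comment marker.
        if any(ch in line for ch in suspects):
            flagged.append(number)
    return counts, flagged

# Hypothetical three-line sample standing in for the real file.
sample = io.StringIO("a@b@c\nd@e'f@g\nh@i@j\n")
counts, flagged = separator_report(sample)
print(counts)   # separator count -> number of lines with that count
print(flagged)  # 1-based numbers of lines containing ', " or #
```

For a real file, the `io.StringIO` sample would be replaced by an open file handle. If every line shows the same separator count but some lines are flagged, the `quote` and `comment.char` arguments of read.table may be worth checking.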

r read.table read.csv

use*_*273

2015 02-21

2
Score

1
Answers

4125
Views