On a fresh installation of Anaconda under Ubuntu... I am preprocessing my data in various ways before a classification task using Scikit-Learn.
from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler().fit(train)
train = scaler.transform(train)
test = scaler.transform(test)
This all works fine, but suppose I have a new sample (temp below) that I want to classify, and that I therefore want to preprocess in the same way:
temp = [1,2,3,4,5,5,6,....................,7]
temp = scaler.transform(temp)
Then I get a deprecation warning...
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17
and will raise ValueError in 0.19. Reshape your data either using
X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1)
if it contains a single sample.
So the question is: how should I rescale a single sample like this?
I suppose one alternative (not a very good one) would be...
temp = [temp, temp]
temp = scaler.transform(temp)
temp = temp[0]
But I'm sure there is a better way.
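One way to read the warning text is that a single sample should be passed as a 2-D array with one row, which is what reshape(1, -1) produces. The sketch below illustrates that reading under stated assumptions: the training data and the three-feature sample are made up for illustration, and the name temp_2d is hypothetical, not taken from the question.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up training data for illustration: 4 samples, 3 features each.
train = np.array([[0.0, 10.0, 100.0],
                  [5.0, 20.0, 200.0],
                  [10.0, 30.0, 300.0],
                  [2.0, 15.0, 150.0]])

scaler = MinMaxScaler().fit(train)

# A single new sample given as a flat list of feature values.
temp = [5.0, 20.0, 200.0]

# reshape(1, -1): one row (one sample), as many columns as there are features.
temp_2d = np.asarray(temp).reshape(1, -1)

# transform returns a 2-D array; take row 0 to get the flat sample back.
temp_scaled = scaler.transform(temp_2d)[0]

# For contrast, reshape(-1, 1) would mean "many samples, one feature each".
as_column = np.asarray(temp).reshape(-1, 1)

print(temp_2d.shape, as_column.shape)
print(temp_scaled)
```

This avoids the duplicate-the-sample workaround: the scaler sees exactly one row, and indexing with [0] recovers a 1-D result if the rest of the code expects one.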
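I am training a Python (2.7.11) classifier for text classification, and while running it I get a deprecation warning, but I don't know which line in my code is causing it! Here is the error/warning. However, the code works fine and gives me the results...
\ AppData\Local\Enthought\Canopy\User\lib\site-packages\sklearn\utils\validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
My code: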
import csv  # needed for csv.DictReader below
import sys  # needed for sys.argv below

def main():
    data = []
    folds = 10
    ex = [ [] for x in range(0,10)]
    results = []
    # Each command-line argument is a tab-separated file.
    for i,f in enumerate(sys.argv[1:]):
        data.append(csv.DictReader(open(f,'r'),delimiter='\t'))
    # Distribute the rows round-robin over the folds.
    for f in data:
        for i,datum in enumerate(f):
            ex[i % folds].append(datum)
    #print ex
    for held_out in range(0,folds):
        l = []
        cor = []
        l_test = []
        cor_test = []
        vec = []
        vec_test = []
        # The held-out fold becomes the test set; the rest is training data.
        for i,fold in enumerate(ex):
            for line in fold:
                if i == held_out:
                    l_test.append(line['label'].rstrip("\n"))
                    cor_test.append(line['text'].rstrip("\n"))
                else:
                    l.append(line['label'].rstrip("\n"))
                    cor.append(line['text'].rstrip("\n"))
        …
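Since the code above is cut off, the offending line cannot be identified from the listing itself. A general way to find it, sketched here with the standard warnings module, is to turn DeprecationWarning into an exception so that the traceback points at the call that triggered it. The function legacy_call is a hypothetical stand-in for whatever library call emits the warning; it is not part of the code in the question.

```python
import traceback
import warnings

def legacy_call():
    # Hypothetical stand-in for a library call that emits the warning.
    warnings.warn("Passing 1d arrays as data is deprecated", DeprecationWarning)
    return "ok"

def find_offending_call():
    # Inside this block, DeprecationWarning is raised as an exception,
    # so the traceback shows the line that triggered it.
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            legacy_call()
        except DeprecationWarning as exc:
            traceback.print_exc()
            return str(exc)
    return None

caught = find_offending_call()
print(caught)
```

The same effect is available without editing the script by running it as python -W error::DeprecationWarning followed by the script name, which makes the interpreter stop with a full traceback at the first such warning.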