azu*_*ric · 5 · Tags: python, numpy, svm, pandas, scikit-learn
I've been trying the following.
This is the code, using a pandas DataFrame filled with random values:
import pandas as pd
import numpy as np
from math import floor  # floor() is used below; its import was not shown in the original
df = pd.DataFrame(np.random.rand(20,5), columns=["A","B","C","D", "E"])
a = list(df.columns.values)
a.remove("A")
X = df[a]
y = df["A"]
X_train = X.iloc[0: floor(2 * len(X) /3)]
X_test = X.iloc[floor(2 * len(X) /3):]
y_train = y.iloc[0: floor(2 * len(y) /3)]
y_test = y.iloc[floor(2 * len(y) /3):]
# normalise
from sklearn import preprocessing
X_trainS = preprocessing.scale(X_train)
X_trainN = pd.DataFrame(X_trainS, columns=a)
X_testS = preprocessing.scale(X_test)
X_testN = pd.DataFrame(X_testS, columns=a)
y_trainS = preprocessing.scale(y_train)
y_trainN = pd.DataFrame(y_trainS)
y_testS = preprocessing.scale(y_test)
y_testN = pd.DataFrame(y_testS)
import sklearn
from sklearn.svm import SVR
clf = SVR(kernel='rbf', C=1e3, gamma=0.1)
pred = clf.fit(X_trainN,y_trainN).predict(X_testN)
This gives the following error:
C:\Anaconda3\lib\site-packages\pandas\core\index.py:542: FutureWarning: slice indexers when using iloc should be integers and not floating point
  "and not floating point", FutureWarning)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
()
     34 clf = SVR(kernel='rbf', C=1e3, gamma=0.1)
     35 
---> 36 pred = clf.fit(X_trainN,y_trainN).predict(X_testN)
     37 

C:\Anaconda3\lib\site-packages\sklearn\svm\base.py in fit(self, X, y, sample_weight)
    174 
    175         seed = rnd.randint(np.iinfo('i').max)
--> 176         fit(X, y, sample_weight, solver_type, kernel, random_seed=seed)
    177         # see comment on the other call to np.iinfo in this file

C:\Anaconda3\lib\site-packages\sklearn\svm\base.py in _dense_fit(self, X, y, sample_weight, solver_type, kernel, random_seed)
    229             cache_size=self.cache_size, coef0=self.coef0,
    230             gamma=self._gamma, epsilon=self.epsilon,
--> 231             max_iter=self.max_iter, random_seed=random_seed)
    232 
    233         self._warn_from_fit_status()

C:\Anaconda3\lib\site-packages\sklearn\svm\libsvm.pyd in sklearn.svm.libsvm.fit (sklearn\svm\libsvm.c:1864)()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
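I don't understand why. Can anyone explain? I think it has something to do with converting back to a DataFrame after preprocessing.
The error here comes from the df you pass as the labels, y_trainN.
Compare the documentation example's version with your code: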
In [40]:
n_samples, n_features = 10, 5
np.random.seed(0)
y = np.random.randn(n_samples)
print(y)
y_trainN.values
[ 1.76405235 0.40015721 0.97873798 2.2408932 1.86755799 -0.97727788
0.95008842 -0.15135721 -0.10321885 0.4105985 ]
Out[40]:
array([[-0.06680594],
[ 0.23535043],
[-1.49265082],
[ 1.22537862],
[-0.46499134],
[-0.23744759],
[ 1.40520679],
[ 0.95882677],
[ 1.66996413],
[-0.37515955],
[-0.75826444],
[-1.45945337],
[-0.63995369]])
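The difference is visible directly in the shapes. A minimal sketch, rebuilding the question's random data (the fixed seed and the `//` split are assumptions added only to make it repeatable), checks `.shape` at each step: `preprocessing.scale` on the `y_train` Series already returns a 1-D array like the docs' `y`, and it is wrapping that array in `pd.DataFrame` that turns it into a single-column, 2-D array.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

np.random.seed(0)  # assumption: fixed seed only so the sketch is repeatable
df = pd.DataFrame(np.random.rand(20, 5), columns=["A", "B", "C", "D", "E"])
y = df["A"]
split = 2 * len(y) // 3            # integer split point: 13 for 20 rows
y_train = y.iloc[:split]

y_trainS = preprocessing.scale(y_train)   # 1-D ndarray, same rank as the docs' y
y_trainN = pd.DataFrame(y_trainS)         # one-column DataFrame (column label 0)

print(y_trainS.shape)          # (13,)
print(y_trainN.values.shape)   # (13, 1): the 2-D buffer libsvm rejects
```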
So you can either call squeeze to get a Series, or select the only column in the df, and the error goes away:
pred = clf.fit(X_trainN,y_trainN[0]).predict(X_testN)
or
pred = clf.fit(X_trainN,y_trainN.squeeze()).predict(X_testN)
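Putting both fixes into the question's code gives a minimal end-to-end sketch. Assumptions beyond the original: a fixed random seed, integer division (`//`) for the split point in place of `floor` (so `iloc` never sees a float), and the unused `y_test` scaling dropped. The per-split scaling is kept exactly as in the question. Since SVR training does not involve randomness, both options should yield identical predictions.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.svm import SVR

np.random.seed(0)  # assumption: fixed seed only for repeatability
df = pd.DataFrame(np.random.rand(20, 5), columns=["A", "B", "C", "D", "E"])
features = [c for c in df.columns if c != "A"]
X, y = df[features], df["A"]

# Integer split point, so iloc receives an int
split = 2 * len(X) // 3
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train = y.iloc[:split]

# Scaling as in the question (each part scaled on its own)
X_trainN = pd.DataFrame(preprocessing.scale(X_train), columns=features)
X_testN = pd.DataFrame(preprocessing.scale(X_test), columns=features)
y_trainN = pd.DataFrame(preprocessing.scale(y_train))  # still one-column, 2-D

clf = SVR(kernel="rbf", C=1e3, gamma=0.1)
# Option 1: select the only column (label 0), which is a 1-D Series
pred_col = clf.fit(X_trainN, y_trainN[0]).predict(X_testN)
# Option 2: squeeze the one-column DataFrame down to a Series
pred_sq = clf.fit(X_trainN, y_trainN.squeeze()).predict(X_testN)

print(pred_col.shape)  # (7,): one prediction per test row
```

A further variant, not part of the answer: `preprocessing.scale(y_train)` is already 1-D, so skipping the `pd.DataFrame` wrapper for `y` avoids the problem altogether.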
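So one could argue that a single-column df ought to return something that can be coerced into a 1-D numpy array, or that numpy isn't calling the array attribute correctly; but really you should pass a Series, or select the column from the df, as the argument.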
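The coercion point can be checked with `np.asarray`: a one-column DataFrame always converts to a 2-D array, while its column (a Series) converts to 1-D. One caveat, added here and not part of the original answer: `squeeze()` with no arguments collapses every length-1 axis, so a 1×1 DataFrame becomes a scalar, whereas `squeeze(axis=1)` drops only the column axis and always returns a Series. A minimal sketch (the 1×1 frame is a hypothetical edge case):

```python
import numpy as np
import pandas as pd

col = pd.DataFrame(np.arange(3.0))        # one column, label 0
print(np.asarray(col).shape)              # (3, 1)
print(np.asarray(col[0]).shape)           # (3,)
print(np.asarray(col.squeeze()).shape)    # (3,)

one = pd.DataFrame([[1.5]])               # hypothetical 1x1 edge case
print(type(one.squeeze()))                # a NumPy scalar, not a Series
print(type(one.squeeze(axis=1)))          # a pandas Series of length 1
```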