use*_*180 6 python csv numpy machine-learning matplotlib
import numpy as np
import pandas as pd
import matplotlib.pyplot as pt
data1 = pd.read_csv('stage1_labels.csv')
X = data1.iloc[:, :-1].values
y = data1.iloc[:, 1].values
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
label_X = LabelEncoder()
X[:,0] = label_X.fit_transform(X[:,0])
encoder = OneHotEncoder(categorical_features = [0])
X = encoder.fit_transform(X).toarray()
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train,y_test = train_test_split(X, y, test_size = 0.4, random_state = 0)
#fitting Simple Regression to training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
#predicting the test set results
y_pred = regressor.predict(X_test)
#Visualization of the training set results
pt.scatter(X_train, y_train, color = 'red')
pt.plot(X_train, regressor.predict(X_train), color = 'green')
pt.title('salary vs yearExp (Training set)')
pt.xlabel('years of experience')
pt.ylabel('salary')
pt.show()
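I need help understanding the error I get when I run the code above. The error is:
"raise ValueError("x and y must be the same size")"
I have a .csv file with 1398 rows and 2 columns. I set aside 40% of it as the test set (y_test), as can be seen in the code above.
Please help.
Regards, Amitesh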
Luk*_*ski 15
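Print the shape of X_train. What do you see? I'd bet that X_train is 2D (a matrix with a single column), while y_train is 1D (a vector). As a result, the two end up with different sizes.
I think using X_train[:,0] for plotting (which is where the error originates) should solve the problem.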
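A minimal sketch of that check, using small made-up arrays in place of the real X_train and y_train (the values and shapes are assumptions, not the asker's data). It prints both shapes and shows that selecting one column with X_train[:, 0] gives a 1D array with the same number of elements as y_train:

```python
import numpy as np

# Hypothetical stand-ins for the real training data (not the asker's CSV).
X_train = np.arange(12, dtype=float).reshape(6, 2)  # 2D: 6 rows, 2 columns
y_train = np.arange(6, dtype=float)                 # 1D: 6 values

print(X_train.shape)  # (6, 2)
print(y_train.shape)  # (6,)

# Select a single column so that there is one x value per y value.
x_plot = X_train[:, 0]
print(x_plot.shape)   # (6,)

# True when x and y hold the same number of elements.
sizes_match = x_plot.size == y_train.size
```

Once the sizes agree, a call such as pt.scatter(x_plot, y_train, color = 'red') receives one x value per y value.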
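The slice [:, :-1] gives you a 2D array (all rows, and every column except the last one).
The slice [:, 1] gives you a 1D array (all rows of the second column). To make this array 2D as well, use [:, 1:2], [:, 1].reshape(-1, 1), or [:, 1][:, None] instead of [:, 1]. This gives x and y comparable shapes.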
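The slicing behaviour can be checked on a small array. This is only an illustrative sketch: the array named data is a hypothetical 5x2 stand-in for data1.values, not the asker's file.

```python
import numpy as np

# Hypothetical 5x2 array standing in for the two CSV columns.
data = np.arange(10).reshape(5, 2)

first_2d = data[:, :-1]            # all columns except the last -> shape (5, 1)
second_1d = data[:, 1]             # integer index drops the axis -> shape (5,)

# Three equivalent ways to keep the second column 2D, each with shape (5, 1).
second_a = data[:, 1:2]
second_b = data[:, 1].reshape(-1, 1)
second_c = data[:, 1][:, None]

print(first_2d.shape, second_1d.shape, second_a.shape)  # (5, 1) (5,) (5, 1)
```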
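An alternative to making both arrays 2D is to make them both 1D. To do that, use [:, 0] (instead of [:, :1]) to select the first column and [:, 1] to select the second column.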
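A short sketch of the 1D variant, again on made-up numbers (the array named data is an assumption used only for illustration):

```python
import numpy as np

# Hypothetical two-column data: first column x, second column y.
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

x = data[:, 0]     # 1D, shape (3,)
y = data[:, 1]     # 1D, shape (3,)
x_2d = data[:, :1] # the 2D form of the same column, shape (3, 1)

print(x.shape, y.shape, x_2d.shape)  # (3,) (3,) (3, 1)

# Turn the 1D column back into a single-column 2D array when one is needed.
x_back_to_2d = x.reshape(-1, 1)
```

Note that scikit-learn estimators such as LinearRegression expect a 2D feature array in fit and predict, so with this approach the 1D x is convenient for plotting, while something like x.reshape(-1, 1) would still be passed to the regressor.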