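I'm working on a multiple regression analysis with sklearn and have read through the documentation carefully. When I run the predict() function, I get the error: predict() takes 2 positional arguments but 3 were given
X is a DataFrame and y is a column. I tried converting the DataFrame to an array/matrix, but I still get the error.
I've added a snippet showing the x and y arrays.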
reg.coef_
reg.predict(x,y)
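from sklearn import linear_model

# Split features and target for the training and test sets
x_train=train.drop('y-variable',axis=1)
y_train=train['y-variable']
x_test=test.drop('y-variable',axis=1)
y_test=test['y-variable']
x=x_test.to_numpy()  # DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
y=y_test.to_numpy()
reg = linear_model.LinearRegression()
reg.fit(x_train,y_train)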
reg.predict(x,y)
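A minimal sketch of the usual fix, assuming the intent of `reg.predict(x,y)` was to compare the predictions with the test targets. `predict()` accepts only the feature matrix (`self` counts as the first positional argument, which is why two explicit arguments make three), and the comparison is done with `score()` or a metric from `sklearn.metrics`. The `train`/`test` frames here are synthetic stand-ins with invented feature columns `a` and `b`, since the real data is not shown.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_squared_error

# Hypothetical stand-in data: columns 'a' and 'b' are invented for illustration
rng = np.random.default_rng(0)

def make_frame(n):
    X = rng.normal(size=(n, 2))
    target = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0  # noise-free linear target
    return pd.DataFrame({'a': X[:, 0], 'b': X[:, 1], 'y-variable': target})

train = make_frame(50)
test = make_frame(20)

x_train = train.drop('y-variable', axis=1)
y_train = train['y-variable']
x_test = test.drop('y-variable', axis=1)
y_test = test['y-variable']

reg = linear_model.LinearRegression()
reg.fit(x_train, y_train)

# Reproduces the question's TypeError: predict() takes only X
try:
    reg.predict(x_test, y_test)
except TypeError as err:
    print(err)

# Correct call: pass the features only
y_pred = reg.predict(x_test)

# Compare predictions with the true targets via score() (R^2) or a metric
r2 = reg.score(x_test, y_test)
mse = mean_squared_error(y_test, y_pred)
print(reg.coef_, r2, mse)
```

Converting the DataFrame to an array does not change the outcome: the error is about the number of arguments, not their type.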
I'm trying to create dummy variables for my categorical variables. However, when I create them, I get "ValueError: columns overlap but no suffix specified". Here is the code:
dummy2 = pd.get_dummies(data['Teaching'], prefix='Teach')
dummy2.head()
dummy2.columns = ['Small/Rural','Teaching']
data = data.join(dummy2)
##################
dummy3 = pd.get_dummies(data['Gender'], prefix='Gender_')
dummy3.head()
dummy3.columns = ['Male','Female']
data = data.join(dummy3)
#####################
dummy4 = pd.get_dummies(data['PositionTitle'], prefix='pos_')
dummy4.head()
dummy4.columns = ['Acting Director','RegionalRepresentative']
data = data.join(dummy4)
#####################
dummy5 = pd.get_dummies(data['Compensation'], prefix='COMP')
dummy5.head()
dummy5.columns = ['23987','46978','89473','248904']
data = data.join(dummy5)
#####################
dummy6 = pd.get_dummies(data['TypeControl'], prefix='Type')
dummy6.head()
dummy6.columns = ['City/country','District','Investor','Non Profit']
data = data.join(dummy6)
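The error is raised by `DataFrame.join`: after `dummy2.columns = ['Small/Rural','Teaching']`, the new dummy column `Teaching` has the same name as the original `data['Teaching']` column, and `join` refuses overlapping column names unless `lsuffix`/`rsuffix` is given. The sketch below reproduces this on a tiny invented frame (the values are assumptions, not the asker's data) and shows three possible ways around it; which one fits depends on whether the original categorical column should be kept.

```python
import pandas as pd

# Hypothetical toy data; the real column values are not shown in the question
data = pd.DataFrame({
    'Teaching': ['Small/Rural', 'Teaching', 'Teaching'],
    'Gender': ['Male', 'Female', 'Male'],
})

dummy2 = pd.get_dummies(data['Teaching'], prefix='Teach')
dummy2.columns = ['Small/Rural', 'Teaching']  # 'Teaching' now clashes with data['Teaching']

overlap_error = None
try:
    data.join(dummy2)
except ValueError as err:
    overlap_error = str(err)
    print(overlap_error)

# Option 1: drop the original categorical column before joining
fixed1 = data.drop(columns='Teaching').join(dummy2)

# Option 2: keep both, renaming the clashing right-hand column with a suffix
fixed2 = data.join(dummy2, rsuffix='_dummy')

# Option 3: let get_dummies replace the listed columns in one step
fixed3 = pd.get_dummies(data, columns=['Teaching', 'Gender'], prefix=['Teach', 'Gender'])
print(list(fixed3.columns))  # ['Teach_Small/Rural', 'Teach_Teaching', 'Gender_Female', 'Gender_Male']
```

Re-running these lines in a notebook would trigger the same error for every block, because the dummy columns from the first run are already in `data`.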
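Two smaller details in the same code may be worth checking, sketched here with invented `Gender` values. For non-categorical data, `get_dummies` orders the dummy columns by sorted category value, so assigning names positionally (as in `dummy3.columns = ['Male','Female']`) can attach the labels to the wrong columns. Also, `prefix='Gender_'` combined with the default `prefix_sep='_'` produces a double underscore.

```python
import pandas as pd

# Hypothetical values; the question does not show what data['Gender'] contains
gender = pd.Series(['Male', 'Female', 'Male'], name='Gender')

# prefix 'Gender_' plus the default separator '_' gives a double underscore
dummy3 = pd.get_dummies(gender, prefix='Gender_')
print(list(dummy3.columns))  # ['Gender__Female', 'Gender__Male']

# Columns follow sorted category order (Female before Male), so rename by
# name rather than by position to keep each label on the right column
dummy3 = dummy3.rename(columns={'Gender__Female': 'Female', 'Gender__Male': 'Male'})

# For a Series with no prefix, the column names are simply the values
dummy3_alt = pd.get_dummies(gender)
print(list(dummy3_alt.columns))  # ['Female', 'Male']
```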