Getting "No loop matching the specified signature and casting was found" error

Sha*_*ake 7 python numpy machine-learning scikit-learn

I am a beginner with Python and machine learning. When I try to fit my data with OLS.fit() from statsmodels.formula.api, I get the following error:

Traceback (most recent call last):

File "", line 47, in regressor_OLS = sm.OLS(y, X_opt).fit()

File "E:\Anaconda\lib\site-packages\statsmodels\regression\linear_model.py", line 190, in fit self.pinv_wexog, singular_values = pinv_extended(self.wexog)

File "E:\Anaconda\lib\site-packages\statsmodels\tools\tools.py", line 342, in pinv_extended u, s, vt = np.linalg.svd(X, 0)

File "E:\Anaconda\lib\site-packages\numpy\linalg\linalg.py", line 1404, in svd u, s, vt = gufunc(a, signature=signature, extobj=extobj)

TypeError: No loop matching the specified signature and casting was found for ufunc svd_n_s

#Importing Libraries
import numpy as np # linear algebra
import pandas as pd # data processing
import matplotlib.pyplot as plt #Visualization


#Importing the dataset
dataset = pd.read_csv('Video_Games_Sales_as_at_22_Dec_2016.csv')
#dataset.head(10) 

#Encoding categorical data using the pandas get_dummies function. Easier and more straightforward than OneHotEncoder in sklearn
#dataset = pd.get_dummies(data = dataset , columns=['Platform' , 'Genre' , 'Rating' ] , drop_first = True ) #drop_first is used to avoid the dummy variable trap


dataset=dataset.replace('tbd',np.nan)

#Separating Independent & Dependent Variables
#X = pd.concat([dataset.iloc[:,[11,13]], dataset.iloc[:,13: ]] , axis=1).values  #Getting important  variables
X = dataset.iloc[:,[10,12]].values
y = dataset.iloc[:,9].values #Dependent Variable (Global sales)


#Taking care of missing data
from sklearn.preprocessing import Imputer
imputer =  Imputer(missing_values = 'NaN' , strategy = 'mean' , axis = 0)
imputer = imputer.fit(X[:,0:2])
X[:,0:2] = imputer.transform(X[:,0:2])


#Splitting the dataset into the Training set and Test set
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size = 0.2 , random_state = 0)

#Fitting Multiple Linear Regression to the Training Set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train,y_train)

#Predicting the Test set Result
y_pred = regressor.predict(X_test)


#Building the optimal model using Backward Elimination (p=0.050)
import statsmodels.formula.api as sm
X = np.append(arr = np.ones((16719,1)).astype(float) , values = X , axis = 1)

X_opt = X[:, [0,1,2]]
regressor_OLS = sm.OLS(y , X_opt).fit()
regressor_OLS.summary() 

Dataset

Link to the dataset

I couldn't find any helpful workaround on Stack Overflow or Google.

小智 11

Try specifying

dtype='float'

when creating the matrix. Example:

a=np.matrix([[1,2],[3,4]], dtype='float')

Hope this works!
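A minimal sketch of why the dtype matters, assuming the question's X ends up with dtype=object (which is what .values typically returns when one of the selected columns holds strings such as 'tbd'). The array contents below are hypothetical stand-ins, not values from the dataset. It reproduces the failing np.linalg.svd call from the traceback and then shows the same call succeeding after a cast to float:

```python
import numpy as np

# Hypothetical stand-in for the question's two feature columns,
# stored as Python objects rather than as a float array.
X_raw = np.array([[76.0, 8.0],
                  [82.0, 8.3],
                  [80.0, 7.4]], dtype=object)

# Same construction as in the question: prepend a column of ones.
# Combining float64 with object still yields an object array.
X = np.append(arr=np.ones((3, 1)).astype(float), values=X_raw, axis=1)
dtype_before = X.dtype

# SVD has no implementation ("loop") for object arrays, so it raises TypeError.
try:
    np.linalg.svd(X, full_matrices=False)
    svd_failed = False
except TypeError:
    svd_failed = True

# Casting to float gives the linear algebra routine a dtype it supports.
X_float = X.astype(float)
u, s, vt = np.linalg.svd(X_float, full_matrices=False)
```

If this is indeed the cause, printing X_opt.dtype just before sm.OLS(y, X_opt) would show object, and passing X_opt.astype(float) instead should let the fit proceed. Note that astype(float) itself raises a ValueError if any non-numeric strings are still present in the array.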

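  • I love being able to quickly pull up Stack Overflow and, most of the time, quickly solve a problem that has been bothering me for a long time... (3 upvotes)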

mah*_*eju 6

Faced a similar problem. Solved it by specifying the dtype and flattening the array.

numpy version: 1.17.3

a = np.array(a, dtype=float)  # np.float was only an alias for the builtin float and was removed in NumPy 1.24
a = a.flatten()
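As a rough, self-contained illustration of the two steps above, here is the same conversion applied to a hypothetical column vector (the values and the name y_obj are made up for the example). It shows the dtype and shape before and after:

```python
import numpy as np

# Hypothetical target values stored as an object array of shape (4, 1).
y_obj = np.array([[82.53], [40.24], [35.52], [32.77]], dtype=object)

# Step 1: copy into a float array (object -> float64).
y = np.array(y_obj, dtype=float)

# Step 2: collapse to one dimension, (4, 1) -> (4,).
y = y.flatten()

print(y_obj.dtype, y_obj.shape)  # object (4, 1)
print(y.dtype, y.shape)          # float64 (4,)
```

The dtype conversion is the part that addresses the TypeError. Whether flattening is also needed depends on the shape your data already has.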