Posted by Jaf*_*deq

Python linear regression: dense vs. sparse

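I need to run linear regression on a sparse matrix. The results I got were poor, so I decided to test it on a non-sparse (dense) matrix stored in a sparse representation. The data comes from https://www.analyticsvidhya.com/blog/2021/05/multiple-linear-regression-using-python-and-scikit-learn/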

I generated max-normalized values for some of the columns. The CSV file is at: https://drive.google.com/file/d/17wHv1Cc3RKgshprIKTcWUSxZOWlG68__/view?usp=sharing
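For readers who want to reproduce the preprocessing, here is a minimal sketch of max normalization, assuming it means dividing each selected column by its largest absolute value. The helper `max_normalize`, the column names `spend` and `target`, and the toy values are hypothetical and not taken from the linked CSV.

```python
import pandas as pd

def max_normalize(df, columns):
    """Return a copy of df with each listed column divided by its maximum absolute value."""
    out = df.copy()
    for col in columns:
        max_abs = out[col].abs().max()
        if max_abs != 0:  # leave all-zero columns untouched to avoid dividing by zero
            out[col] = out[col] / max_abs
    return out

# Toy data standing in for the real CSV (column names and values are made up).
toy = pd.DataFrame({"spend": [100.0, 50.0, 0.0], "target": [10.0, 20.0, 30.0]})
normalized = max_normalize(toy, ["spend"])
print(normalized)
```

Only the listed columns are rescaled, so the target column can be left in its original units if desired.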

Running ordinary (dense) linear regression works well. Sample code:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("maxnorm_50_Startups.csv")
y = df['Profit']               # target column
x = df.drop('Profit', axis=1)  # features: every other column
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
LR = LinearRegression()
LR.fit(x_train, y_train)
y_prediction = LR.predict(x_test)
score = r2_score(y_test, y_prediction)
print('r2 score is', score)

Sample result:

r2 score is 0.9683831928840445
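Because `train_test_split` is called without a `random_state`, the score above depends on which rows happen to land in the test set, which matters when comparing it against a second run on the sparse file. The following sketch illustrates this on synthetic data; the data, the coefficients, and the helper `score_for_seed` are made up for illustration and are not the question's data set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic 50-row data set (made up), standing in for the real CSV.
rng = np.random.default_rng(0)
X = rng.random((50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

def score_for_seed(seed):
    """Fit on an 80/20 split chosen by `seed` and return R^2 on the held-out part."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    model = LinearRegression().fit(X_tr, y_tr)
    return r2_score(y_te, model.predict(X_te))

# One R^2 per split; the values typically differ from split to split.
scores = [score_for_seed(seed) for seed in range(5)]
print(scores)
```

Fixing `random_state` makes a single run repeatable, so a dense and a sparse experiment can be scored on the same rows.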

I want to repeat this with a sparse matrix. I converted the CSV to a sparse representation: https://drive.google.com/file/d/1CFWbBbtiSqTSlepGuYXsxa00MSHOj-Vx/view?usp=sharing

Here is my code for running linear regression on it:

df = pd.read_csv("maxnorm_50_Startups_relational.csv")
df['x'] = pd.to_numeric(df['x'], errors='raise')

m = len(df.x.unique())

for i in range(0, m): # randomize the 'x' values to randomize train …
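One way to sanity-check a sparse pipeline is to rebuild a SciPy sparse matrix from the long-format records and fit both versions on the same train/test rows. The sketch below uses synthetic data and assumes the relational file holds one record per cell with a row id `x` (as in the snippet above); the column names `col` and `value` are guesses, not the actual layout of the linked file.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.linear_model import LinearRegression

# Synthetic dense data (made up) standing in for the real feature table.
rng = np.random.default_rng(0)
X_dense = rng.random((50, 4))
y = X_dense @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.7

# Long ("relational") form: one record per cell.
# 'x' = row id; 'col' and 'value' are assumed column names.
rows, cols = np.indices(X_dense.shape)
long_df = pd.DataFrame(
    {"x": rows.ravel(), "col": cols.ravel(), "value": X_dense.ravel()}
)

# Rebuild a CSR matrix from the (row, column, value) triplets.
X_sparse = sparse.coo_matrix(
    (long_df["value"].to_numpy(),
     (long_df["x"].to_numpy(), long_df["col"].to_numpy())),
    shape=X_dense.shape,
).tocsr()

# Use the same row indices for both fits so the comparison is fair.
idx = rng.permutation(X_dense.shape[0])
train, test = idx[:40], idx[40:]

dense_model = LinearRegression().fit(X_dense[train], y[train])
sparse_model = LinearRegression().fit(X_sparse[train], y[train])

pred_dense = dense_model.predict(X_dense[test])
pred_sparse = sparse_model.predict(X_sparse[test])
print("max abs difference:", np.max(np.abs(pred_dense - pred_sparse)))
```

If the two sets of predictions agree closely here but not on the real files, the discrepancy is more likely to come from how the sparse matrix or the split is constructed than from `LinearRegression` itself.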

python sparse-matrix linear-regression scikit-learn sklearn-pandas
