Ali*_*ice 5 python scipy sparse-matrix
I have a CSV file in this format:
userId movieId rating timestamp
1 31 2.5 1260759144
2 10 4 835355493
3 1197 5 1298932770
4 10 4 949810645
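I want to build a sparse matrix whose rows are userId and whose columns are movieId. I have stored all the data in a dictionary named "column", where column['user'] holds the user IDs, column['movie'] holds the movie IDs, and column['rating'] holds the ratings, as follows: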
import csv

f = open('ratings.csv','rb')
reader = csv.reader(f)
headers = ['user','movie','rating','timestamp']
column = {}
for h in headers:
    column[h] = []
for row in reader:
    for h, v in zip(headers, row):
        column[h].append(float(v))
When I call the sparse matrix constructor as:
mat = scipy.sparse.csr_matrix((column['rating'],(column['user'],column['movie'])))
I get "TypeError: invalid shape".
Please help.
scipy.sparse.csr_matrix([column['rating'],column['user'],column['movie']])
You have a tuple made up of a 1×n list and a 2×n list, which does not work.
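For reference, the coordinate form `csr_matrix((data, (row, col)), shape=...)` expects integer row and column indices, whereas the reading loop in the question stores every field as a float. The following is a minimal sketch, not a confirmed diagnosis of the error above: the four sample rows from the question are hard-coded as lists (an assumption standing in for the CSV), the IDs are cast to integers, and the shape is passed explicitly.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Sample rows from the question, stored as floats the way the reading loop does.
column = {
    'user':   [1.0, 2.0, 3.0, 4.0],
    'movie':  [31.0, 10.0, 1197.0, 10.0],
    'rating': [2.5, 4.0, 5.0, 4.0],
}

# Row and column indices must be integers, so cast them before building the matrix.
rows = np.asarray(column['user'], dtype=np.int64)
cols = np.asarray(column['movie'], dtype=np.int64)
data = np.asarray(column['rating'], dtype=np.float64)

# The IDs are used directly as indices, so each dimension is (largest ID + 1).
shape = (int(rows.max()) + 1, int(cols.max()) + 1)
mat = csr_matrix((data, (rows, cols)), shape=shape)

print(mat.shape)
print(mat[3, 1197])  # rating given by user 3 to movie 1197
```

Using the raw IDs as indices leaves row 0 and most columns empty, which costs little in CSR format but makes the matrix wider than the number of distinct movies.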
PS: For reading the data, you should try pandas :-) (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html). A minimal example:
import pandas as pd
# Set up a DataFrame from the CSV and make it sparse
df = pd.read_csv('ratings.csv')
# DataFrame.to_sparse() was removed in pandas 1.0; a sparse dtype is the replacement
df = df.astype(pd.SparseDtype("float", 0))
print(df.head())
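If the end goal is still a SciPy matrix with users as rows and movies as columns, the DataFrame columns can be handed to `csr_matrix` directly. This is a minimal sketch, assuming the file is comma-separated with the header names shown in the question (`userId`, `movieId`, `rating`, `timestamp`); an in-memory string stands in for `ratings.csv`, and `pd.factorize` is used here as one possible way to map raw IDs to consecutive positions.

```python
import io

import pandas as pd
from scipy.sparse import csr_matrix

# Stand-in for ratings.csv (assumed comma-separated, with a header row).
csv_text = """userId,movieId,rating,timestamp
1,31,2.5,1260759144
2,10,4,835355493
3,1197,5,1298932770
4,10,4,949810645
"""
df = pd.read_csv(io.StringIO(csv_text))

# Map raw IDs to consecutive 0-based positions so no row or column is left empty.
user_codes, user_ids = pd.factorize(df['userId'])
movie_codes, movie_ids = pd.factorize(df['movieId'])

# Build the user x movie matrix from (data, (row, col)) triplets.
ratings = csr_matrix(
    (df['rating'].to_numpy(dtype=float), (user_codes, movie_codes)),
    shape=(len(user_ids), len(movie_ids)),
)

print(ratings.toarray())
```

`user_ids` and `movie_ids` keep the original IDs in index order, so position `i` in a row or column can be translated back to the ID it came from.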