Muh*_*akh 2 python numpy matrix scipy
I have a list of numbers, with len(lex) = 6064, that looks like this:
[0,
0,
1,
0,
0,
-1,
1,
1,
0,
0,
0,
0,
1,
0,]
and a CSR matrix:
tweets.shape = (6064, 2500)
How can I merge them? I tried converting both to lists, but I get an error when I try to work with the result:
tweets = list(tweets)
lex = list(lex)
tweets_final = np.column_stack([tweets, lex])
After splitting off the training data, I get the following error:
nb.fit(X_train, y_train)
ValueError: setting an array element with a sequence.
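How can I add that list as a column of the matrix?
You can use scipy.sparse.hstack to stack the two horizontally (column-wise). We just need to convert the list to a column vector (as a sparse matrix) or to a 2D array with a single column: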
scipy.sparse.hstack(( tweets, csr_matrix(lex).T ))
scipy.sparse.hstack(( tweets, np.asarray(lex)[:,None] ))
Sample run:
In [189]: from scipy.sparse import csr_matrix
In [194]: import scipy as sp
In [190]: a = np.random.randint(0,4,(5,10))
In [192]: a
Out[192]:
array([[2, 1, 1, 1, 0, 3, 1, 3, 2, 1],
       [0, 2, 1, 2, 3, 0, 1, 1, 2, 3],
       [0, 1, 1, 1, 2, 3, 0, 1, 0, 1],
       [0, 0, 3, 0, 3, 0, 1, 0, 3, 1],
       [1, 0, 2, 3, 3, 3, 2, 2, 0, 1]])
In [193]: b = [9,8,7,6,5] # equivalent to lex
In [191]: A = csr_matrix(a) # equivalent to tweets
In [195]: sp.sparse.hstack(( A, csr_matrix(b).T ))
Out[195]:
<5x11 sparse matrix of type '<type 'numpy.int64'>'
with 42 stored elements in COOrdinate format>
In [197]: _.toarray() # verify values by converting to dense array
Out[197]:
array([[2, 1, 1, 1, 0, 3, 1, 3, 2, 1, 9],
       [0, 2, 1, 2, 3, 0, 1, 1, 2, 3, 8],
       [0, 1, 1, 1, 2, 3, 0, 1, 0, 1, 7],
       [0, 0, 3, 0, 3, 0, 1, 0, 3, 1, 6],
       [1, 0, 2, 3, 3, 3, 2, 2, 0, 1, 5]])
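Note that the stacked result in the sample run is reported in COOrdinate format, which does not support row slicing. If you plan to split the rows into training and test sets afterwards, it may help to ask hstack for CSR output through its format argument. A minimal sketch, assuming small made-up stand-ins for tweets and lex (the 8 x 5 shape, the random values, and the 6/2 split are illustrative only, not the data from the question):

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the question's data (the real shapes are 6064 x 2500 and 6064).
tweets = sparse.random(8, 5, density=0.3, format="csr", random_state=0)
lex = [0, 0, 1, 0, 0, -1, 1, 1]

# A 1 x n sparse row built from the list, transposed into an n x 1 column.
lex_col = sparse.csr_matrix(lex).T

# Stack column-wise and request CSR output instead of the default COO.
tweets_final = sparse.hstack((tweets, lex_col), format="csr")

# CSR supports row slicing, for example for a manual train/test split.
X_train, X_test = tweets_final[:6], tweets_final[6:]

print(tweets_final.shape, tweets_final.format)
```

The same result can be obtained by calling .tocsr() on the matrix that hstack returns; passing format just avoids the extra step.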