Posts by Sch*_*ama

Python LSA with Sklearn

I am currently trying to implement LSA with Sklearn to find synonyms across multiple documents. Here is my code:

#import the essential tools for lsa
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
#other imports
from os import listdir

#load data
datafolder = 'data/'
filenames = []
for file in listdir(datafolder):
    if file.endswith(".txt"):
        filenames.append(datafolder+file)

#Document-Term Matrix
cv = CountVectorizer(input='filename',strip_accents='ascii')
dtMatrix = cv.fit_transform(filenames).toarray()
print(dtMatrix.shape)
#get_feature_names() was removed in scikit-learn 1.2; older versions use it instead
featurenames = cv.get_feature_names_out()
print(featurenames)

#Tf-idf Transformation
tfidf = TfidfTransformer()
tfidfMatrix = tfidf.fit_transform(dtMatrix).toarray()
print(tfidfMatrix.shape)

#SVD
#n_components is recommended to be 100 by Sklearn Documentation for …
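The code above is cut off at the SVD step. For reference, here is a minimal sketch of how the remaining steps (truncated SVD, then cosine similarity between terms) might look. It is not the original poster's code. The toy corpus `docs`, the choice of `n_components=2`, and the helper `most_similar` are all assumptions made for illustration, and it reads strings directly instead of files from `data/`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus standing in for the .txt files (assumption)
docs = [
    "the car drove down the road",
    "the automobile drove down the street",
    "a cat sat on the mat",
    "a kitten sat on the rug",
]

# Document-term counts, then tf-idf weighting (kept sparse)
cv = CountVectorizer(strip_accents='ascii')
counts = cv.fit_transform(docs)
tfidf_matrix = TfidfTransformer().fit_transform(counts)

# Terms ordered by column index; vocabulary_ maps term -> column
terms = sorted(cv.vocabulary_, key=cv.vocabulary_.get)

# n_components=2 is an arbitrary choice for this tiny corpus (assumption);
# it has to stay below the number of terms
svd = TruncatedSVD(n_components=2, random_state=0)
svd.fit(tfidf_matrix)

# components_ has shape (n_components, n_terms); transpose for one row per term
term_vectors = svd.components_.T

# Pairwise cosine similarity between terms in the reduced space
term_similarity = cosine_similarity(term_vectors)

def most_similar(term, top_n=3):
    """Hypothetical helper: the top_n terms closest to `term`, excluding itself."""
    idx = cv.vocabulary_[term]
    scores = term_similarity[idx]
    ranked = scores.argsort()[::-1]
    return [(terms[j], float(scores[j])) for j in ranked if j != idx][:top_n]

print(most_similar("car"))
```

On a corpus this small the rankings mean little. With real documents, the number of components and the preprocessing (stop words, minimum document frequency) would likely need tuning before the nearest terms resemble synonyms.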

python lsa scikit-learn

Score: 8 · Answers: 1 · Views: 10k
