Posts by Muh*_*akh

How to access only the topic words in gensim

I built an LDA model with Gensim and I want to get only the topic words. How can I get just the words, with no probabilities and no IDs?

I tried the print_topics() and show_topics() functions in gensim, but I can't get the clean words!

Here is the code I used:

import gensim
from gensim import corpora

# doc_clean is assumed to be a list of tokenized documents (lists of words)
dictionary = corpora.Dictionary(doc_clean)
doc_term_matrix = [dictionary.doc2bow(doc) for doc in doc_clean]
Lda = gensim.models.ldamodel.LdaModel
ldamodel = Lda(doc_term_matrix, num_topics=12, id2word=dictionary, passes=100, alpha='auto', update_every=5)
# print_topics() returns (topic_id, formatted_string) pairs
x = ldamodel.print_topics(num_topics=12, num_words=5)
for i in x:
    print(i[1])
    #print('\n' + str(i))

0.045*???? + 0.045*??????? + 0.045*??????? + 0.045*?????? + 0.045*?????
0.021*??? + 0.021*??????????? + 0.021*???? + 0.021*???? + 0.021*???????
0.068*???????? + 0.068*???????? + 0.068*????????? + 0.068*????? + 0.005*????
0.033*????? + …
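
One possible way to get just the words, sketched under the assumption that the model is a gensim LdaModel as in the question: show_topics() accepts formatted=False, in which case it returns (topic_id, [(word, probability), ...]) pairs instead of formatted strings. The helper name topic_words and the sample data below are illustrative stand-ins, not part of gensim.

```python
def topic_words(topics):
    """Map each topic id to its list of words, dropping the probabilities.

    `topics` is expected to look like the result of
    ldamodel.show_topics(num_topics=12, num_words=5, formatted=False):
    a list of (topic_id, [(word, probability), ...]) pairs.
    """
    return {topic_id: [word for word, _ in pairs] for topic_id, pairs in topics}


# Stand-in data with the same shape as the unformatted gensim output
# (hypothetical words and probabilities, for illustration only).
sample_topics = [
    (0, [("apple", 0.045), ("pear", 0.045)]),
    (1, [("car", 0.021), ("bus", 0.021)]),
]

words_by_topic = topic_words(sample_topics)
for topic_id, words in words_by_topic.items():
    print(topic_id, " ".join(words))
```

With a trained model, the call would presumably be topic_words(ldamodel.show_topics(num_topics=12, num_words=5, formatted=False)).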


python nlp lda gensim topic-modeling

Muh*_*akh

lucky-day

8
Score

2
Answers

4744
Views

How to merge a list and a CSR matrix

I have a list of numbers, len(lex) = 6064, that looks like this:

[0,
 0,
 1,
 0,
 0,
 -1,
 1,
 1,
 0,
 0,
 0,
 0,
 1,
 0,]


and a CSR matrix:

tweets.shape = (6064, 2500)


How can I merge them? I tried converting both of them to lists, but I get an error when I try to process the result:

tweets = list(tweets)
lex = list(lex)
tweets_final = np.column_stack([tweets, lex])


After splitting the training data, I get the following error:

nb.fit(X_train, y_train)


ValueError: setting an array element with a sequence.


How can I add this list as a column of that matrix?
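
A minimal sketch of one way to append a list as an extra column while keeping the data sparse, assuming NumPy and SciPy are available: reshape the list into an (n, 1) column, wrap it in a sparse matrix, and stack it with scipy.sparse.hstack. The small 6 x 4 matrix below is a stand-in for the (6064, 2500) tweets matrix from the question.

```python
import numpy as np
from scipy import sparse

# Stand-in for the real (6064, 2500) matrix: a small 6 x 4 CSR matrix.
tweets = sparse.csr_matrix(np.arange(24).reshape(6, 4) % 3)
lex = [0, 0, 1, 0, -1, 1]  # one value per row of `tweets`

# Turn the list into a sparse (n_rows, 1) column, then stack it on the right.
lex_column = sparse.csr_matrix(np.asarray(lex).reshape(-1, 1))
tweets_final = sparse.hstack([tweets, lex_column], format="csr")

print(tweets_final.shape)  # (6, 5)
```

Calling list() on a sparse matrix yields one sparse row object per entry, which is a likely source of the "setting an array element with a sequence" error; stacking sparse blocks avoids that conversion.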

python numpy matrix scipy

Muh*_*akh

lucky-day

2
Score

1
Answers

1419
Views

TypeError: object of type 'map' has no len() in Python 3

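I am trying to implement the KMeans algorithm with PySpark, and it gives the error above on the last line of the while loop. It works fine outside the loop, but after I created the loop it gives me this error. How can I fix this?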

#  Find K Means of Loudacre device status locations
#
# Input data: file(s) with device status data (delimited by '|')
# including latitude (13th field) and longitude (14th field) of device locations
# (lat,lon of 0,0 indicates unknown location)
# NOTE: Copy to pyspark using %paste

# for a point p and an array of points, return the index in the array of the point closest to p
def closestPoint(p, points):
    bestIndex = 0
    closest = float("+inf")
    # …
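
The failing line is not visible in the truncated code above, so the following is only a general sketch of how this error arises. In Python 3, map() returns a lazy iterator rather than a list, so calling len() on its result raises this TypeError; materializing it with list() is one common fix. The names raw_points and parse_point are hypothetical and used only for illustration.

```python
raw_points = ["1.0,2.0", "3.5,4.5"]


def parse_point(text):
    """Parse a 'lat,lon' string into a tuple of floats."""
    lat, lon = text.split(",")
    return (float(lat), float(lon))


lazy_points = map(parse_point, raw_points)
try:
    len(lazy_points)  # map objects do not support len() in Python 3
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # object of type 'map' has no len()

# Materialize the iterator first; the resulting list supports len().
points = list(map(parse_point, raw_points))
print(len(points))  # 2
```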


python k-means python-3.x apache-spark pyspark

Muh*_*akh

2017 01-28

0
Score

1
Answers

9241
Views

Retrieving a list of tweets by tweet ID in tweepy

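I have a file containing a list of tweet IDs, and I want to retrieve those tweets. The file contains more than 100,000 tweet IDs, but the Twitter API only allows retrieving 100 per request.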

api = tweepy.API(auth)
good_tweet_ids = [i for i in por.TweetID[0:100]]
tweets = api.statuses_lookup(good_tweet_ids)
for tweet in tweets:
    print(tweet.text)


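Is there a way to retrieve more tweets, say 1000 or 2000? I don't want to take a sample of the data, save the results to a file, and change the tweet ID index every time. Is there a way to do this?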
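
One way to approach this, sketched under the assumption that each lookup call accepts at most 100 IDs: split the ID list into batches of 100 and call the lookup once per batch. The helpers chunked and lookup_all are hypothetical names, and fake_lookup stands in for api.statuses_lookup so the sketch runs without Twitter credentials.

```python
def chunked(ids, size=100):
    """Yield consecutive slices of `ids`, each with at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def lookup_all(ids, lookup, size=100):
    """Call `lookup` once per batch and collect all results in one list."""
    results = []
    for batch in chunked(ids, size):
        results.extend(lookup(batch))
    return results


def fake_lookup(batch):
    """Stand-in for api.statuses_lookup: returns one string per ID."""
    return [f"tweet {tweet_id}" for tweet_id in batch]


all_ids = list(range(250))
tweets = lookup_all(all_ids, fake_lookup)
print(len(tweets))  # 250
```

With the code from the question, the call would presumably be lookup_all(list(por.TweetID), api.statuses_lookup). API rate limits still apply, so a long ID list may need pauses between batches.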

python twitter tweepy

Muh*_*akh

lucky-day

0
Score

1
Answers

3222
Views