Posts by Rob*_*ert

Input data for LDA topic modeling

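I'm new to Python. I've just started a project that uses LDA topic modeling on tweets. I'm trying the following code:

This example uses an online dataset. I have a CSV file containing the tweets I need to use. Can anyone tell me how to use my local file instead? How do I build my own vocab and titles?

I can't find a tutorial that explains how to prepare the input for LDA. They all assume you already know how to do it.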

from __future__ import division, print_function

import numpy as np
import lda
import lda.datasets


# document-term matrix

X = lda.datasets.load_reuters()
print("type(X): {}".format(type(X)))
print("shape: {}\n".format(X.shape))

# the vocab
vocab = lda.datasets.load_reuters_vocab()
print("type(vocab): {}".format(type(vocab)))
print("len(vocab): {}\n".format(len(vocab)))

# titles for each story
titles = lda.datasets.load_reuters_titles()
print("type(titles): {}".format(type(titles)))
print("len(titles): {}\n".format(len(titles)))


doc_id = 0
word_id = 3117

print("doc id: {} word id: {}".format(doc_id, word_id))
print("-- count: {}".format(X[doc_id, word_id]))
print("-- word : {}".format(vocab[word_id]))
print("-- doc  : {}".format(titles[doc_id]))


model = lda.LDA(n_topics=20, n_iter=500, random_state=1)
model.fit(X)


topic_word …
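
A minimal sketch of one way to build the three inputs (document-term matrix, vocab, titles) from a CSV, using only the standard library. The column names `id` and `text`, the sample rows, and the helpers `tokenize` and `build_corpus` are assumptions for illustration; a real tweets file will likely need different column names and a more careful tokenizer.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical stand-in for a local file. For a real file, use something like
# open("tweets.csv", newline="", encoding="utf-8") instead (file name assumed).
SAMPLE_CSV = io.StringIO(
    "id,text\n"
    "1,I love topic modeling with LDA\n"
    "2,LDA on tweets is fun\n"
    "3,tweets tweets and more tweets\n"
)


def tokenize(text):
    """Lowercase the text and keep alphabetic runs (deliberately simple)."""
    return re.findall(r"[a-z]+", text.lower())


def build_corpus(handle, text_column="text", title_column="id"):
    """Return (X, vocab, titles) from a CSV that has a header row."""
    titles, docs = [], []
    for row in csv.DictReader(handle):
        titles.append(row[title_column])
        docs.append(Counter(tokenize(row[text_column])))

    # A sorted vocabulary gives every word a stable column index.
    vocab = tuple(sorted(set().union(*docs)))
    index = {word: j for j, word in enumerate(vocab)}

    # X[i][j] is the number of times vocab[j] occurs in document i.
    X = [[0] * len(vocab) for _ in docs]
    for i, counts in enumerate(docs):
        for word, n in counts.items():
            X[i][index[word]] = n
    return X, vocab, tuple(titles)


X, vocab, titles = build_corpus(SAMPLE_CSV)
print("documents:", len(titles), "vocabulary size:", len(vocab))
```

If NumPy is available, `np.array(X)` gives an integer array of shape (documents, vocabulary size) that can take the place of the Reuters matrix passed to `model.fit(X)` above, with `vocab` and `titles` used the same way as in the example.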

python twitter lda topic-modeling

Score: 4 · Answers: 1 · Views: 3681

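Tuple has no attribute 'isdigit'

I need to do some text processing with the NLTK module, and I get the following error: AttributeError: 'tuple' object has no attribute 'isdigit'

Does anyone know how to handle this error?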

Traceback (most recent call last):
  File "preprocessing-edit.py", line 36, in <module>
    postoks = nltk.tag.pos_tag(tok)
NameError: name 'tok' is not defined

PS C:\Users\moham\Desktop\Presentation> python preprocessing-edit.py
Traceback (most recent call last):
  File "preprocessing-edit.py", line 37, in <module>
    postoks = nltk.tag.pos_tag(tok)
  File "c:\python34\lib\site-packages\nltk-3.1-py3.4.egg\nltk\tag\__init__.py", line 111, in pos_tag
    return _pos_tag(tokens, tagset, tagger)
  File "c:\python34\lib\site-packages\nltk-3.1-py3.4.egg\nltk\tag\__init__.py", line 82, in _pos_tag
    tagged_tokens = tagger.tag(tokens)
  File "c:\python34\lib\site-packages\nltk-3.1-py3.4.egg\nltk\tag\perceptron.py", line 153, in tag
    context = self.START + [self.normalize(w) for w in tokens] + self.END
  File "c:\python34\lib\site-packages\nltk-3.1-py3.4.egg\nltk\tag\perceptron.py", line 153, …
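
The question does not show how `tok` is built, so the following is only a guess at the cause: the error would arise if the list passed to `pos_tag` contains tuples (for example, tokens that were already tagged as `(word, tag)` pairs) rather than plain strings, since the tagger calls string methods on each token. A minimal sketch of a guard that unwraps such tuples before tagging; the helper name `to_plain_tokens` and the sample data are hypothetical.

```python
def to_plain_tokens(tokens):
    """Return a flat list of strings, unwrapping tuples such as (word, tag)."""
    plain = []
    for item in tokens:
        if isinstance(item, str):
            plain.append(item)
        elif isinstance(item, tuple) and item and isinstance(item[0], str):
            plain.append(item[0])  # keep the word, drop the tag
        else:
            raise TypeError("unsupported token: {!r}".format(item))
    return plain


# Assumed shape of the problematic input: tokens that were tagged once already.
already_tagged = [("LDA", "NNP"), ("works", "VBZ"), "well"]
tokens = to_plain_tokens(already_tagged)
print(tokens)

# Every element is now a str, so str methods such as isdigit() are available.
# With NLTK installed, `tokens` could then be passed to nltk.tag.pos_tag.
```

If the tuples come from somewhere else (for instance, rows read from a file), the same check still applies: inspect `type(tok[0])` before tagging and make sure each element is a `str`.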

python tokenize nltk

Score: 1 · Answers: 1 · Views: 2332
