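I'm new to Python. I have just started a project that uses LDA topic modeling on tweets, and I am trying the code below.
That example uses an online dataset. I have a CSV file containing the tweets I need to work with. Can anyone tell me how to use my local file instead? How do I build my own vocabulary and titles?
I can't find a tutorial that explains how to prepare the input for LDA. They all assume you already know how to do that.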
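For the local-file part of the question, the three inputs (document-term matrix, vocabulary, titles) can be built by hand. The following is only a minimal sketch using the standard library: the column name `text`, the helper names `tokenize` and `build_corpus`, the crude regex tokenizer, and the choice to use the tweet text itself as the title are all assumptions for illustration, not part of the `lda` package.

```python
import csv
import io
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (assumption): lowercase, keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_corpus(csv_file, text_column="text", min_count=1):
    """Return (X, vocab, titles) from an open CSV file that has a header row.

    text_column is a hypothetical column name; change it to match your file.
    """
    texts = [row[text_column] for row in csv.DictReader(csv_file)]
    docs = [tokenize(t) for t in texts]

    # Vocabulary: every token seen at least min_count times, sorted alphabetically.
    totals = Counter(tok for doc in docs for tok in doc)
    vocab = tuple(sorted(w for w, c in totals.items() if c >= min_count))
    index = {w: i for i, w in enumerate(vocab)}

    # Document-term matrix as nested lists: X[d][w] = count of word w in document d.
    X = []
    for doc in docs:
        counts = [0] * len(vocab)
        for tok in doc:
            if tok in index:
                counts[index[tok]] += 1
        X.append(counts)

    # Tweets have no headline, so the raw text stands in for the title here.
    titles = tuple(texts)
    return X, vocab, titles


# Tiny in-memory CSV so the sketch runs without a file on disk.
sample = "text\nI love python\npython loves me\n"
X, vocab, titles = build_corpus(io.StringIO(sample))
print(vocab)
print(X)
```

With a real file, something like `open("tweets.csv", newline="", encoding="utf-8")` (a hypothetical filename) would replace the `io.StringIO` object, and `np.array(X)` would be the integer count matrix to try in place of the Reuters `X` in the example code. In practice you would probably also want to drop stop words, URLs, and very rare tokens before fitting.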
from __future__ import division, print_function
import numpy as np
import lda
import lda.datasets
# document-term matrix
X = lda.datasets.load_reuters()
print("type(X): {}".format(type(X)))
print("shape: {}\n".format(X.shape))
# the vocab
vocab = lda.datasets.load_reuters_vocab()
print("type(vocab): {}".format(type(vocab)))
print("len(vocab): {}\n".format(len(vocab)))
# titles for each story
titles = lda.datasets.load_reuters_titles()
print("type(titles): {}".format(type(titles)))
print("len(titles): {}\n".format(len(titles)))
doc_id = 0
word_id = 3117
print("doc id: {} word id: {}".format(doc_id, word_id))
print("-- count: {}".format(X[doc_id, word_id]))
print("-- word : {}".format(vocab[word_id]))
print("-- doc : {}".format(titles[doc_id]))
model = lda.LDA(n_topics=20, n_iter=500, random_state=1)
model.fit(X)
topic_word …

I need to do some text processing with the NLTK module, and I get the following error: AttributeError: 'tuple' object has no attribute 'isdigit'
Does anyone know how to deal with this error?
Traceback (most recent call last):
  File "preprocessing-edit.py", line 36, in <module>
    postoks = nltk.tag.pos_tag(tok)
NameError: name 'tok' is not defined
PS C:\Users\moham\Desktop\Presentation> python preprocessing-edit.py
Traceback (most recent call last):
  File "preprocessing-edit.py", line 37, in <module>
    postoks = nltk.tag.pos_tag(tok)
  File "c:\python34\lib\site-packages\nltk-3.1-py3.4.egg\nltk\tag\__init__.py", line 111, in pos_tag
    return _pos_tag(tokens, tagset, tagger)
  File "c:\python34\lib\site-packages\nltk-3.1-py3.4.egg\nltk\tag\__init__.py", line 82, in _pos_tag
    tagged_tokens = tagger.tag(tokens)
  File "c:\python34\lib\site-packages\nltk-3.1-py3.4.egg\nltk\tag\perceptron.py", line 153, in tag
    context = self.START + [self.normalize(w) for w in tokens] + self.END
  File "c:\python34\lib\site-packages\nltk-3.1-py3.4.egg\nltk\tag\perceptron.py", line 153, …
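
The traceback shows the tagger calling a string method on each element of `tok`, so the AttributeError suggests that `tok` holds tuples rather than plain strings. The script itself is not shown, so this is only a guess at the cause: a list that was already tagged, or otherwise built as `(word, something)` pairs, would fail this way. The helper below, `to_plain_tokens`, is a hypothetical name and a sketch under that assumption. It treats the first element of each tuple as the word.

```python
def to_plain_tokens(tokens):
    """Return a list of str, unwrapping tuple items by taking their first element.

    Assumption: in each tuple the word comes first, as in (word, tag) pairs.
    """
    plain = []
    for tok in tokens:
        if isinstance(tok, tuple):
            tok = tok[0]
        if not isinstance(tok, str):
            raise TypeError("expected str token, got {!r}".format(tok))
        plain.append(tok)
    return plain


# Mixed input: two (word, tag) tuples and one plain string.
tok = [("The", "DT"), ("market", "NN"), "fell"]
clean = to_plain_tokens(tok)
print(clean)
# `clean` is a list of strings, the input shape nltk.tag.pos_tag expects.
```

Printing `type(tok[0])` just before the `pos_tag` call is a quick way to confirm whether the tokens really are tuples.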