I am trying to restore a checkpoint and predict on different sentences with an NMT attention model. When I restore the checkpoint and predict, I get garbled results along with the following warning:
Unresolved object in checkpoint (root).optimizer.iter: attributes {
name: "VARIABLE_VALUE"
full_name: "Adam/iter"
checkpoint_key: "optimizer/iter/.ATTRIBUTES/VARIABLE_VALUE"
}
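Warnings like this usually mean the checkpoint tracks objects (here, the Adam optimizer state) that the restored object graph does not, which is common when restoring only for inference. A minimal sketch of the usual remedy, `expect_partial()`, with illustrative stand-in variables (this silences the warnings but will not by itself fix garbled predictions if the wrong variables are being restored):

```python
import os
import tempfile
import tensorflow as tf

# Stand-ins for the real model variables; the names here are illustrative.
step = tf.Variable(1)
weights = tf.Variable([1.0, 2.0, 3.0])

# A checkpoint written during training typically also tracks the optimizer.
optimizer = tf.keras.optimizers.Adam()
train_ckpt = tf.train.Checkpoint(step=step, weights=weights, optimizer=optimizer)
path = train_ckpt.save(os.path.join(tempfile.mkdtemp(), "ckpt"))

# At inference time, restore only what is needed and mark the rest
# (the optimizer slots) as intentionally unrestored.
infer_ckpt = tf.train.Checkpoint(step=step, weights=weights)
status = infer_ckpt.restore(path).expect_partial()
```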
Here are the other warnings and results I receive:
WARNING: Logging before flag parsing goes to stderr.
W1008 09:57:52.766877 4594230720 util.py:244] Unresolved object in checkpoint: (root).optimizer.iter
W1008 09:57:52.767037 4594230720 util.py:244] Unresolved object in checkpoint: (root).optimizer.beta_1
W1008 09:57:52.767082 4594230720 util.py:244] Unresolved object in checkpoint: (root).optimizer.beta_2
W1008 09:57:52.767120 4594230720 util.py:244] Unresolved object in checkpoint: (root).optimizer.decay
W1008 09:57:52.767155 4594230720 util.py:244] Unresolved object in checkpoint: (root).optimizer.learning_rate
W1008 09:57:52.767194 4594230720 util.py:244] Unresolved object in checkpoint: …

I am trying to apply doc2vec to 600,000 lines of sentences; the code is below:
from gensim import models

model = models.Doc2Vec(alpha=.025, min_alpha=.025, min_count=1, workers=5)
model.build_vocab(res)
token_count = sum([len(sentence) for sentence in res])
token_count

%%time
for epoch in range(100):
    #print('iteration:' + str(epoch + 1))
    #model.train(sentences)
    model.train(res, total_examples=token_count, epochs=model.iter)
    model.alpha -= 0.0001  # decrease the learning rate
    model.min_alpha = model.alpha  # fix the learning rate, no decay
With the above implementation, my results are very poor. Following the tutorial's suggestion, I changed the following line:
model.train(sentences)
to:
token_count = sum([len(sentence) for sentence in res])
model.train(res, total_examples = token_count,epochs = model.iter)
I am trying to generate topics from 300,000 records using gensim. When trying to visualize the topics, I get a validation error. I can print the topics after the model is trained, but it fails when using pyLDAvis:
# Running and training the LDA model on the document-term matrix
# (Lda here is presumably gensim.models.LdaMulticore, given the workers argument)
ldamodel1 = Lda(doc_term_matrix1, num_topics=10, id2word=dictionary1, passes=50, workers=4)
ldamodel1.print_topics(num_topics=10, num_words=10)
#pyLDAvis
d = gensim.corpora.Dictionary.load('dictionary1.dict')
c = gensim.corpora.MmCorpus('corpus.mm')
lda = gensim.models.LdaModel.load('topic.model')
#error on executing this line
data = pyLDAvis.gensim.prepare(lda, c, d)
After running the pyLDAvis line above, I get the following error:
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
<ipython-input-53-33fd88b65056> in <module>()
----> 1 data = pyLDAvis.gensim.prepare(lda, c, d)
2 data
C:\ProgramData\Anaconda3\lib\site-packages\pyLDAvis\gensim.py in prepare(topic_model, corpus, dictionary, doc_topic_dist, **kwargs)
110 """
111 opts = fp.merge(_extract_data(topic_model, …

category = df.category_name_column.value_counts()
The above returns a series of value counts:
CategoryA,100
CategoryB,200
I am trying to plot the top 5 category names on the x-axis and their counts on the y-axis:
head = (category.head(5))
sns.barplot(x = head ,y=df.category_name_column.value_counts(), data=df)
It prints the counts on the x-axis instead of the category "names". How do I get the top 5 names on x and their values on y?
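Passing the `head` Series directly makes seaborn use its values (the counts) as x. One way that works is to pass the index of the top-5 counts as x and the counts themselves as y; the dataframe below is illustrative, standing in for the original `df`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the example runs anywhere
import pandas as pd
import seaborn as sns

# Illustrative data standing in for df.category_name_column
df = pd.DataFrame({"category_name_column":
                   ["CategoryB"] * 200 + ["CategoryA"] * 100 + ["CategoryC"] * 50})

top5 = df["category_name_column"].value_counts().head(5)

# x = category names (the index), y = their counts (the values)
ax = sns.barplot(x=top5.index, y=top5.values)
```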
I use tokenizer = RegexpTokenizer(r'\w+'), which keeps alphanumeric characters, but how do I combine regexes to remove everything else and keep only tokens longer than 2 characters?
Below is a row from the dataframe containing random text:
0 [ANOTHER 2'' F/P SAMPLE 01:52 ...A13232 / AS OUTPUT MSG...
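The length constraint can go into the pattern itself: `RegexpTokenizer(r'\w{3,}')` keeps only alphanumeric runs of three or more characters (note `\w` also matches underscores; `[A-Za-z0-9]{3,}` excludes them). The same pattern works with the standard-library `re` module, shown here on the sample row above:

```python
import re

text = "ANOTHER 2'' F/P SAMPLE 01:52 ...A13232 / AS OUTPUT MSG..."

# Keep only alphanumeric tokens of length 3 or more.
tokens = re.findall(r"[A-Za-z0-9]{3,}", text)
# → ['ANOTHER', 'SAMPLE', 'A13232', 'OUTPUT', 'MSG']
```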
I am trying to convert a ZIP-code column with dtype "object" to "int":
df['ZIP'] = df['ZIP'].astype(str).astype(int)
My data has over 100,000 records, and it keeps throwing messages about different invalid literals in that column. I understand that the data types don't match and the conversion fails.
ValueError: invalid literal for int() with base 10: ' '
To fix the above error, I replaced the "empty rows" with NaN and dropped them with the following code:
df['ZIP'] = df['ZIP'].replace('', np.nan)
df['ZIP'] = df.dropna(subset=['ZIP'])
After that, I again get the following error:
ValueError: invalid literal for int() with base 10: 'SAM'
Is there an efficient way to drop all the invalid literals without all of these steps?
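One way to handle every invalid literal in a single pass (a sketch, assuming any non-numeric value should be dropped) is `pd.to_numeric` with `errors='coerce'`, which turns `' '`, `'SAM'` and any other bad value into NaN at once:

```python
import pandas as pd

# Illustrative data standing in for the real ZIP column
df = pd.DataFrame({"ZIP": ["12345", " ", "SAM", "67890", ""]})

# Coerce anything that is not a valid number to NaN, drop it, then cast.
df["ZIP"] = pd.to_numeric(df["ZIP"], errors="coerce")
df = df.dropna(subset=["ZIP"])
df["ZIP"] = df["ZIP"].astype(int)
```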