Ami*_*nST 4 nlp persian keras word-embedding
I have this code, which works for English but not for Persian:
from gensim.models import Word2Vec as wv

# `sentences` is the list of raw text lines; the post does not show how it was loaded
tokenized = []
for sentence in sentences:
    tokens = sentence.strip().lower().split(" ")
    tokenized.append(tokens)

model = wv(tokenized, size=5, min_count=1)
print('done2')
model.save('F:/text8/text8-phrases1')
print('done3')
print(model)
model = wv.load('F:/text8/text8-phrases1')
print(model.wv.vocab)
Output:
> '??': <gensim.models.keyedvectors.Vocab object at 0x0000027716EEB0B8>,
> '????': <gensim.models.keyedvectors.Vocab object at
> 0x0000027716EEB160>, '??????': <gensim.models.keyedvectors.Vocab
> object at 0x0000027716EEB198>, '???????':
> <gensim.models.keyedvectors.Vocab object at 0x0000027716EEB1D0>,
> '???????': <gensim.models.keyedvectors.Vocab object at
> 0x0000027716EEB208>, '???????': <gensim.models.keyedvectors.Vocab
> object at 0x0000027716EEB240>, '?????':
> <gensim.models.keyedvectors.Vocab object at 0x0000027716EEB278>,
> '?????': <gensim.models.keyedvectors.Vocab object at
> 0x0000027716EEB2B0>, '????'
Please answer with a code example. Thanks.
@AminST, I know this answer comes late, but other people may run into the same problem, so here is some code that worked for me. I used it on Digikala comments. It assumes that preprocessing (removing stopwords, HTML, emojis, and so on) is already done and the data is ready to be vectorized.
from hazm import word_tokenize
import pandas as pd
from gensim.models.word2vec import Word2Vec

# Read the cleaned dataset and make sure the text columns are strings
df = pd.read_csv('data/cleaned/data.csv')
df.title = df.title.apply(str)
df.comment = df.comment.apply(str)

# Store the comments in a list
comments = df.comment.tolist()

# Convert each comment to a list of words with hazm's Persian tokenizer
sents = [word_tokenize(comment) for comment in comments]

# Train and save the model (`size` is the gensim 3.x parameter name)
model = Word2Vec(sentences=sents, size=64, window=10, min_count=5, seed=42, workers=5)
model.save('digikala_words.w2v')

# Check the vector for a word
model.wv['????????']
I really hope this helps you, my friend. If you are still interested in more details, see this link: digikala comment verification
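One more thing that may be worth checking: the `?` keys in the question's output look like Persian text that went through a non-UTF-8 encoding step, either when the corpus was read or when the vocabulary was printed. That is an assumption, since the question does not show how `sentences` was loaded. The sketch below uses only the standard library: it reads a small corpus with an explicit `encoding="utf-8"` and then simulates what a legacy codec does to the same word. The sample sentences and the helper names `load_tokenized` and `as_legacy_console` are made up for illustration.

```python
import os
import tempfile

# Hypothetical sample sentences; any UTF-8 Persian corpus would do.
SAMPLE = "من کتاب می خوانم\nاین یک جمله است\n"


def load_tokenized(path):
    """Read a UTF-8 text file and split each non-empty line on whitespace."""
    with open(path, encoding="utf-8") as handle:
        return [line.split() for line in handle if line.strip()]


def as_legacy_console(text, codec="cp1252"):
    """Simulate what a non-UTF-8 console or file codec does to Persian text."""
    return text.encode(codec, errors="replace").decode(codec)


# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write(SAMPLE)
    corpus_path = tmp.name

tokenized = load_tokenized(corpus_path)
os.remove(corpus_path)

first_word = tokenized[0][0]
# Printing the escaped form is safe on any console and shows the real code points.
print(first_word.encode("unicode_escape"))
# Every Persian character becomes '?' once it passes through the legacy codec.
print(as_legacy_console(first_word))
```

If the escaped form shows proper code points but a plain `print` still shows `?`, the console encoding is the more likely culprit rather than the model. On Windows, setting the `PYTHONIOENCODING=utf-8` environment variable before running the script is one thing to try.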