mel*_*lik 1 python turkish nlp list
I'm trying to use a library called snowballstemmer in Python, but it doesn't seem to work as expected. What could be the reason? Please see my code below.

My dataset:

    df = [['musteri', 'hizmetlerine', 'cabuk', 'baglaniyorum'],
          ['konuda', 'yardımcı', 'oluyorlar', 'islemlerimde']]

I applied the snowballstemmer package and imported TurkishStemmer:

    from snowballstemmer import TurkishStemmer
    turkStem = TurkishStemmer()
    data_words_nostops = [turkStem.stemWord(word) for word in df]
    data_words_nostops

    [['musteri', 'hizmetlerine', 'cabuk', 'baglaniyorum'],
     ['konuda', 'yardımcı', 'oluyorlar', 'islemlerimde']]

Unfortunately, it didn't work: the words come back unchanged. But when I apply it to a single word, it works as expected:

    turkStem.stemWord("islemlerimde")
    'islem'

What could be the problem? Any help would be greatly appreciated.

Thanks.

Did you mean to have a list of strings rather than a list of lists of strings?

When I reformatted the code this way, I was able to get the stem of each word:

    from snowballstemmer import TurkishStemmer

    df = [
        'musteri',
        'hizmetlerine',
        'cabuk',
        'baglaniyorum',
        'konuda',
        'yardımcı',
        'oluyorlar',
        'islemlerimde'
    ]
    turkStem = TurkishStemmer()
    data_words_nostops = [turkStem.stemWord(word) for word in df]
    print(data_words_nostops)
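
To confirm the diagnosis, it can help to look at what the original comprehension actually iterates over. The sketch below assumes the nested `df` from the question and uses only built-ins; `element_types` and `word_types` are names introduced here for illustration. Each `word` in `for word in df` turns out to be a whole inner list, so `stemWord` never receives a single string:

```python
# Nested data as defined in the question.
df = [
    ['musteri', 'hizmetlerine', 'cabuk', 'baglaniyorum'],
    ['konuda', 'yardımcı', 'oluyorlar', 'islemlerimde'],
]

# What the original comprehension sees: one item per inner list.
element_types = [type(item).__name__ for item in df]
print(element_types)  # ['list', 'list']

# What the stemmer needs: the individual strings inside each inner list.
word_types = {type(word).__name__ for group in df for word in group}
print(word_types)  # {'str'}
```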

If you have a list of lists of strings (assuming that is the df you defined) and you want to flatten it into a single list of words, you can do the following:

    df = [
        ['musteri', 'hizmetlerine', 'cabuk', 'baglaniyorum'],
        ['konuda', 'yardımcı', 'oluyorlar', 'islemlerimde']
    ]
    flattened_df = [item for sublist in df for item in sublist]

    # Output:
    # ['musteri', 'hizmetlerine', 'cabuk', 'baglaniyorum', 'konuda', 'yardımcı', 'oluyorlar', 'islemlerimde']

Credit for the above goes to this StackOverflow post.
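
As a possible variation on the flattening step, the standard library's `itertools.chain.from_iterable` also removes one level of nesting. This is only a sketch, assuming `df` is nested exactly one level deep as in the question; it is not taken from the credited post.

```python
from itertools import chain

# Nested data as defined in the question.
df = [
    ['musteri', 'hizmetlerine', 'cabuk', 'baglaniyorum'],
    ['konuda', 'yardımcı', 'oluyorlar', 'islemlerimde'],
]

# chain.from_iterable yields the items of each inner list in order.
flattened_df = list(chain.from_iterable(df))
print(flattened_df)
```

Every element of the flattened list is a single string, so it can go through the same one-level list comprehension used earlier.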

Alternatively, you can fix the loop so that it handles the original nested layout:

    from snowballstemmer import TurkishStemmer

    df = [
        ['musteri', 'hizmetlerine', 'cabuk', 'baglaniyorum'],
        ['konuda', 'yardımcı', 'oluyorlar', 'islemlerimde']
    ]
    turkStem = TurkishStemmer()
    all_stem_lists = []

    # Stem each inner list separately so the nested structure is preserved.
    for word_group in df:
        output_stems = []
        for word in word_group:
            stem = turkStem.stemWord(word)
            output_stems.append(stem)
        all_stem_lists.append(output_stems)

    print(all_stem_lists)
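
The same idea can be written more compactly as a nested list comprehension that keeps the list-of-lists shape. In this sketch, `stem_nested` is a hypothetical helper introduced for illustration, and the identity fallback exists only so the snippet still runs when snowballstemmer is not installed; neither is part of the library.

```python
from typing import Callable, List


def stem_nested(groups: List[List[str]], stem: Callable[[str], str]) -> List[List[str]]:
    """Apply `stem` to every word while keeping the list-of-lists shape."""
    return [[stem(word) for word in group] for group in groups]


# Nested data as defined in the question.
df = [
    ['musteri', 'hizmetlerine', 'cabuk', 'baglaniyorum'],
    ['konuda', 'yardımcı', 'oluyorlar', 'islemlerimde'],
]

try:
    from snowballstemmer import TurkishStemmer
    stem = TurkishStemmer().stemWord
except ImportError:
    # Assumption for illustration only: without the package, leave words unchanged.
    def stem(word: str) -> str:
        return word

all_stem_lists = stem_nested(df, stem)
print(all_stem_lists)
```

Passing the stemming function in as an argument means the same helper would work with any other callable that maps one word to one stem.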