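I have survey data with a comment column, and I want to run sentiment analysis on the responses. The problem is that the data contains many languages, and I don't know how to remove stop words for multiple languages from the set.
'nps' is my data source, and nps$customer.feedback is the comment column.
First I tokenise the data: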
#TOKENISE
comments <- nps %>%
  filter(!is.na(customer.feedback)) %>%
  select(cat, Comment) %>%
  group_by(row_number(), cat)
comments <- comments %>% ungroup()
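The step that turns the comments into the one-word-per-row `nps_words` table used later is not shown. A minimal sketch of what it might look like, assuming `tidytext::unnest_tokens()` and a small made-up stand-in for `nps` (the toy rows and the `id` column are illustrative, not from the real data):

```r
library(dplyr)
library(tidytext)

# Toy stand-in for `nps`; the real survey data is not shown in the post.
nps <- tibble(
  cat = c("A", "B", "A"),
  customer.feedback = c("The service was great", NA, "Der Service war schlecht")
)

# Drop empty comments, tag each comment with an id, then split into
# one lower-cased word per row while keeping `cat` next to each token.
nps_words <- nps %>%
  filter(!is.na(customer.feedback)) %>%
  mutate(id = row_number()) %>%
  unnest_tokens(word, customer.feedback)
```

The result has one row per word with the columns `cat`, `id` and `word`, which is the shape the `anti_join()` and `inner_join()` calls expect.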
Then I remove the stop words:
nps_words <- nps_words %>% anti_join(stop_words, by = c('word'))
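`stop_words` from tidytext only covers English, so the `anti_join()` leaves stop words from other languages in place. One possible direction, sketched here under assumptions rather than tested on the real data, is to pool several per-language lists from `tidytext::get_stopwords()` (which needs the `stopwords` package installed) and anti-join once. The language codes and the toy `nps_words` are hypothetical:

```r
library(dplyr)
library(tidytext)

# Assumed set of languages present in the comments (hypothetical).
langs <- c("en", "de", "fr")

# Stack the Snowball stop word list of each language into one table.
multi_stop_words <- bind_rows(lapply(langs, get_stopwords)) %>%
  distinct(word)

# Toy tokens standing in for the real nps_words table.
nps_words <- tibble(
  cat  = "A",
  word = c("the", "service", "der", "schlecht", "le", "produit")
)

# Remove any token that is a stop word in at least one of the languages.
nps_clean <- nps_words %>%
  anti_join(multi_stop_words, by = "word")
```

A pooled list is blunt: a word that is a stop word in one language can be meaningful in another (German "die" and "war" are also English words), so detecting the language per comment and filtering per language may be the safer variant.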
Then I use stemming and get_sentiments("bing") to show word counts by sentiment:
#stemgraph
nps_words %>%
  mutate(word = wordStem(word)) %>%
  inner_join(get_sentiments("bing") %>% mutate(word = wordStem(word)),
             by = c('word')) %>%
  count(cat, word, sentiment) %>%
  group_by(cat, sentiment) %>%
  top_n(7) %>%
  ungroup() %>%
  ggplot(aes(x = reorder(word, n), y = n, fill = sentiment)) +
  geom_col() +
  coord_flip() +
  facet_wrap( ~cat, scales = …