Posts by Sea*_*n M

Removing German stop words in R

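I have survey data with a comment column, and I want to run sentiment analysis on the responses. The problem is that the data contains several languages, and I don't know how to remove stop words for more than one language from the set.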

'nps' is my data source, and nps$customer.feedback is the comment column.

First I tokenise the data

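# TOKENISE
library(dplyr)

comments <- nps %>%
  # drop rows without a comment (column name typo fixed: customer.feedback)
  filter(!is.na(customer.feedback)) %>%
  # assumption: Comment is the customer.feedback column under a shorter name
  select(cat, Comment = customer.feedback) %>%
  group_by(row_number(), cat)

comments <- comments %>% ungroup()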

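The snippet above prepares the comments but does not show the step that produces the one-word-per-row table (nps_words) used later. A minimal sketch of that step, assuming tidytext's unnest_tokens() is what was used; the toy nps data and the id column are hypothetical and only there to make the example self-contained:

```r
library(dplyr)
library(tidytext)

# Toy stand-in for the real `nps` data (hypothetical values).
nps <- tibble(
  cat = c("promoter", "detractor", "passive"),
  customer.feedback = c("Great service", "Das ist nicht gut", NA)
)

nps_words <- nps %>%
  filter(!is.na(customer.feedback)) %>%
  mutate(id = row_number()) %>%            # remember which comment a word came from
  unnest_tokens(word, customer.feedback)   # one lower-cased word per row

nps_words
```

unnest_tokens() lower-cases the text and strips punctuation by default, which is what makes the later joins on word line up with the stop word and sentiment tables.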

Then I get rid of the stop words

# stop_words is tidytext's built-in list, which covers English only
nps_words <- nps_words %>% anti_join(stop_words, by = "word")

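One possible way to extend this step to several languages is to build a combined stop word table and anti-join against that instead. The sketch below assumes the stopwords package is installed (tidytext's get_stopwords() relies on it) and that English and German are the languages present; the langs vector, the multi_stop_words name and the toy tokens are illustrative, not part of the original code:

```r
library(dplyr)
library(tidytext)  # get_stopwords() needs the 'stopwords' package installed

# Languages assumed to appear in the comments; adjust as needed.
langs <- c("en", "de")

# One combined table of stop words, tagged with the language they came from.
multi_stop_words <- bind_rows(
  lapply(langs, function(l) mutate(get_stopwords(language = l), language = l))
)

# Toy tokens standing in for nps_words (hypothetical values).
nps_words <- tibble(
  cat  = c("a", "a", "b", "b", "b"),
  word = c("the", "service", "und", "nicht", "lieferung")
)

# Drop every token that is a stop word in any of the chosen languages.
cleaned <- nps_words %>%
  anti_join(multi_stop_words, by = "word")

cleaned
```

A caveat with this design: the lists are applied to every comment regardless of its language, so English words that happen to be German stop words (for example "die" and "war") are dropped from English comments too. If that matters, the comments would need a language label first so that each one is filtered only against its own list.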

Then I use stemming and get_sentiments("bing") to show word counts by sentiment.

# stemgraph
nps_words %>%
  mutate(word = wordStem(word)) %>%
  inner_join(get_sentiments("bing") %>% mutate(word = wordStem(word)),
             by = "word") %>%
  count(cat, word, sentiment) %>%
  group_by(cat, sentiment) %>%
  top_n(7) %>%
  ungroup() %>%
  ggplot(aes(x = reorder(word, n), y = n, fill = sentiment)) +
  geom_col() +
  coord_flip() +
  facet_wrap( ~cat, scales = …

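Note that wordStem() from SnowballC uses the English Porter stemmer unless told otherwise, and the "bing" lexicon contains English words only, so German tokens that survive stop word removal will mostly not match anything in the join. A small sketch of language-specific stemming, assuming SnowballC is the stemming package in use; the example words are hypothetical:

```r
library(SnowballC)

# Stemmers available in this SnowballC installation.
getStemLanguages()

# Default: English Porter stemmer.
en_stems <- wordStem(c("delivery", "deliveries"))

# The same call with an explicit language picks the German stemmer.
de_stems <- wordStem(c("lieferung", "lieferungen"), language = "german")

en_stems
de_stems
```

Stemming German words still leaves the sentiment step without a German lexicon, so this only helps with counting word forms together, not with scoring them.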

text r text-analysis text-mining

Sea*_*n M

2018-08-21

5
Score

1
Answer

4479
Views