Sam*_*iri 5 aggregate r pattern-matching
I have an R data frame with several hundred rows:
word Freq
seed 4
seeds 3
contract 2
contracting 2
river 1
I want to group the data by pattern, e.g. seed + seeds and so on, so that the result looks like:
word Freq
seed 7
contract 4
river 1
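One option is to create a grouping variable 'gr' by taking a substring of 'word' whose length is the minimum number of characters in 'word'. Doing this again on 'word' within each 'gr' group gives a common substring for each group of words, and we can then get the sum of 'Freq' by 'word'.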
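library(dplyr)

df1 %>%
  # coarse key: every word cut to the length of the shortest word overall
  group_by(gr = substr(word, 1, min(nchar(word)))) %>%
  # per 'gr' group: cut to that group's shortest word (mutate() runs per group;
  # recent dplyr evaluates expressions inside group_by() on ungrouped data)
  mutate(word = substr(word, 1, min(nchar(word)))) %>%
  group_by(word) %>%
  summarise(Freq = sum(Freq))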
#      word  Freq
#     (chr) (int)
#1 contract     4
#2    river     1
#3     seed     7
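For anyone who wants to try the same two-step idea without dplyr, here is a minimal base R sketch. It rebuilds `df1` from the sample in the question; the names `gr`, `min_len`, `stems` and `res` are only illustrative. It assumes that `word` is a character column (not a factor) and that related words share a prefix at least as long as the shortest word in the data.

```r
# Sample data from the question; 'word' must be character for nchar()/substr()
df1 <- data.frame(
  word = c("seed", "seeds", "contract", "contracting", "river"),
  Freq = c(4, 3, 2, 2, 1),
  stringsAsFactors = FALSE
)

# Step 1: coarse grouping key, every word cut to the length of the
# shortest word overall ("seed" has 4 characters)
gr <- substr(df1$word, 1, min(nchar(df1$word)))

# Step 2: for each row, the length of the shortest word in its coarse group
min_len <- ave(nchar(df1$word), gr, FUN = min)

# Step 3: cut each word to its group's shortest length and sum Freq per stem
stems <- data.frame(
  word = substr(df1$word, 1, min_len),
  Freq = df1$Freq,
  stringsAsFactors = FALSE
)
res <- aggregate(Freq ~ word, data = stems, FUN = sum)
res
```

On the sample data this gives contract 4, river 1 and seed 7. The result depends entirely on the shortest word in the data: a single very short entry (say, a one-letter word) would merge unrelated words into the same coarse group, so check this on the real data before relying on it.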