我有一个文本变量和一个分组变量.我想将text变量按行数折叠成每行一个字符串(合并).因此,只要组列说m我想将文本组合在一起等等.我提供了之前和之后的样本数据集.我正在写这个包,并且迄今为止避免了所有依赖于其他包除了wordcloud并且希望保持这种方式.
我怀疑rle可能有用,cumsum但无法弄清楚这一点.
先感谢您.
数据是什么样的
text group
1 Computer is fun. Not too fun. m
2 No its not, its dumb. m
3 How can we be certain? f
4 There is no way. m
5 I distrust you. m
6 What are you talking about? f
7 Shall we move on? Good then. f
8 Im hungry. Lets eat. You already? m
Run Code Online (Sandbox Code Playgroud)
我希望数据看起来像什么
text group
1 Computer is fun. Not too fun. No its not, its dumb. m
2 How can we be certain? f
3 There is no way. I distrust you. m
4 What are you talking about? Shall we move on? Good then. f
5 Im hungry. Lets eat. You already? m
Run Code Online (Sandbox Code Playgroud)
数据
dat <- structure(list(text = c("Computer is fun. Not too fun.", "No its not, its dumb.",
"How can we be certain?", "There is no way.", "I distrust you.",
"What are you talking about?", "Shall we move on? Good then.",
"Im hungry. Lets eat. You already?"), group = structure(c(2L,
2L, 1L, 2L, 2L, 1L, 1L, 2L), .Label = c("f", "m"), class = "factor")), .Names = c("text",
"group"), row.names = c(NA, 8L), class = "data.frame")
Run Code Online (Sandbox Code Playgroud)
编辑:我发现我可以为每个组变量运行添加唯一列:
x <- rle(as.character(dat$group))[[1]]
dat$new <- as.factor(rep(1:length(x), x))
Run Code Online (Sandbox Code Playgroud)
产量:
text group new
1 Computer is fun. Not too fun. m 1
2 No its not, its dumb. m 1
3 How can we be certain? f 2
4 There is no way. m 3
5 I distrust you. m 3
6 What are you talking about? f 4
7 Shall we move on? Good then. f 4
8 Im hungry. Lets eat. You already? m 5
Run Code Online (Sandbox Code Playgroud)
这使用rle创建一个id来对句子进行分组.它使用tapply和paste一起将输出结合在一起
## Your example data
dat <- structure(list(text = c("Computer is fun. Not too fun.", "No its not, its dumb.",
"How can we be certain?", "There is no way.", "I distrust you.",
"What are you talking about?", "Shall we move on? Good then.",
"Im hungry. Lets eat. You already?"), group = structure(c(2L,
2L, 1L, 2L, 2L, 1L, 1L, 2L), .Label = c("f", "m"), class = "factor")), .Names = c("text",
"group"), row.names = c(NA, 8L), class = "data.frame")
# Needed for later
k <- rle(as.numeric(dat$group))
# Create a grouping vector
id <- rep(seq_along(k$len), k$len)
# Combine the text in the desired manner
out <- tapply(dat$text, id, paste, collapse = " ")
# Bring it together into a data frame
answer <- data.frame(text = out, group = levels(dat$group)[k$val])
Run Code Online (Sandbox Code Playgroud)