Repeating rows of a data.frame in dplyr

Question

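I am having trouble repeating the rows of my real data using dplyr. There is already another post here about repeating rows of a data frame, but it has no dplyr solution.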

Here I am just curious what a dplyr solution could look like, but my attempt failed with this error:

Error: wrong result size (16), expected 4 or 1

library(dplyr)
df <- data.frame(column = letters[1:4])

df_rep <- df %>%
  mutate(column = rep(column, each = 4))

Expected output

>df_rep 
    column
    #a
    #a
    #a
    #a
    #b
    #b
    #b
    #b
    #*
    #*
    #*

Answer 1

Bra*_*ell 8

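I was looking for a similar (but slightly different) solution. Posting it here in case it is useful to someone else.

In my case, I needed a more general solution that allows each letter to be repeated an arbitrary number of times. Here is what I came up with: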

library(tidyverse)

df <- data.frame(letters = letters[1:4])
df

> df
  letters
1       a
2       b
3       c
4       d

Suppose I want 2 a's, 3 b's, 2 c's and 4 d's:

df %>% 
  mutate(count = c(2, 3, 2, 4)) %>% 
  group_by(letters) %>% 
  expand(count = seq_len(count))

# A tibble: 11 x 2
# Groups:   letters [4]
   letters count
    <fctr> <int>
 1       a     1
 2       a     2
 3       b     1
 4       b     2
 5       b     3
 6       c     1
 7       c     2
 8       d     1
 9       d     2
10       d     3
11       d     4

If you don't want to keep the count column:

df %>% 
  mutate(count = c(2, 3, 2, 4)) %>% 
  group_by(letters) %>% 
  expand(count = seq_len(count)) %>% 
  select(letters)

# A tibble: 11 x 1
# Groups:   letters [4]
   letters
    <fctr>
 1       a
 2       a
 3       b
 4       b
 5       b
 6       c
 7       c
 8       d
 9       d
10       d
11       d

If you want count to reflect the number of times each letter is repeated:

df %>% 
  mutate(count = c(2, 3, 2, 4)) %>% 
  group_by(letters) %>% 
  expand(count = seq_len(count)) %>% 
  mutate(count = max(count))

# A tibble: 11 x 2
# Groups:   letters [4]
   letters count
    <fctr> <dbl>
 1       a     2
 2       a     2
 3       b     3
 4       b     3
 5       b     3
 6       c     2
 7       c     2
 8       d     4
 9       d     4
10       d     4
11       d     4

Answer 2

小智 7

This can also be solved with the uncount function (from tidyr). The count column indicates how many times each row should be repeated.

library(tidyverse)

df <- tibble(letters = letters[1:4])

df 
# A tibble: 4 x 1
  letters
  <chr>  
1 a      
2 b      
3 c      
4 d 

df %>%
  mutate(count = c(2, 3, 2, 4)) %>%
  uncount(count)

# A tibble: 11 x 1
   letters
   <chr> 
 1 a      
 2 a      
 3 b      
 4 b      
 5 b      
 6 c      
 7 c      
 8 d      
 9 d      
10 d      
11 d
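
As a small extension of the uncount call above, here is a sketch that keeps the count column and numbers the copies. It assumes a tidyr version where uncount accepts the .remove and .id arguments; the column name copy is just an illustrative choice, not something from the answer.

```r
library(dplyr)
library(tidyr)

df <- tibble(letters = letters[1:4])

df_rep <- df %>%
  mutate(count = c(2, 3, 2, 4)) %>%
  # .remove = FALSE keeps the count column in the result;
  # .id = "copy" adds a column numbering the copies of each original row
  # ("copy" is a hypothetical name chosen for this example).
  uncount(count, .remove = FALSE, .id = "copy")

df_rep
```

Each original row then appears count times, with copy running from 1 up to that row's count.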

Answer 3

r2e*_*ans 5

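This is fraught with peril if the data.frame has other columns (there, I said it!), but the `do` block lets you generate a derived data.frame within a dplyr pipe (though, ceci n'est pas un pipe):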

library(dplyr)
df <- data.frame(column = letters[1:4], stringsAsFactors = FALSE)
df %>%
  do( data.frame(column = rep(.$column, each = 4), stringsAsFactors = FALSE) )
#    column
# 1       a
# 2       a
# 3       a
# 4       a
# 5       b
# 6       b
# 7       b
# 8       b
# 9       c
# 10      c
# 11      c
# 12      c
# 13      d
# 14      d
# 15      d
# 16      d

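In my experience, `do` is very slow. You could slice instead, like `df %>% slice(rep(1:n(), each = 4))`. This also handles the case with more columns. (32 upvotes)
Nice alternative, it's certainly more elegant. I was trying to come up with something similar but my brain kept rebelling. Thanks, Frank! (I agree, `do` bogs things down, it's a known bottleneck.) (2 upvotes)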
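
To make the slice suggestion from the comments concrete, here is a minimal sketch. The extra value column is hypothetical, added only to show that the other columns are carried along. The second call is my own variation, assuming per-row counts stored in a count column as in the other answers.

```r
library(dplyr)

# Hypothetical data: `value` is an extra column added only for illustration.
df <- data.frame(column = letters[1:4], value = 1:4, stringsAsFactors = FALSE)

# Fixed count: repeat every row 4 times by repeating its row index.
df_each <- df %>%
  slice(rep(1:n(), each = 4))

# Per-row counts: `times` takes one count per row index.
df_times <- df %>%
  mutate(count = c(2, 3, 2, 4)) %>%
  slice(rep(1:n(), times = count))
```

Unlike the mutate attempt in the question, slice is allowed to return more rows than it received, so repeating row indices is enough.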