Mutate all columns matching a pattern, each time based on the previous column

ztl*_*ztl 8 r dplyr

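Using dplyr, how can I mutate all columns whose names contain a pattern (with mutate_at, I guess), each time using the previous column? For example, here every column with foo in its name should be mutated using the column before it (i.e., a for column fooa, b for foob, and so on).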


set.seed(13)
dfrows = 5
df = data.frame(a = rnorm(dfrows),
                fooa = runif(dfrows),
                b = rnorm(dfrows, mean=50, sd=5),
                foob = runif(dfrows, min=0, max=5),
                c = rnorm(dfrows, mean=100, sd=10),
                fooc = runif(dfrows, min=0, max=10))
df
#            a      fooa        b      foob         c     fooc
# 1  0.5543269 0.6611216 48.26791 3.0999527  98.06053 6.035485
# 2 -0.2802719 0.8783709 51.15647 0.1586242 113.96432 2.299504
# 3  1.7751634 0.8905590 52.34582 2.3070636 101.00663 9.668332
# 4  0.1873201 0.5662805 50.58978 1.6501046  98.85561 6.045547
# 5  1.1425261 0.5935473 50.35224 3.1676038 107.02225 6.396047

library(dplyr)
df %>% mutate(fooa = fooa/100 * a,
              foob = foob/100 * b,
              fooc = fooc/100 * c)
#            a         fooa        b       foob         c     fooc
# 1  0.5543269  0.003664775 48.26791 1.49628246  98.06053 5.918428
# 2 -0.2802719 -0.002461827 51.15647 0.08114656 113.96432 2.620614
# 3  1.7751634  0.015808878 52.34582 1.20765132 101.00663 9.765657
# 4  0.1873201  0.001060757 50.58978 0.83478430  98.85561 5.976363
# 5  1.1425261  0.006781434 50.35224 1.59495949 107.02225 6.845194

# Equivalently, in base R:
for (i in c(2, 4, 6)) {
  df[,i] = df[,i]/100 * df[, i-1]
}


So I'm looking for something like this, I guess:

# What should <PREVIOUS_COLUMN> be?
df %>% mutate_at(vars(contains('foo')), funs(./100 * <PREVIOUS_COLUMN>)) 

# OR, even better (more generic but in my case it will always be the previous column):
df %>% mutate_at(vars(contains('foo')), funs(./100 * <COLUMN_NAME_WITH_'foo'_PATTERN_REMOVED>)) 

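EDIT: I should have mentioned that the original data.frame may contain more columns, possibly following patterns other than X then fooX, so an ideal solution should be able to locate the right columns (but I'll leave the question as it is, since all the answers provide good solutions and functions).

A better example would be: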

set.seed(13)
dfrows = 5
df = data.frame(a = rnorm(dfrows),
                fooa = runif(dfrows),
                b = rnorm(dfrows, mean=50, sd=5),
                foob = runif(dfrows, min=0, max=5),
                bla = 5,
                c = rnorm(dfrows, mean=100, sd=10),
                fooc = runif(dfrows, min=0, max=10),
                blo = 8)
df
#            a      fooa        b      foob bla         c     fooc blo
# 1  0.5543269 0.6611216 48.26791 3.0999527   5  98.06053 6.035485   8
# 2 -0.2802719 0.8783709 51.15647 0.1586242   5 113.96432 2.299504   8
# 3  1.7751634 0.8905590 52.34582 2.3070636   5 101.00663 9.668332   8
# 4  0.1873201 0.5662805 50.58978 1.6501046   5  98.85561 6.045547   8
# 5  1.1425261 0.5935473 50.35224 3.1676038   5 107.02225 6.396047   8
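With this extended example, the hard-coded positions c(2, 4, 6) from the base R loop no longer line up with the foo columns (position 6 is now c, and fooc has moved to position 7). A minimal sketch, assuming the partner is always the column immediately to the left, is to look the positions up by name with grep(). The names foo_idx and result are introduced here only for illustration.

```r
set.seed(13)
dfrows <- 5
df <- data.frame(a = rnorm(dfrows),
                 fooa = runif(dfrows),
                 b = rnorm(dfrows, mean = 50, sd = 5),
                 foob = runif(dfrows, min = 0, max = 5),
                 bla = 5,
                 c = rnorm(dfrows, mean = 100, sd = 10),
                 fooc = runif(dfrows, min = 0, max = 10),
                 blo = 8)

# Find the positions of the foo columns by name instead of hard-coding them
foo_idx <- grep("^foo", names(df))

# Work on a copy so every product uses the original, unmodified values
result <- df
for (i in foo_idx) {
  # Assumption: the column to scale by sits directly before the foo column
  result[[i]] <- df[[i]] / 100 * df[[i - 1]]
}
result
```

This still depends on column order, so it breaks if an unrelated column ever sits between X and fooX.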

tmf*_*mnk 5

One option could be:

df %>%
 mutate(across(starts_with("foo"))/100 * across(!matches("foo")))

           a         fooa        b       foob         c     fooc
1  0.5543269  0.003664775 48.26791 1.49628246  98.06053 5.918428
2 -0.2802719 -0.002461827 51.15647 0.08114656 113.96432 2.620614
3  1.7751634  0.015808878 52.34582 1.20765132 101.00663 9.765657
4  0.1873201  0.001060757 50.58978 0.83478430  98.85561 5.976363
5  1.1425261  0.006781434 50.35224 1.59495949 107.02225 6.845194


Tim*_*Fan 3

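Here is another approach using across() and cur_column(). Personally, I would not recommend doing calculations based on column position; I would work with column names instead, since that seems safer.

In the example below we loop over the columns a, b and c with across() and access the values of each corresponding foo column using get() and cur_column().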

set.seed(13)
dfrows = 5
df = data.frame(a = rnorm(dfrows),
                fooa = runif(dfrows),
                b = rnorm(dfrows, mean=50, sd=5),
                foob = runif(dfrows, min=0, max=5),
                c = rnorm(dfrows, mean=100, sd=10),
                fooc = runif(dfrows, min=0, max=10))

library(dplyr)

df %>% 
  mutate(across(matches("^[a-z]$"),
                ~ get(paste0("foo", cur_column())) / 100 * .x,
                .names = "foo{col}"))
#>            a         fooa        b       foob         c     fooc
#> 1  0.5543269  0.003664775 48.26791 1.49628246  98.06053 5.918428
#> 2 -0.2802719 -0.002461827 51.15647 0.08114656 113.96432 2.620614
#> 3  1.7751634  0.015808878 52.34582 1.20765132 101.00663 9.765657
#> 4  0.1873201  0.001060757 50.58978 0.83478430  98.85561 5.976363
#> 5  1.1425261  0.006781434 50.35224 1.59495949 107.02225 6.845194

Created on 2021-01-27 by the reprex package (v0.3.0)
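The same name-based idea can be run from the foo side, which is closer to the "column name with the foo pattern removed" variant asked for in the question. The following is only a sketch, assuming dplyr 1.0.0 or later and that get() resolves data frame columns inside the across() lambda, as it does in the example above. It uses the extended data frame from the question's edit, and result is a name introduced here for illustration.

```r
library(dplyr)

set.seed(13)
dfrows <- 5
df <- data.frame(a = rnorm(dfrows),
                 fooa = runif(dfrows),
                 b = rnorm(dfrows, mean = 50, sd = 5),
                 foob = runif(dfrows, min = 0, max = 5),
                 bla = 5,
                 c = rnorm(dfrows, mean = 100, sd = 10),
                 fooc = runif(dfrows, min = 0, max = 10),
                 blo = 8)

result <- df %>%
  mutate(across(starts_with("foo"),
                # Strip the "foo" prefix from the current column name to
                # find its partner column (fooa -> a, foob -> b, fooc -> c)
                ~ .x / 100 * get(sub("^foo", "", cur_column()))))
result
```

Because the pairing is done purely by name, unrelated columns such as bla and blo are left untouched wherever they sit in the data frame.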