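How can I use dplyr's mutate_at to change all columns that contain a pattern (I guess), using the previous column each time? For example, here every column with foo in its name should be mutated using the column just before it (i.e., a for column fooa, b for foob, and so on).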
set.seed(13)
dfrows = 5
df = data.frame(a = rnorm(dfrows),
                fooa = runif(dfrows),
                b = rnorm(dfrows, mean=50, sd=5),
                foob = runif(dfrows, min=0, max=5),
                c = rnorm(dfrows, mean=100, sd=10),
                fooc = runif(dfrows, min=0, max=10))
df
# a fooa b foob c fooc
# 1 0.5543269 0.6611216 48.26791 3.0999527 98.06053 6.035485
# 2 -0.2802719 0.8783709 51.15647 0.1586242 113.96432 2.299504
# 3 1.7751634 0.8905590 52.34582 2.3070636 101.00663 9.668332
# 4 0.1873201 0.5662805 50.58978 1.6501046 98.85561 6.045547
# 5 1.1425261 0.5935473 50.35224 3.1676038 107.02225 6.396047
library(dplyr)
df %>% mutate(fooa = fooa/100 * a,
              foob = foob/100 * b,
              fooc = fooc/100 * c)
# a fooa b foob c fooc
# 1 0.5543269 0.003664775 48.26791 1.49628246 98.06053 5.918428
# 2 -0.2802719 -0.002461827 51.15647 0.08114656 113.96432 2.620614
# 3 1.7751634 0.015808878 52.34582 1.20765132 101.00663 9.765657
# 4 0.1873201 0.001060757 50.58978 0.83478430 98.85561 5.976363
# 5 1.1425261 0.006781434 50.35224 1.59495949 107.02225 6.845194
# Equivalently, in base R:
for (i in c(2, 4, 6)) {
  df[,i] = df[,i]/100 * df[, i-1]
}
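The same loop can also find each partner column by name instead of by position. A minimal sketch, assuming every fooX column has a partner column named X; foo_col, base_col and orig are names introduced only for this illustration:

```r
set.seed(13)
dfrows <- 5
df <- data.frame(a = rnorm(dfrows),
                 fooa = runif(dfrows),
                 b = rnorm(dfrows, mean = 50, sd = 5),
                 foob = runif(dfrows, min = 0, max = 5),
                 c = rnorm(dfrows, mean = 100, sd = 10),
                 fooc = runif(dfrows, min = 0, max = 10))
orig <- df  # untouched copy, kept only for comparison

# Assumption: every "fooX" column has a partner column named "X".
for (foo_col in grep("^foo", names(df), value = TRUE)) {
  base_col <- sub("^foo", "", foo_col)  # e.g. "fooa" -> "a"
  df[[foo_col]] <- df[[foo_col]] / 100 * df[[base_col]]
}
```

Columns whose names do not start with foo are never touched, so extra columns do not shift the pairing the way they would with c(2, 4, 6).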
So I guess I'm looking for something like this:
# What should <PREVIOUS_COLUMN> be?
df %>% mutate_at(vars(contains('foo')), funs(./100 * <PREVIOUS_COLUMN>))
# OR, even better (more generic but in my case it will always be the previous column):
df %>% mutate_at(vars(contains('foo')), funs(./100 * <COLUMN_NAME_WITH_'foo'_PATTERN_REMOVED>))
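One way to get the "pattern removed" behaviour is to build the mutate() arguments programmatically and splice them in with rlang's !!! operator. This is a sketch, not the canonical answer; foo_cols, base_cols, exprs and res are hypothetical names, and it assumes each fooX column has a matching X column:

```r
library(dplyr)

set.seed(13)
dfrows <- 5
df <- data.frame(a = rnorm(dfrows),
                 fooa = runif(dfrows),
                 b = rnorm(dfrows, mean = 50, sd = 5),
                 foob = runif(dfrows, min = 0, max = 5),
                 c = rnorm(dfrows, mean = 100, sd = 10),
                 fooc = runif(dfrows, min = 0, max = 10))

foo_cols  <- grep("^foo", names(df), value = TRUE)  # "fooa" "foob" "fooc"
base_cols <- sub("^foo", "", foo_cols)               # "a" "b" "c"

# One quoted expression per foo column, e.g. fooa / 100 * a
exprs <- Map(function(foo_col, base_col) {
  rlang::expr((!!rlang::sym(foo_col)) / 100 * (!!rlang::sym(base_col)))
}, foo_cols, base_cols)
names(exprs) <- foo_cols  # names become the columns mutate() writes

# !!! splices the list in as fooa = ..., foob = ..., fooc = ...
res <- df %>% mutate(!!!exprs)
```

Each expression reads only its own foo column and an unmodified base column, so the order in which mutate() evaluates them does not matter here.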
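Edit: I should mention that the original data.frame may contain more columns, possibly following patterns other than X followed by fooX, so an ideal solution should be able to locate the right pairs (but I'll leave the question as it is, since all the answers provide good solutions and useful functions).
A better example would be: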
set.seed(13)
dfrows = 5
df = data.frame(a = rnorm(dfrows),
                fooa = runif(dfrows),
                b = rnorm(dfrows, mean=50, sd=5),
                foob = runif(dfrows, min=0, max=5),
                bla = 5,
                c = rnorm(dfrows, mean=100, sd=10),
                fooc = runif(dfrows, min=0, max=10),
                blo = 8)
df
# a fooa b foob bla c fooc blo
# 1 0.5543269 0.6611216 48.26791 3.0999527 5 98.06053 6.035485 8
# 2 -0.2802719 0.8783709 51.15647 0.1586242 5 113.96432 2.299504 8
# 3 1.7751634 0.8905590 52.34582 2.3070636 5 101.00663 9.668332 8
# 4 0.1873201 0.5662805 50.58978 1.6501046 5 98.85561 6.045547 8
# 5 1.1425261 0.5935473 50.35224 3.1676038 5 107.02225 6.396047 8
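To locate the right pairs in a frame like this, a small helper can match each fooX column to an existing X column and ignore everything else (bla, blo, or a foo column with no partner). foo_pairs() is a hypothetical helper written for this illustration, not part of dplyr:

```r
# Hypothetical helper: map every "fooX" column to an existing partner "X".
foo_pairs <- function(nms, prefix = "foo") {
  foo_cols  <- grep(paste0("^", prefix), nms, value = TRUE)
  base_cols <- sub(paste0("^", prefix), "", foo_cols)
  keep <- base_cols %in% nms  # drop foo columns whose partner is missing
  # names = foo column, values = partner column
  setNames(base_cols[keep], foo_cols[keep])
}

nms <- c("a", "fooa", "b", "foob", "bla", "c", "fooc", "blo")
foo_pairs(nms)                       # c(fooa = "a", foob = "b", fooc = "c")
foo_pairs(c("x", "foox", "fooz"))    # c(foox = "x"); "fooz" has no partner
```

The returned named vector can then drive a loop or a programmatic mutate() call, so the pairing never depends on column positions.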
One option could be:
df %>%
  mutate(across(starts_with("foo"))/100 * across(!matches("foo")))
a fooa b foob c fooc
1 0.5543269 0.003664775 48.26791 1.49628246 98.06053 5.918428
2 -0.2802719 -0.002461827 51.15647 0.08114656 113.96432 2.620614
3 1.7751634 0.015808878 52.34582 1.20765132 101.00663 9.765657
4 0.1873201 0.001060757 50.58978 0.83478430 98.85561 5.976363
5 1.1425261 0.006781434 50.35224 1.59495949 107.02225 6.845194
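This works because data frame arithmetic pairs columns by position, so it relies on the foo columns and the non-foo columns lining up one-to-one. On the extended example in the question, !matches("foo") would also select bla and blo (5 columns against 3), so the pairing breaks. A sketch that restricts both sides to matching names, assuming dplyr 1.1.0 or later (where pick() is the recommended replacement for across() called without functions); foo_cols, base_cols and res are illustrative names:

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for pick()

set.seed(13)
dfrows <- 5
df <- data.frame(a = rnorm(dfrows),
                 fooa = runif(dfrows),
                 b = rnorm(dfrows, mean = 50, sd = 5),
                 foob = runif(dfrows, min = 0, max = 5),
                 bla = 5,
                 c = rnorm(dfrows, mean = 100, sd = 10),
                 fooc = runif(dfrows, min = 0, max = 10),
                 blo = 8)

foo_cols  <- grep("^foo", names(df), value = TRUE)  # "fooa" "foob" "fooc"
base_cols <- sub("^foo", "", foo_cols)               # "a" "b" "c", same order

# Both selections have the same length and order, so the position-based
# arithmetic pairs fooa with a, foob with b, fooc with c. The unnamed data
# frame result is spliced back in under the names of the first operand.
res <- df %>%
  mutate(pick(all_of(foo_cols)) / 100 * pick(all_of(base_cols)))
```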
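Here is another approach using across() and cur_column(). Personally, I would not recommend doing calculations based on column positions; I'd use column names instead, as that seems safer.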
In the example below, we loop over columns a, b and c with across() and access the values of each corresponding foo column using get() and cur_column().
set.seed(13)
dfrows = 5
df = data.frame(a = rnorm(dfrows),
                fooa = runif(dfrows),
                b = rnorm(dfrows, mean=50, sd=5),
                foob = runif(dfrows, min=0, max=5),
                c = rnorm(dfrows, mean=100, sd=10),
                fooc = runif(dfrows, min=0, max=10))
library(dplyr)
df %>%
  mutate(across(matches("^[a-z]$"),
                ~ get(paste0("foo", cur_column())) / 100 * .x,
                .names = "foo{col}"))
#> a fooa b foob c fooc
#> 1 0.5543269 0.003664775 48.26791 1.49628246 98.06053 5.918428
#> 2 -0.2802719 -0.002461827 51.15647 0.08114656 113.96432 2.620614
#> 3 1.7751634 0.015808878 52.34582 1.20765132 101.00663 9.765657
#> 4 0.1873201 0.001060757 50.58978 0.83478430 98.85561 5.976363
#> 5 1.1425261 0.006781434 50.35224 1.59495949 107.02225 6.845194
Created on 2021-01-27 by the reprex package (v0.3.0)
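If you would rather not rely on get() finding columns inside the data mask, one alternative is to capture the foo columns before calling mutate() and index into that copy with cur_column(). A sketch on the extended example from the question, where matches("^[a-z]$") already skips bla and blo; foo_vals and res are names introduced here, and the captured copy is only valid for ungrouped data because it is never split by group:

```r
library(dplyr)

set.seed(13)
dfrows <- 5
df <- data.frame(a = rnorm(dfrows),
                 fooa = runif(dfrows),
                 b = rnorm(dfrows, mean = 50, sd = 5),
                 foob = runif(dfrows, min = 0, max = 5),
                 bla = 5,
                 c = rnorm(dfrows, mean = 100, sd = 10),
                 fooc = runif(dfrows, min = 0, max = 10),
                 blo = 8)

# Captured outside the data mask (ungrouped data only).
foo_vals <- df[grep("^foo", names(df))]

# Loop over the single-letter columns a, b, c; cur_column() gives "a", "b",
# "c", and .names writes each result back to fooa, foob, fooc.
res <- df %>%
  mutate(across(matches("^[a-z]$"),
                ~ foo_vals[[paste0("foo", cur_column())]] / 100 * .x,
                .names = "foo{col}"))
```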