Mutating multiple columns in a data frame

H P*_*ark 16 r stata dplyr

I have a dataset that looks like this.

bankname    bankid  year    totass    cash    bond    loans
Bank A      1       1881    244789    7250    20218   29513
Bank B      2       1881    195755    10243   185151  2800
Bank C      3       1881    107736    13357   177612  NA
Bank D      4       1881    170600    35000   20000   5000
Bank E      5       1881    32000000  351266  314012  NA

I want to compute some ratios from the banks' balance sheets. I would like the dataset to look like this:

bankname    bankid  year    totass    cash    bond    loans   CashtoAsset  BondtoAsset  LoanstoAsset
Bank A      1       1881    2447890   7250    202100  951300  0.002        0.082        0.388
Bank B      2       1881    195755    10243   185151  2800    0.052        0.945        0.014
Bank C      3       1881    107736    13357   177612  NA      0.123        1.648585431  NA
Bank D      4       1881    170600    35000   20000   5000    0.205        0.117        0.029
Bank E      5       1881    32000000  351266  314012  NA      0.0109       0.009        NA

Here is the code to reproduce the data:

bankname <- c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E")
bankid <- c(1, 2, 3, 4, 5)
year <- c(1881, 1881, 1881, 1881, 1881)
totass <- c(244789, 195755, 107736, 170600, 32000000)
cash <- c(7250, 10243, 13357, 35000, 351266)
bond <- c(20218, 185151, 177612, 20000, 314012)
loans <- c(29513, 2800, NA, 5000, NA)
bankdata <- data.frame(bankname, bankid, year, totass, cash, bond, loans)

First, I got rid of the NAs in the balance-sheet columns.

cols <- c("totass", "cash", "bond", "loans")
bankdata[cols][is.na(bankdata[cols])] <- 0

Then I computed the ratios:

library(dplyr)
bankdata <- mutate(bankdata, CashtoAsset = cash / totass)
bankdata <- mutate(bankdata, BondtoAsset = bond / totass)
bankdata <- mutate(bankdata, loanstoAsset = loans / totass)

However, rather than computing these ratios one line at a time, I would like to create them all in one go. In Stata, I would do this:

foreach x of varlist cash bond loans {
    by bankid: gen `x'toAsset = `x' / totass
}

How can I do this?

jaz*_*rro 41

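Update (as of 2 December 2017)

Since I answered this question, I have noticed that some SO users keep coming back to this answer. The dplyr package has changed in the meantime, so I am leaving the following update. I hope it helps some R users learn how to use mutate_at().

mutate_each() is now deprecated; you want to use mutate_at() instead. funs() has since been deprecated as well, so write the function as list(name = ~f(.)) rather than funs(name = f(.)). You specify the columns to which the function is applied in .vars. One way is to use vars(). Another is to use a character vector containing the names of the columns to which you want to apply the custom function given in .funs. A third is to specify the columns by position (5:7 in this case). Note that if you use a column in group_by(), you need to adjust the column positions. Have a look at this question.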

bankdata %>%
    mutate_at(.funs = list(toAsset = ~./totass), .vars = vars(cash:loans))

bankdata %>%
    mutate_at(.funs = list(toAsset = ~./totass), .vars = c("cash", "bond", "loans"))

bankdata %>%
    mutate_at(.funs = list(toAsset = ~./totass), .vars = 5:7)

The older funs() syntax, which is now deprecated, gives the same result:

bankdata %>%
    mutate_at(.funs = funs(toAsset = ./totass), .vars = vars(cash:loans))

bankdata %>%
    mutate_at(.funs = funs(toAsset = ./totass), .vars = c("cash", "bond", "loans"))

bankdata %>%
    mutate_at(.funs = funs(toAsset = ./totass), .vars = 5:7)

#  bankname bankid year   totass   cash   bond loans cash_toAsset bond_toAsset loans_toAsset
#1   Bank A      1 1881   244789   7250  20218 29513   0.02961734  0.082593581    0.12056506
#2   Bank B      2 1881   195755  10243 185151  2800   0.05232561  0.945830247    0.01430359
#3   Bank C      3 1881   107736  13357 177612    NA   0.12397899  1.648585431            NA
#4   Bank D      4 1881   170600  35000  20000  5000   0.20515826  0.117233294    0.02930832
#5   Bank E      5 1881 32000000 351266 314012    NA   0.01097706  0.009812875            NA

I purposely gave the custom function a name (toAsset) in mutate_at(), since this helps me arrange the new column names. Previously, I renamed the columns in a separate step, but I think it is much easier to clean up the column names with gsub() in the present approach. If the result above is saved as out, run the following code to remove the _ from the column names.

names(out) <- gsub(names(out), pattern = "_", replacement = "")

Original answer

I think you can save some typing in this way with dplyr. The downside is that you overwrite cash, bond, and loans.

bankdata %>%
    group_by(bankname) %>%
    mutate_each(funs(whatever = ./totass), cash:loans)

#  bankname bankid year   totass       cash        bond      loans
#1   Bank A      1 1881   244789 0.02961734 0.082593581 0.12056506
#2   Bank B      2 1881   195755 0.05232561 0.945830247 0.01430359
#3   Bank C      3 1881   107736 0.12397899 1.648585431         NA
#4   Bank D      4 1881   170600 0.20515826 0.117233294 0.02930832
#5   Bank E      5 1881 32000000 0.01097706 0.009812875         NA

If you prefer your expected result, I think some more typing is necessary. The renaming part seems to be something you have to do.
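For readers on a newer dplyr, the renaming step can probably be folded into the call itself. The following is only a sketch, assuming dplyr 1.0.0 or later, where across() and its .names argument are available; it is not part of the original answer. The data frame is rebuilt from the question so that the block runs on its own, and the resulting column names (cashtoAsset, bondtoAsset, loanstoAsset) follow the Stata naming rather than the capitalised names in the expected table.

```r
library(dplyr)

# Data from the question, rebuilt here so the example is self-contained.
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = c(1, 2, 3, 4, 5),
  year     = rep(1881, 5),
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

# across() applies the lambda to each selected column; .names builds each
# new column name from the original one, so the originals are kept and no
# separate renaming step is needed.
out <- bankdata %>%
  mutate(across(cash:loans, ~ .x / totass, .names = "{.col}toAsset"))

names(out)
```

Because the name template is passed in .names, the gsub() clean-up is not needed with this form.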


小智 0

You may be making this harder than it needs to be. Just try this and see whether it gives you what you need.

bankdata$CashtoAsset <- bankdata$cash / bankdata$totass
bankdata$BondtoAsset <- bankdata$bond / bankdata$totass
bankdata$loantoAsset <- bankdata$loans / bankdata$totass
bankdata

This should get you started in the right direction.
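If the aim is still to avoid one assignment per ratio, as in the Stata foreach loop from the question, a plain base R loop is one possibility. This is a minimal sketch rather than the answerer's code: the vector name ratio_cols is made up for illustration, and the data frame is rebuilt from the question so the block runs on its own.

```r
# Data from the question, rebuilt here so the example is self-contained.
bankdata <- data.frame(
  bankname = c("Bank A", "Bank B", "Bank C", "Bank D", "Bank E"),
  bankid   = c(1, 2, 3, 4, 5),
  year     = rep(1881, 5),
  totass   = c(244789, 195755, 107736, 170600, 32000000),
  cash     = c(7250, 10243, 13357, 35000, 351266),
  bond     = c(20218, 185151, 177612, 20000, 314012),
  loans    = c(29513, 2800, NA, 5000, NA)
)

# Hypothetical helper vector: the columns to divide by total assets.
ratio_cols <- c("cash", "bond", "loans")

# One pass per column, mirroring `x'toAsset = `x' / totass in Stata.
for (x in ratio_cols) {
  bankdata[[paste0(x, "toAsset")]] <- bankdata[[x]] / bankdata$totass
}

bankdata
```

The loop adds cashtoAsset, bondtoAsset, and loanstoAsset without touching the original columns; rows with NA in loans keep NA in the ratio.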