I have the following data frame:
> dados
COUNTRY    Year  CO2 emissions  Pop. Growth(%)
Argentina  1994  1.23           0.3
Argentina  1995  1.26           0.2
Argentina  1996  1.28           0.4
Argentina  1997  1.24           0.2
Brazil     1994  1.54           0.7
Brazil     1995  1.59           0.6
Brazil     1996  1.60           0.9
Brazil     1997  1.58           1.3
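Within each country, I want to take the first difference of the variables CO2 emissions and Pop. Growth(%). I tried dados[,2:4] <- diff(dados[,2:4]), but it returned this error:
"Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)] : non-numeric argument to binary operator"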
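For comparison, here is a base-R sketch of what the diff() attempt was reaching for. It is not the dplyr answer below. diff() returns one fewer value than its input and knows nothing about COUNTRY, so it has to be applied per country (here with ave()) and padded with a leading NA. The helper first_diff is hypothetical, and the sketch assumes rows should be ordered by Year within each country.

```r
# Example data in the same shape as the question (column names made syntactic).
df <- data.frame(
  COUNTRY = factor(rep(c("Argentina", "Brazil"), each = 4)),
  Year = rep(1994:1997, times = 2),
  CO2_emissions = c(1.23, 1.26, 1.28, 1.24, 1.54, 1.59, 1.60, 1.58),
  Pop_Growth = c(0.3, 0.2, 0.4, 0.2, 0.7, 0.6, 0.9, 1.3)
)

# diff() on a plain numeric vector works, but returns 7 values for 8 rows
# and also computes the meaningless Argentina 1997 -> Brazil 1994 step.
length(diff(df$CO2_emissions))

# Hypothetical helper: first difference padded with a leading NA,
# so the result has the same length as its input.
first_diff <- function(x) c(NA, diff(x))

# Make sure rows are in Year order inside each country before differencing.
df <- df[order(df$COUNTRY, df$Year), ]

# Apply the helper separately within each COUNTRY and overwrite the columns.
cols <- c("CO2_emissions", "Pop_Growth")
df[cols] <- lapply(df[cols], function(col) ave(col, df$COUNTRY, FUN = first_diff))
df
```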
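Answer: diff() is meant for a single numeric vector and ignores the COUNTRY grouping. The error also indicates that at least one of the selected columns is not numeric. Grouping by country and subtracting the lagged value avoids both problems. The code below uses the example data at the end of this post, where the data frame is called df and the columns are renamed CO2_emissions and Pop_Growth. With dplyr: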
library(dplyr)
df %>%
group_by(COUNTRY) %>%
mutate_at(vars(CO2_emissions:Pop_Growth), funs(.-lag(.)))
Edit: As of dplyr 0.8.0, funs() is soft-deprecated. With newer versions of dplyr, use the following instead:
df %>%
group_by(COUNTRY) %>%
mutate_at(vars(CO2_emissions:Pop_Growth), list(~ .x - lag(.x)))
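Since dplyr 1.0.0, mutate_at() is superseded by across() inside mutate(). Below is a minimal sketch of the same grouped first difference in that style, assuming dplyr >= 1.0.0. The example data is rebuilt inline so the snippet runs on its own. dplyr::lag is written out explicitly because stats::lag has the same name and behaves differently.

```r
library(dplyr)

df <- data.frame(
  COUNTRY = factor(rep(c("Argentina", "Brazil"), each = 4)),
  Year = rep(1994:1997, times = 2),
  CO2_emissions = c(1.23, 1.26, 1.28, 1.24, 1.54, 1.59, 1.60, 1.58),
  Pop_Growth = c(0.3, 0.2, 0.4, 0.2, 0.7, 0.6, 0.9, 1.3)
)

df_diff <- df %>%
  group_by(COUNTRY) %>%
  # Current value minus previous value; the first row of each country gets NA.
  mutate(across(CO2_emissions:Pop_Growth, ~ .x - dplyr::lag(.x))) %>%
  ungroup()

df_diff
```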
Output:
# A tibble: 8 x 4
# Groups:   COUNTRY [2]
  COUNTRY    Year CO2_emissions Pop_Growth
  <fct>     <int>         <dbl>      <dbl>
1 Argentina  1994         NA        NA
2 Argentina  1995          0.03     -0.100
3 Argentina  1996          0.02      0.2
4 Argentina  1997         -0.04     -0.2
5 Brazil     1994         NA        NA
6 Brazil     1995          0.05     -0.100
7 Brazil     1996          0.01      0.3
8 Brazil     1997         -0.02      0.4
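The output above relies on the rows already being sorted by Year within each country, because lag() simply takes the previous row. The following sketch covers input that might arrive unsorted (simulated here by shuffling the rows). It sorts with arrange() before differencing. The shuffle order is arbitrary and only for illustration.

```r
library(dplyr)

df <- data.frame(
  COUNTRY = factor(rep(c("Argentina", "Brazil"), each = 4)),
  Year = rep(1994:1997, times = 2),
  CO2_emissions = c(1.23, 1.26, 1.28, 1.24, 1.54, 1.59, 1.60, 1.58),
  Pop_Growth = c(0.3, 0.2, 0.4, 0.2, 0.7, 0.6, 0.9, 1.3)
)

# Simulate unsorted input.
shuffled <- df[c(8, 2, 5, 4, 1, 7, 3, 6), ]

df_diff <- shuffled %>%
  # Put each country's years in chronological order first,
  # otherwise lag() would subtract the wrong "previous" year.
  arrange(COUNTRY, Year) %>%
  group_by(COUNTRY) %>%
  mutate(across(CO2_emissions:Pop_Growth, ~ .x - dplyr::lag(.x))) %>%
  ungroup()

df_diff
```

An alternative is to leave the row order alone and pass order_by = Year to dplyr::lag().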
Data:
df = structure(list(COUNTRY = structure(c(1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L), .Label = c("Argentina", "Brazil"), class = "factor"),
Year = c(1994L, 1995L, 1996L, 1997L, 1994L, 1995L, 1996L,
1997L), CO2_emissions = c(1.23, 1.26, 1.28, 1.24, 1.54, 1.59,
1.6, 1.58), Pop_Growth = c(0.3, 0.2, 0.4, 0.2, 0.7, 0.6,
0.9, 1.3)), .Names = c("COUNTRY", "Year", "CO2_emissions",
"Pop_Growth"), class = "data.frame", row.names = c(NA, -8L))
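If the question's dados really has the non-syntactic column names shown in its printout ("CO2 emissions", "Pop. Growth(%)"), you can use them directly by wrapping them in backticks. This is an assumption. Such names survive only if the data frame was created with check.names = FALSE. With read.csv() or data.frame() defaults, make.names() would have converted them to syntactic names. A minimal sketch:

```r
library(dplyr)

# Assumed reconstruction of the question's data, keeping the printed names.
dados <- data.frame(
  COUNTRY = rep(c("Argentina", "Brazil"), each = 4),
  Year = rep(1994:1997, times = 2),
  `CO2 emissions` = c(1.23, 1.26, 1.28, 1.24, 1.54, 1.59, 1.60, 1.58),
  `Pop. Growth(%)` = c(0.3, 0.2, 0.4, 0.2, 0.7, 0.6, 0.9, 1.3),
  check.names = FALSE
)

dados_diff <- dados %>%
  group_by(COUNTRY) %>%
  # Backticks let across() refer to names containing spaces and symbols.
  mutate(across(c(`CO2 emissions`, `Pop. Growth(%)`), ~ .x - dplyr::lag(.x))) %>%
  ungroup()

dados_diff
```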