First-differencing a data frame

phi*_*ill 2 r

I have the following data frame:

>dados

COUNTRY   Year   CO2 emissions Pop. Growth(%)
Argentina  1994      1.23         0.3
Argentina  1995      1.26         0.2
Argentina  1996      1.28         0.4
Argentina  1997      1.24         0.2
Brazil     1994      1.54         0.7
Brazil     1995      1.59         0.6
Brazil     1996      1.60         0.9
Brazil     1997      1.58         1.3

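I want to take the first difference of the variables CO2 emissions and Pop. Growth(%) within each country. I have tried dados[,2:4] <- diff(dados[,2:4]), but it returns the error:

"Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)] : non-numeric argument to binary operator"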

avi*_*seR 5

Here is a solution with dplyr:

library(dplyr)

df %>%
  group_by(COUNTRY) %>%
  mutate_at(vars(CO2_emissions:Pop_Growth), funs(.-lag(.)))

Edit: as of dplyr 0.8.0, funs() has been soft-deprecated. For newer versions of dplyr, use the following:

df %>%
  group_by(COUNTRY) %>%
  mutate_at(vars(CO2_emissions:Pop_Growth), list(~ .x - lag(.x)))

Output:

# A tibble: 8 x 4
# Groups:   COUNTRY [2]
  COUNTRY    Year CO2_emissions Pop_Growth
  <fct>     <int>         <dbl>      <dbl>
1 Argentina  1994         NA        NA    
2 Argentina  1995          0.03     -0.100
3 Argentina  1996          0.02      0.2  
4 Argentina  1997         -0.04     -0.2  
5 Brazil     1994         NA        NA    
6 Brazil     1995          0.05     -0.100
7 Brazil     1996          0.01      0.3  
8 Brazil     1997         -0.02      0.4 
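In dplyr 1.0.0 and later, mutate_at() is itself superseded by across(). The following is a minimal sketch of the same grouped first difference in that style, not a replacement for the answer above. It rebuilds the example data inline so it runs on its own. The arrange() step and the result name diffed are my additions: the sort assumes the rows may not already be ordered by Year within each country.

```r
library(dplyr)

# Same toy data as in the question, rebuilt here so the block is self-contained.
df <- data.frame(
  COUNTRY = rep(c("Argentina", "Brazil"), each = 4),
  Year = rep(1994:1997, times = 2),
  CO2_emissions = c(1.23, 1.26, 1.28, 1.24, 1.54, 1.59, 1.60, 1.58),
  Pop_Growth = c(0.3, 0.2, 0.4, 0.2, 0.7, 0.6, 0.9, 1.3)
)

diffed <- df %>%
  group_by(COUNTRY) %>%
  # Sort by Year within each country so lag() refers to the previous year.
  arrange(Year, .by_group = TRUE) %>%
  # Subtract the previous row's value; the first row of each group becomes NA.
  mutate(across(CO2_emissions:Pop_Growth, ~ .x - lag(.x))) %>%
  ungroup()
```

The first row of each country is NA because there is no earlier year to subtract.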

Data:

df = structure(list(COUNTRY = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L), .Label = c("Argentina", "Brazil"), class = "factor"), 
    Year = c(1994L, 1995L, 1996L, 1997L, 1994L, 1995L, 1996L, 
    1997L), CO2_emissions = c(1.23, 1.26, 1.28, 1.24, 1.54, 1.59, 
    1.6, 1.58), Pop_Growth = c(0.3, 0.2, 0.4, 0.2, 0.7, 0.6, 
    0.9, 1.3)), .Names = c("COUNTRY", "Year", "CO2_emissions", 
"Pop_Growth"), class = "data.frame", row.names = c(NA, -8L))
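As for the original error: as far as I can tell, it arises because diff() was handed a data frame (a list of columns) rather than a numeric vector. Even on a single column, diff() returns one element fewer than its input and ignores country boundaries. Below is a base R sketch that avoids both issues without dplyr. It assumes the column names from the data above, and the helper name vars is only illustrative.

```r
# Same toy data, rebuilt so the block runs on its own.
df <- data.frame(
  COUNTRY = rep(c("Argentina", "Brazil"), each = 4),
  Year = rep(1994:1997, times = 2),
  CO2_emissions = c(1.23, 1.26, 1.28, 1.24, 1.54, 1.59, 1.60, 1.58),
  Pop_Growth = c(0.3, 0.2, 0.4, 0.2, 0.7, 0.6, 0.9, 1.3)
)

vars <- c("CO2_emissions", "Pop_Growth")  # columns to difference (illustrative name)

# Make sure rows are ordered by year within each country.
df <- df[order(df$COUNTRY, df$Year), ]

# ave() applies FUN within each COUNTRY group and returns a vector of the
# original length; padding with NA keeps diff() aligned with the rows.
df[vars] <- lapply(df[vars], function(x) {
  ave(x, df$COUNTRY, FUN = function(v) c(NA, diff(v)))
})
```

Padding with a leading NA is a design choice that keeps the result the same length as the input, so it can be assigned back into the data frame directly.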