Migrating from Excel to R: how do I manipulate data based on a specific cell value?

Snu*_*rre 5 math r

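I am slowly moving from Excel to R, but I keep running into problems with tasks that would take two seconds in Excel. For example, see the following sample of GDP data for France and the UK: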

df

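Suppose I want to calculate the percentage change relative to 1929 (i.e. the Great Depression). In Excel, I would do something like this in a new column for France: =(B2/$B$11)*100, and then fill the formula down to the adjacent cells. Then I would repeat the same for the UK.

How would you do this in R? (Note that this is just an example; I am interested in the thought process behind it.) Obviously, the data would be structured differently, with three variables: year, country, and GDP.

I was thinking of using mutate() and then case_when() to pick out the right country, but this is where I get stuck. See my code below. The data are from the Maddison Project.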

library(tidyverse)
library(ggplot2)
library(haven)
library(readxl)

# Loading df
df <- read_excel("/PATH TO DATA/mpd2018.xlsx", sheet = 2)

# Tidy dataset
df <- df %>%
  transmute(
    cntry = as_factor(countrycode), # Rename and define as factor
    year = zap_labels(year), # Zap labels
    gdp = zap_labels(rgdpnapc) # Rename and zap labels
  ) %>%
  dplyr::filter(
    cntry %in% c("FRA","GBR"), # Keep only FRA and GBR
    year >= 1920 & year <= 1950 # Only the interval between 1920 and 1950
  )

# Calculations 
 df <- df %>% mutate(
              gdp_rel = case_when(
                cntry == "FRA" ~ (df$gdp/df[10,3])*100,
                cntry == "GBR" ~ (df$gdp/df[41,3])*100
              ))
                

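First of all, the code produces an error. More importantly, I believe there must be a smarter way to do this than indexing with df[x, y]. What if the data frame were larger?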

Oli*_*ver 3

There are several ways to achieve the result you want. Here are 2 different options.

library(tidyverse)
# Seed for reproducibility
set.seed(1234) 
# Example data
data <- data.frame(Year = 1920:1939, 
                   France = 1920:1939 * 3 + rnorm(1939 - 1920 + 1, 5, 10), 
                   Germany = 1920:1939 * 3.5 + rnorm(1939 - 1920 + 1, 2, 18))
row_id <- which(data$Year == 1929)
# dplyr. Note that "across" performs the calculation across all columns 
# selected in the first argument
data %>% 
  mutate(across(-Year, # All columns except for year 
                #Row 10 (row_id) has year = 1929
                ~ . / .[[row_id]] * 100, 
                # Add column name to new transformed result.
                .names = '{.col}_return')) 


# Manual way
res <- list()
for(i in names(data)[-1]){
  # Manual mutate
  res[[paste0(i, '_return')]] <- data[[i]] / data[row_id, i] * 100
}
# Combine result
cbind(data, res)

Both produce the following result (on the simulated data):

   Year   France  Germany France_return Germany_return
1  1920 5752.929 6724.414      99.47830       99.81832
2  1921 5770.774 6716.668      99.78687       99.70334
3  1922 5781.844 6721.070      99.97830       99.76869
4  1923 5750.543 6740.773      99.43704      100.06115
5  1924 5781.291 6723.513      99.96873       99.80495
6  1925 5785.061 6713.432     100.03391       99.65531
7  1926 5777.253 6753.346      99.89889      100.24779
8  1927 5780.534 6728.074      99.95563       99.87266
9  1928 5783.355 6749.728     100.00442      100.19408
10 1929 5783.100 6736.653     100.00000      100.00000
11 1930 5790.228 6776.841     100.12326      100.59656
12 1931 5788.016 6751.939     100.08502      100.22691
13 1932 5793.237 6751.230     100.17530      100.21639
14 1933 5804.645 6758.477     100.37255      100.32397
15 1934 5816.595 6741.676     100.57919      100.07457
16 1935 5808.897 6753.483     100.44608      100.24983
17 1936 5807.890 6738.759     100.42867      100.03127
18 1937 5806.888 6757.362     100.41134      100.30741
19 1938 5810.628 6779.703     100.47602      100.63904
20 1939 5846.158 6780.114     101.09040      100.64514
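The "manual way" above can also be wrapped in a small function so that the base row is looked up by year instead of being hard-coded. The following is only a minimal base-R sketch: `index_to_base()` is a hypothetical helper name, and the toy numbers are made up for illustration, not taken from the Maddison data.

```r
# Toy wide-format data; the numbers are made up for illustration only.
gdp_wide <- data.frame(
  Year    = 1927:1931,
  France  = c(90, 95, 100, 98, 92),
  Germany = c(180, 190, 200, 210, 190)
)

# Hypothetical helper: express every value column relative to its value
# in the base year, scaled so that the base year equals 100.
index_to_base <- function(data, base_year, year_col = "Year") {
  base_row <- which(data[[year_col]] == base_year)
  stopifnot(length(base_row) == 1)  # the base year must appear exactly once
  value_cols <- setdiff(names(data), year_col)
  for (col in value_cols) {
    data[[paste0(col, "_return")]] <- data[[col]] / data[[col]][base_row] * 100
  }
  data
}

gdp_indexed <- index_to_base(gdp_wide, base_year = 1929)
print(gdp_indexed)
```

Because the base row is found with `which()`, the same call should keep working if rows are added or reordered, which is the situation the Excel `$B$11` reference does not handle well.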

Long vs. wide

Following SnupSnurre's comment, here is an example of how to do this assuming the data are stored in "long" (vertical) format.


# Use pivot_longer to make wide data long
data_long <- pivot_longer(data, 
                          -Year, 
                          names_to = 'Country')

# Calculate on long format:
(return_1929 <- data_long %>% 
    # Group by country, calculations will be done for each country
  group_by(Country) %>% 
    # Perform the actual calculations
  mutate(value_return = value / value[Year == 1929] * 100) %>%
    # Remove the country grouping
  ungroup()
)
# Return to wide format
return_1929 %>% 
  pivot_wider(id_cols = Year, 
              # Column to "expand" to a wide format.
              names_from = Country,
              # Columns to get values from
              values_from = c(value, value_return))
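Applied to the column names from the question (`cntry`, `year`, `gdp`), the same grouped approach could replace the `case_when()` attempt entirely. This is a sketch under assumptions: the data frame below is a made-up stand-in for the filtered Maddison data, and `gdp_indexed_long` is just an illustrative name.

```r
library(dplyr)

# Toy long-format data using the question's column names; values are made up.
gdp_long <- data.frame(
  cntry = rep(c("FRA", "GBR"), each = 3),
  year  = rep(1928:1930, times = 2),
  gdp   = c(95, 100, 98, 190, 200, 210)
)

gdp_indexed_long <- gdp_long %>%
  # Each country is handled separately, so no per-country row index is needed
  group_by(cntry) %>%
  # Within a group, gdp[year == 1929] picks that country's 1929 value
  mutate(gdp_rel = gdp / gdp[year == 1929] * 100) %>%
  ungroup()

print(gdp_indexed_long)
```

This relies on each country having exactly one row for 1929; with more countries in the data, the code would not need to change.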