Plot line and bar graph (with secondary axis for line graph) using ggplot

Bur*_*utt 2 r ggplot2

Problem

I started learning R two days ago. I have gone through some basic R tutorials and can plot two-dimensional data, which I pull from an Oracle database. I am now having trouble combining two graph types (line and bar) in a single plot with a secondary axis.

I have no problem plotting this data in Excel. The plot looks like this:

[Image: Excel plot]

I am unable to reproduce it in R. I searched for related examples, but I could not adapt them to my requirements (for example, "Combining Bar and Line chart (double axis) in ggplot2").

Code

Following is the code I am using to plot bar and line graphs separately:

Bar:

library(ggplot2)

p <- ggplot(data = df, aes(x = MONTHS, y = BASE)) +
    geom_bar(stat="identity") + 
    theme_minimal() +
    geom_text(aes(label = BASE), vjust = 1.6, color = "White", size = 2.5)

Line:

p1 <- ggplot(data = df, aes(x = MONTHS, y = INTERNETPERCENTAGE, group = 1)) +
    geom_line() + 
    geom_point()

Data

Update: I have the following data (raw data cleansed of "," and "%" signs):

> dput(head(df,20))
structure(list(MONTHS = structure(c(11L, 10L, 3L, 5L, 4L, 8L, 
1L, 9L, 7L, 6L, 2L, 13L, 12L), .Label = c("Apr-18", "Aug-18", 
"Dec-17", "Feb-18", "Jan-18", "Jul-18", "Jun-18", "Mar-18", "May-18", 
"Nov-17", "Oct-17", "Oct-18", "Sep-18"), class = "factor"), BASE = c(40756228L, 
41088219L, 41642601L, 42017111L, 42439446L, 42847468L, 43375319L, 
43440484L, 43464735L, 43326823L, 43190949L, 43015301L, 42780071L
), INTERNETUSERGREATERTHAN0KB = c(13380576L, 13224502L, 14044105L, 
14239169L, 14011423L, 14736043L, 14487827L, 14460410L, 14632695L, 
14896654L, 15019329L, 14141766L, 14209288L), INTERNETPERCENTAGE = c(33L, 
32L, 34L, 34L, 33L, 34L, 33L, 33L, 34L, 34L, 35L, 33L, 33L), 
    SMARTPHONE = c(11610216L, 11875033L, 12225965L, 12412010L, 
    12760251L, 12781082L, 13142400L, 13295826L, 13422476L, 13408216L, 
    13504339L, 13413596L, 13586438L), SMARTPHONEPERCENTAGE = c(28L, 
    29L, 29L, 30L, 30L, 30L, 30L, 31L, 31L, 31L, 31L, 31L, 32L
    ), INTERNETUSAGEGREATERTHAN0KB4G = c(829095L, 969531L, 1181411L, 
    1339620L, 1474300L, 1733027L, 1871816L, 1967129L, 2117418L, 
    2288215L, 2453243L, 2624865L, 2817199L)), row.names = c(NA, 
13L), class = "data.frame")

Mau*_*ers 5

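Please note that my answer is based on your original "uncleaned" data (which I have attached at the bottom of this post).

The key here is to transform the percentage values so that they span the same range as BASE. We then apply the inverse transformation to show the original percentage values on a second y axis.

A (personal) word of warning: secondary axes are generally not a good idea. Personally, I would use facets or two separate plots to avoid cluttering and overloading the chart. Also note that Hadley himself is not a fan of dual y axes, so ggplot2's support for dual axes is (wisely) limited.

That aside, here is an option:

  1. First, let's clean the data (remove thousands separators, percentage signs, etc.).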
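As a minimal sketch of that transformation in isolation (base R only): the two ranges below are the minimum and maximum of BASE and of the percentage columns in the question's data, and the helper names `to_base_scale` and `to_perc_scale` are hypothetical, introduced only for illustration.

```r
# Range of BASE and of the two percentage columns in the question's data
base <- c(40756228, 43464735)
perc <- c(28, 35)

# Slope and intercept of the straight line through
# (min perc, min BASE) and (max perc, max BASE)
b <- (base[2] - base[1]) / (perc[2] - perc[1])
a <- base[1] - b * perc[1]

# Hypothetical helpers: forward transform for drawing the line on the
# BASE axis, inverse transform for labelling the secondary axis
to_base_scale <- function(p) a + b * p
to_perc_scale <- function(y) (y - a) / b

scaled    <- to_base_scale(c(28, 32, 35))  # 28 lands on min(BASE), 35 on max(BASE)
recovered <- to_perc_scale(scaled)         # round trip back to the percentages
```

The inverse function is the same expression that would be passed to `sec_axis()`, so the secondary axis labels show percentages even though the line is drawn in BASE units.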

    library(tidyverse)
    df.clean <- df %>%
        mutate_if(is.factor, as.character) %>%
        gather(USAGE, PERCENTAGE, INTERNETPERCENTAGE, SMARTPHONEPERCENTAGE) %>%
        mutate(
            MONTHS = factor(MONTHS, levels = df$MONTHS),
            BASE = as.numeric(str_replace_all(BASE, ",", "")),
            PERCENTAGE = as.numeric(str_replace(PERCENTAGE, "%", "")))
    
  2. We now calculate the transformation coefficients:

    y1 <- min(df.clean$BASE)
    y2 <- max(df.clean$BASE)
    x1 <- min(df.clean$PERCENTAGE)
    x2 <- max(df.clean$PERCENTAGE)
    b <- (y2 - y1) / (x2 - x1)
    a <- y1 - b * x1
    
  3. Now for the plotting:

    df.clean %>%
        mutate(perc.scaled = a + b * PERCENTAGE) %>%
        ggplot(aes(MONTHS, BASE)) +
        geom_col(
            data = df.clean %>% distinct(MONTHS, .keep_all = TRUE),
            aes(MONTHS, BASE), fill = "dodgerblue4") +
        geom_point(aes(MONTHS, perc.scaled, colour = USAGE, group = USAGE)) +
        geom_line(aes(MONTHS, perc.scaled, colour = USAGE, group = USAGE)) +
        geom_label(
            aes(MONTHS, perc.scaled, label = PERCENTAGE, fill = USAGE),
            vjust = 1.4,
            show.legend = FALSE) +
        scale_y_continuous(
                name =  "BASE",
                sec.axis = sec_axis(~ (. - a) / b, name = "Percentage")) +
        coord_cartesian(ylim = c(0.99 * min(df.clean$BASE), max(df.clean$BASE))) +
        theme_minimal() +
        theme(legend.position = "bottom")
    

[Image: resulting plot]
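For comparison, the faceted alternative mentioned in the warning above might look roughly like the sketch below. It is not part of the original answer: it uses only the first four months of the question's cleaned data, the names `df_small`, `df_long`, `MEASURE` and `VALUE` are made up for this example, and it assumes tidyr 1.0.0 or later for `pivot_longer()`.

```r
library(ggplot2)
library(tidyr)

# First four months of the question's cleaned data (assumed subset)
df_small <- data.frame(
  MONTHS = factor(
    c("Oct-17", "Nov-17", "Dec-17", "Jan-18"),
    levels = c("Oct-17", "Nov-17", "Dec-17", "Jan-18")),
  BASE = c(40756228, 41088219, 41642601, 42017111),
  INTERNETPERCENTAGE = c(33, 32, 34, 34),
  SMARTPHONEPERCENTAGE = c(28, 29, 29, 30)
)

# Long format: one row per month and measure
df_long <- pivot_longer(
  df_small,
  cols = -MONTHS,
  names_to = "MEASURE",
  values_to = "VALUE")

# One panel per measure, each with its own y axis, so no rescaling is needed
p_facets <- ggplot(df_long, aes(MONTHS, VALUE, group = 1)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ MEASURE, ncol = 1, scales = "free_y") +
  theme_minimal()
```

Printing `p_facets` draws the three panels stacked vertically. Because every panel keeps its own scale, the reader does not have to work out which axis belongs to which series.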


Sample data

df <- structure(list(MONTHS = structure(c(11L, 10L, 3L, 5L, 4L, 8L,
1L, 9L, 7L, 6L, 2L, 13L, 12L), .Label = c("APR-18", "AUG-18",
"DEC-17", "FEB-18", "JAN-18", "JUL-18", "JUN-18", "MAR-18", "MAY-18",
"NOV-17", "OCT-17", "OCT-18", "SEP-18"), class = "factor"), BASE = structure(c(1L,
2L, 3L, 4L, 5L, 7L, 11L, 12L, 13L, 10L, 9L, 8L, 6L), .Label = c("40,756,228",
"41,088,219", "41,642,601", "42,017,111", "42,439,446", "42,780,071",
"42,847,468", "43,015,301", "43,190,949", "43,326,823", "43,375,319",
"43,440,484", "43,464,735"), class = "factor"), INTERNETUSERGREATERTHAN0KB = structure(c(2L,
1L, 4L, 7L, 3L, 11L, 9L, 8L, 10L, 12L, 13L, 5L, 6L), .Label = c("13,224,502",
"13,380,576", "14,011,423", "14,044,105", "14,141,766", "14,209,288",
"14,239,169", "14,460,410", "14,487,827", "14,632,695", "14,736,043",
"14,896,654", "15,019,329"), class = "factor"), INTERNETPERCENTAGE = structure(c(2L,
1L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 3L, 4L, 2L, 2L), .Label = c("32%",
"33%", "34%", "35%"), class = "factor"), SMARTPHONE = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 11L, 9L, 12L, 10L, 13L), .Label = c("11,610,216",
"11,875,033", "12,225,965", "12,412,010", "12,760,251", "12,781,082",
"13,142,400", "13,295,826", "13,408,216", "13,413,596", "13,422,476",
"13,504,339", "13,586,438"), class = "factor"), SMARTPHONEPERCENTAGE = structure(c(1L,
2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L), .Label = c("28%",
"29%", "30%", "31%", "32%"), class = "factor"), INTERNETUSAGEGREATERTHAN0KB4G = structure(c(12L,
13L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("1,181,411 ",
"1,339,620 ", "1,474,300 ", "1,733,027 ", "1,871,816 ", "1,967,129 ",
"2,117,418 ", "2,288,215 ", "2,453,243 ", "2,624,865 ", "2,817,199 ",
"829,095 ", "969,531 "), class = "factor")), row.names = c(NA,
13L), class = "data.frame")