I need to write a function that would allow me to quickly do a dual axis plot using ggplot2. I know that dual axis plots are generally deprecated, but still I think it may be useful if you're after observing similar patterns in time series (for all of those who disagree, please treat this question strictly technically). It is actually possible with sec_axis() function from ggplot2, but it needs a defined formula. So here's my attempt to calculate this automatically:
dual_plot <- function(data, x, y_left, y_right){
x <- ensym(x)
y_left <- ensym(y_left)
y_right <- ensym(y_right)
ratio_model <- lm(eval(y_left) ~ eval(y_right), data = data)
data %>%
select(!!x, !!y_left, !!y_right) %>%
mutate(!!y_right := predict(ratio_model)) %>%
gather(k, v, -!!x) %>%
ggplot() +
geom_line(aes(!!x, v, colour = k)) +
scale_y_continuous(sec.axis = sec_axis(~ . / ratio_model$coefficients[[2]] -
ratio_model$coefficients[[1]],
name = rlang::as_string(y_right))) +
labs(y = rlang::as_string(y_left))
}
Run Code Online (Sandbox Code Playgroud)
However, lm may fit a negative direction coefficient which reverse the trend and is really misleading. So I need another approach to calculating this formula - either using linear regression with coefficient constrain or a clever way of fitting a formula. How can it be done in R? Or what are the alternatives to sec_axis that would allow to draw dual axis plot automatically?
@Edit: One example would be:
df <- structure(list(date = structure(c(17167, 17168, 17169, 17170,
17171, 17172, 17173, 17174, 17175, 17176, 17177, 17178, 17179,
17180, 17181), class = "Date"), y_right = c(-107073.90734625,
-633197.630546488, -474626.43291613, -306006.801458608, 56062.072352192,
522580.236751187, 942796.389093215, -101845.73678439, -632658.677118481,
-479257.088784885, -303439.231633988, 50273.2477880417, 521669.062954895,
948127.92455586, -107073.90734625), y_left = c(1648808.16, 3152543.07,
2702739.91, 2382616.25, 1606089.88, 1592465.75, 1537283.99, 2507221.61,
3049076.19, 3125424.4, 2774215.1, 2356412.98, 1856506.41, 1477195.08,
2485713.2)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-15L))
df %>%
dual_plot(date, y_left, y_right)
Run Code Online (Sandbox Code Playgroud)
The calculated ratio model has direction coefficient of -1.02, so the y_right is reversed (where the function is decreasing, the plotted function is increasing and the other way around) and hence misleading.
这是在两个斜率之间设置最小可接受比率的方法。如果比率较小,则斜率不会变换,而只会变换水平,从而避免像您描述的那样过度误导图表。
我将阈值设置为0.1,但是如果您只是想避免此处的特定情况(您不希望翻转第二个序列使其对齐),则可以将其设置为0。
dual_plot <- function(data, x, y_left, y_right){
x <- ensym(x)
y_left <- ensym(y_left)
y_right <- ensym(y_right)
min_slope_ratio <- 0.1
ratio_model <- lm(eval(y_left) ~ eval(y_right), data = data)
ratio_slope <- ratio_model$coefficients[[2]]
if (ratio_model$coefficients[[2]] < min_slope_ratio) {
ratio_model <- lm(eval(y_left) ~ 1, data = data)
ratio_slope <- min_slope_ratio
}
ratio_intercept <- ratio_model$coefficients[[1]]
data %>%
select(!!x, !!y_left, !!y_right) %>%
mutate(!!y_right := !!y_right * ratio_slope + ratio_intercept) %>%
# mutate(!!y_right := predict(ratio_model)) %>%
gather(k, v, -!!x) %>%
ggplot() +
geom_line(aes(!!x, v, colour = k)) +
scale_y_continuous(sec.axis = sec_axis(~ . / ratio_slope -
ratio_intercept,
name = rlang::as_string(y_right))) +
labs(y = rlang::as_string(y_left))
}
Run Code Online (Sandbox Code Playgroud)
在这里,限制被触发,我们避免翻转第二个系列
df %>%
dual_plot(date, y_left, y_right)
Run Code Online (Sandbox Code Playgroud)
在此,不触发限制。
df %>%
mutate(y_right = -1 * y_right) %>%
dual_plot(date, y_left, y_right)
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
258 次 |
| 最近记录: |