Nel*_*Gon 4 r ggplot2 data.table
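I am currently learning the very powerful and efficient data.table package. However, I cannot figure out how to do the following. I want to group by multiple columns (manufacturer and carrier), count the flights in each group, sort the counts in descending order, and then plot the top ten manufacturers and carriers with ggplot. In the tidyverse I would do the following: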
library(nycflights13)
library(tidyverse)
flights %>%
left_join(planes, by = "tailnum") %>%
group_by(manufacturer, carrier) %>%
summarise(N = n()) %>%
arrange(desc(N)) %>%
top_n(10, N) %>%
ggplot(aes(carrier, N, fill = manufacturer)) + geom_col() + guides(fill = FALSE)
Here is what I have tried (I stepped away from the question for a few minutes to try to solve it, without success):
library(data.table)
fly<-copy(nycflights13::flights)
setDT(fly)
setkey(fly,tailnum)
planes1 <- copy(planes)
setDT(planes1)
setkey(planes1, tailnum)
#head(planes1,2)
Merged <- merge(fly, planes1, by = "tailnum")
#Group by manufacturer
Merged[, .N, by = .(manufacturer,carrier)] #[, order(manufacturer, carrier)]
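The problem is that I cannot get the data back in sorted order, and I do not know how to "chain" into ggplot without first saving the sorted, merged result as an object.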
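You can chain operations together in data.table with the square brackets [ and ]. In addition, you can make the ggplot call inside the j part of the data.table syntax: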
nms <- setdiff(names(planes1), "tailnum")
fly[planes1, on = .(tailnum), (nms) := mget(nms)
][, .N, by = .(manufacturer,carrier)
][order(-N)
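][, head(.SD, 10), by = manufacturer  # up to 10 carriers per manufacturer, like top_n() after summarise(); .SD[1:10] on one-row groups pads with NA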
][, ggplot(.SD, aes(carrier, N, fill = manufacturer)) +
geom_col() +
guides(fill = FALSE)]
This gives a bar chart of flight counts per carrier, with the bars filled by manufacturer.
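To see the chaining on its own, here is a minimal sketch on a small made-up table. The `toy` data and the `top2` name are assumptions for illustration only and are not part of nycflights13. Each `[...]` works on the result of the one before it, so no intermediate object has to be saved.

```r
library(data.table)

# Hypothetical toy data standing in for the merged flights/planes table
toy <- data.table(
  manufacturer = c("BOEING", "BOEING", "BOEING", "AIRBUS", "AIRBUS", "BOEING", "EMBRAER"),
  carrier      = c("UA",     "UA",     "UA",     "B6",     "B6",     "DL",     "EV")
)

# Step 1: count rows per (manufacturer, carrier) group
# Step 2: sort the counts in descending order
# Step 3: keep the first two rows of the sorted result
top2 <- toy[, .N, by = .(manufacturer, carrier)
          ][order(-N)
          ][, head(.SD, 2)]

print(top2)
```

A final `[, ggplot(.SD, aes(...)) + geom_col()]` link could be appended in the same way, because j accepts any R expression and `.SD` holds the data produced by the earlier links.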