Dav*_*vid 5 r dplyr data.table dtplyr
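I am trying to translate some dplyr code to data.table. I tried to get help from dtplyr, the package that translates dplyr into data.table, but the result still does not run. The error comes from dplyr's row_number().
I need every step of the dplyr code (even though they are not all useful here with mtcars).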
library(dplyr)
library(dtplyr) # from https://github.com/tidyverse/dtplyr
library(data.table)
mtcars %>%
  distinct(mpg, .keep_all = TRUE) %>%
  group_by(am) %>%
  arrange(mpg, .by_group = TRUE) %>%
  mutate(row_num = LETTERS[row_number()]) %>%
  ungroup()
# using dtplyr
dt <- lazy_dt(mtcars)
dt %>%
  distinct(mpg, .keep_all = TRUE) %>%
  group_by(am) %>%
  arrange(mpg, .by_group = TRUE) %>%
  mutate(row_num = LETTERS[row_number()]) %>%
  ungroup() %>%
  show_query()
#> unique(`_DT1`, by = "mpg")[order(am, mpg)][, `:=`(row_num = c("A",
#> "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N",
#> "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z")[row_number()]),
#> keyby = .(am)]
# I then use the query from dtplyr
DT <- as.data.table(mtcars)
unique(DT, by = "mpg")[order(am, mpg)][, `:=`(row_num = c("A",
"B", "C", "D", "E", "F", "G",
"H", "I", "J", "K", "L", "M",
"N", "O", "P", "Q", "R", "S",
"T", "U", "V", "W", "X", "Y",
"Z")[row_number()]), keyby = .(am)]
#> row_number() should only be called in a data context
Created on 2019-07-14 by the reprex package (v0.3.0)
May I recommend the rowid function? It does the grouping step "under the hood", and you may find it looks cleaner:
unique(DT, by='mpg')[order(am, mpg), row_num := LETTERS[rowid(am)]]
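
To make the rowid() behaviour concrete, here is a minimal sketch on a made-up table (the `toy` data and its column names are assumptions for illustration, not part of the question). It shows that rowid() counts occurrences of each value, so applying it to already sorted rows gives a per-group counter without an explicit `by`:

```r
library(data.table)

# Hypothetical toy data, not taken from the question.
toy <- data.table(g = c("x", "y", "x", "y", "x"), v = c(5, 3, 1, 4, 2))

# rowid() numbers each value by how many times it has appeared so far.
ids <- rowid(toy$g)

# Sort first, then label the sorted rows: the counter restarts for each g.
sorted <- toy[order(g, v)]
sorted[, letter := LETTERS[rowid(g)]]
print(sorted)
```

In the one-liner above, `order(am, mpg)` sits in `i`, so the labels are assigned in sorted order while the rows of the result keep their original positions.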
If you like chaining, you can also keep everything inside []:
DT[ , .SD[1L], by = mpg
][order(am, mpg), row_num := LETTERS[rowid(am)]]
I am working on some tweaks to the translation so that dtplyr will automatically generate something closer to what you want:
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)
dt <- lazy_dt(mtcars)
dt %>%
  distinct(mpg, .keep_all = TRUE) %>%
  group_by(am) %>%
  arrange(mpg, .by_group = TRUE) %>%
  mutate(row_num = LETTERS[row_number()]) %>%
  ungroup() %>%
  show_query()
#> unique(`_DT1`, by = "mpg")[order(am, mpg)][, `:=`(row_num = ..LETTERS[seq_len(.N)]),
#> keyby = .(am)]
Or avoid grouping, as @MichaelChirico suggests:
dt %>%
  distinct(mpg, .keep_all = TRUE) %>%
  arrange(am, mpg) %>%
  mutate(row_num = LETTERS[row_number(am)]) %>%
  ungroup() %>%
  show_query()
#> unique(`_DT1`, by = "mpg")[order(am, mpg)][, `:=`(row_num = ..LETTERS[frank(am,
#> ties.method = "first", na.last = "keep")])]
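
One caveat worth checking before relying on this ungrouped form: as far as I can tell, `frank(am, ties.method = "first")` ranks across the whole table, so its counter does not restart for each value of `am` the way `rowid(am)` does. A minimal sketch on a made-up, already sorted vector (the `am_sorted` values are an assumption for illustration):

```r
library(data.table)

# Hypothetical sorted grouping column, standing in for am after order(am, mpg).
am_sorted <- c(0, 0, 1, 1, 1)

# Rank over the whole vector, ties broken by position: keeps counting up.
whole_rank <- frank(am_sorted, ties.method = "first", na.last = "keep")

# Occurrence counter per distinct value: restarts at 1 for each group.
per_group <- rowid(am_sorted)

print(data.table(am_sorted, whole_rank, per_group))
```

If the letters need to restart within each group, the grouped translation or `rowid(am)` appears to be the safer choice.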
(The `..` in front of LETTERS is a data.table feature that makes it clear you are referring to a variable outside the data frame; it is probably not needed here, but I think it is better to be safe than sorry.)
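
As a small illustration of the `..` prefix, the sketch below uses its best-documented form, column selection in `j`. The `toy` table and the `cols` variable are made up for the example; the table deliberately has a column with the same name as the outside variable to show the ambiguity the prefix resolves:

```r
library(data.table)

# Hypothetical table with a column that shares its name with a variable.
toy <- data.table(a = 1:3, b = 4:6, cols = c("p", "q", "r"))
cols <- c("a", "b")  # variable in the calling scope

inside  <- toy[, cols]    # the column called `cols`
outside <- toy[, ..cols]  # columns a and b, taken from the outside variable

print(inside)
print(outside)
```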
We can use seq_len(.N):
unique(DT, by = "mpg")[order(am, mpg)][,
`:=`(row_num = LETTERS[seq_len(.N)]), by = .(am)][]
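
For readers unfamiliar with `.N`, here is a minimal sketch of the same pattern on a made-up table (the `toy` data is an assumption for illustration). Inside a grouped `j`, `.N` is the number of rows in the current group, so `seq_len(.N)` plays the role that `row_number()` plays in a grouped dplyr `mutate()`:

```r
library(data.table)

# Hypothetical toy data, not taken from the question.
toy <- data.table(g = c("x", "y", "x", "y", "x"), v = c(5, 3, 1, 4, 2))

# Sort, then number rows within each group and map the numbers to letters.
res <- toy[order(g, v)][, letter := LETTERS[seq_len(.N)], by = g]
print(res)
```

The trailing `[]` in the answer's version only forces the result to print, since `:=` returns it invisibly.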