How do I apply a function by group with data.table?

Rem*_*i.b 1 r data.table

How can I apply a function to one or more variables, grouped by the unique values of another variable? Something like:

dt[,DoStuff(x) ,y]

Consider the mpg dataset from ggplot2:

require(data.table)
require(ggplot2)
as.data.table(mpg)
     manufacturer  model displ year cyl      trans drv cty hwy fl   class
  1:         audi     a4   1.8 1999   4   auto(l5)   f  18  29  p compact
  2:         audi     a4   1.8 1999   4 manual(m5)   f  21  29  p compact
  3:         audi     a4   2.0 2008   4 manual(m6)   f  20  31  p compact
  4:         audi     a4   2.0 2008   4   auto(av)   f  21  30  p compact
  5:         audi     a4   2.8 1999   6   auto(l5)   f  16  26  p compact
 ---                                                                     
230:   volkswagen passat   2.0 2008   4   auto(s6)   f  19  28  p midsize
231:   volkswagen passat   2.0 2008   4 manual(m6)   f  21  29  p midsize
232:   volkswagen passat   2.8 1999   6   auto(l5)   f  16  26  p midsize
233:   volkswagen passat   2.8 1999   6 manual(m5)   f  18  26  p midsize
234:   volkswagen passat   3.6 2008   6   auto(s6)   f  17  26  p midsize

For each unique value of fl, I want to paste together the unique manufacturer names, separated by underscores. I tried:

as.data.table(mpg)[,list(x = function(manufacturer) {paste(unique(manufacturer), collapse="_")} ),fl]

Error in `[.data.table`(as.data.table(mpg), , list(x = function(manufacturer) { : 
All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards.

An alternative solution is:

sapply(unique(mpg$fl), FUN=function(x){paste(unique(mpg$manufacturer[mpg$fl==x]),collapse="_")})

Her*_*oka 5

You can try this:

as.data.table(mpg)[,paste(unique(manufacturer),collapse="_"),by=fl]
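The attempt in the question fails because j was given a function definition rather than the result of calling it; data.table evaluates j as an expression once per group. A minimal sketch of this, using a small hand-made table (hypothetical values standing in for mpg, so it runs without ggplot2) and an assumed column name `manufacturers`:

```r
library(data.table)

# Small stand-in for mpg (hypothetical values, not the real dataset)
dt <- data.table(
  manufacturer = c("audi", "audi", "ford", "ford", "honda"),
  fl           = c("p", "p", "r", "p", "r")
)

# j is evaluated once per group of fl; naming the element inside .()
# names the result column instead of the default V1
res <- dt[, .(manufacturers = paste(unique(manufacturer), collapse = "_")), by = fl]

# An anonymous function also works, as long as it is called in j
# rather than merely defined there
res_anon <- dt[, .(manufacturers = (function(m) paste(unique(m), collapse = "_"))(manufacturer)),
               by = fl]

print(res)
```

Both calls should give one row per value of fl, in order of first appearance.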

Or, if your function is more elaborate, you can write it separately:

myfun <- function(x){
  u_x <- unique(x)
  return(paste(u_x,collapse="_"))
}


res <- as.data.table(mpg)[,myfun(manufacturer),by=fl]
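Since the question asks about "one or more variables", here is a hedged sketch of the same helper applied to several columns at once with `lapply(.SD, ...)` and `.SDcols`. The table is a hand-made stand-in with hypothetical values, and `res_multi` is just an illustrative name:

```r
library(data.table)

# Same helper as above: unique values joined by underscores
myfun <- function(x) paste(unique(x), collapse = "_")

# Small stand-in for mpg (hypothetical values, not the real dataset)
dt <- data.table(
  manufacturer = c("audi", "audi", "ford", "ford", "honda"),
  class        = c("compact", "compact", "suv", "pickup", "compact"),
  fl           = c("p", "p", "r", "p", "r")
)

# .SD holds the columns listed in .SDcols for the current group;
# lapply applies myfun to each of them and returns one column per input
res_multi <- dt[, lapply(.SD, myfun), by = fl, .SDcols = c("manufacturer", "class")]

print(res_multi)
```

The result keeps the original column names, with one collapsed string per group and column.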

  • I don't know whether we 'have' to do it that way ('always' and 'never' are tricky words), but I think this works and keeps your code readable. (2 upvotes)