LINQ-style data manipulation in R

Vad*_*rov 9 linq r

Is there an R package that supports chained data-manipulation calls, as in C#/LINQ or F#? I would like to be able to write in this style:

var list = new[] {1,5,10,12,1};
var newList = list
  .Where(x => x > 5)
  .GroupBy(x => x%2)
  .OrderBy(x => x.Key.ToString())
  .Select(x => "Group: " + x.Key)
  .ToArray();

Owe*_*wen 11

I don't know of one, but here is a start on what it could look like:

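`%then%` = function(x, body) {
    # Capture both operands unevaluated.
    x = substitute(x)
    fl = as.list(substitute(body))
    # Split the right-hand call into its function (car) and its arguments (cdr).
    car = fl[[1L]]
    cdr = {
        if (length(fl) == 1)
            list()
        else
            fl[-1L]
    }
    # Rebuild the call with x inserted as the first argument.
    combined = as.call(
        c(list(car, x), cdr)
    )
    # Evaluate it in the caller's environment.
    eval(combined, parent.frame())
}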

df = data.frame(x = 1:7)
df %then% subset(x > 2) %then% print

This prints:

  x
3 3
4 4
5 5
6 6
7 7

If you keep going with hacks like this, it should be easy to arrive at a syntax you find satisfying ;-)
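As a rough sketch of where that could lead, here is the LINQ query from the question rewritten with `%then%`. The helpers `where`, `group_by_key` and `label_groups` are hypothetical names made up for this illustration (they are not from any package), and the operator is a condensed copy of the one above so that the block runs on its own:

```r
# Condensed copy of the %then% operator defined above.
`%then%` = function(x, body) {
    x = substitute(x)
    fl = as.list(substitute(body))
    combined = as.call(c(list(fl[[1L]], x), fl[-1L]))
    eval(combined, parent.frame())
}

# Hypothetical helpers, each taking the data as its first argument.
where        = function(v, pred) v[pred(v)]           # like .Where
group_by_key = function(v, key) split(v, key(v))      # like .GroupBy
label_groups = function(keys) paste0("Group: ", keys) # like .Select

values = c(1, 5, 10, 12, 1)

result = values %then%
    where(function(v) v > 5) %then%
    group_by_key(function(v) v %% 2) %then%
    names %then%   # the group keys, as strings
    sort %then%    # like .OrderBy(x => x.Key.ToString())
    label_groups

print(result)
```

Only 10 and 12 pass the filter and both are even, so there is a single group with key 0. Note that `%then%` always inserts the left-hand side as the first argument, which is why the `paste0` call is wrapped in a helper.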

Edit: combined with plyr, this is not bad at all:

library(plyr)  # provides ddply(), .() and summarize()

(data.frame(
    x = c(1, 1, 1, 2, 2, 2),
    y = runif(6)
)
    %then% subset(y > 0.2)
    %then% ddply(.(x), summarize,
            ysum   = sum(y),
            ycount = length(y)
        )
    %then% print
)


kwc*_*cto 5

dplyr's chaining syntax is similar to LINQ (the stock example from its documentation):

library(dplyr)
library(nycflights13)  # provides the flights data set

flights %>%
  group_by(year, month, day) %>%
  select(arr_delay, dep_delay) %>%
  summarise(
    arr = mean(arr_delay, na.rm = TRUE),
    dep = mean(dep_delay, na.rm = TRUE)
  ) %>%
  filter(arr > 30 | dep > 30)

Introduction to dplyr: Chaining
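For comparison, the query from the question could be written with dplyr roughly as follows. This is a sketch that assumes dplyr is installed; the column names `x`, `key`, `n` and `label` are made up for the illustration, and the mapping of LINQ operators to dplyr verbs in the comments is my own reading, not an official correspondence:

```r
library(dplyr)

result = tibble(x = c(1, 5, 10, 12, 1)) %>%
  filter(x > 5) %>%                              # .Where(x => x > 5)
  group_by(key = x %% 2) %>%                     # .GroupBy(x => x % 2)
  summarise(n = n()) %>%                         # one row per group
  arrange(as.character(key)) %>%                 # .OrderBy(x => x.Key.ToString())
  mutate(label = paste0("Group: ", key)) %>%     # .Select(x => "Group: " + x.Key)
  pull(label)                                    # .ToArray()

print(result)
```

Here only 10 and 12 survive the filter, and both fall into the group with key 0, so the result is a character vector with one element.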