How do I correctly return character values with dplyr?

dfr*_*kow 1 r dataframe dplyr

Consider the following code:

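library(dplyr)

# Randomly returns a one-row data frame holding "low" or "high".
# On R versions before 4.0, data.frame() converts strings to factors by default.
foo <- function() {
  if (runif(1) < 0.5) {
    return(data.frame(result="low"))
  } else {
    return(data.frame(result="high"))
  }
}

df <- data.frame(val=c(1,2,3,4,5,6))
df %>% group_by(val) %>% do(foo())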

The outcome is random, but if both "low" and "high" results are returned, you will see warnings like the following:

Warning messages:
1: In bind_rows_(x, .id) : Unequal factor levels: coercing to character
2: In bind_rows_(x, .id) :
  binding character and factor vector, coercing into character vector
3: In bind_rows_(x, .id) :
  binding character and factor vector, coercing into character vector
4: In bind_rows_(x, .id) :
  binding character and factor vector, coercing into character vector
5: In bind_rows_(x, .id) :
  binding character and factor vector, coercing into character vector

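I believe the first value returned (say, "low") is converted to a factor with a single level, and when a different level shows up, it incurs dplyr's wrath.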
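The suspicion above can be checked in isolation. The sketch below is not part of the original question: it sets stringsAsFactors = TRUE explicitly (an assumption made here to mimic the default of R versions before 4.0) and shows that each one-row data frame carries a factor with its own single level, so the level sets of the two pieces do not match.

```r
# stringsAsFactors = TRUE is passed explicitly to mimic the pre-4.0 default,
# so this behaves the same on any R version.
low  <- data.frame(result = "low",  stringsAsFactors = TRUE)
high <- data.frame(result = "high", stringsAsFactors = TRUE)

class(low$result)    # "factor"
levels(low$result)   # "low"
levels(high$result)  # "high"

# The two pieces disagree on their levels, which is what the binding step
# has to reconcile by coercing to character.
identical(levels(low$result), levels(high$result))  # FALSE
```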

What is the correct way to write this example so that the warnings are avoided?

Edit: one solution is:

foo <- function() {
  if (runif(1) < 0.5) {
    return(data.frame(result=factor("low", levels=c("low", "high"))))
  } else {
    return(data.frame(result=factor("high", levels=c("low", "high"))))
  }
}

But what if I don't know the factor levels in advance?

Also, fundamentally, I want to return a character vector, not a factor.

Hon*_*Ooi 6

Either:

  • Use stringsAsFactors=FALSE: return(data.frame(..., stringsAsFactors=FALSE))

Or:

  • Use data_frame: return(data_frame(...))

See ?data.frame for more on how factors are handled.
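As a minimal sketch of the first suggestion, the version of foo() below passes stringsAsFactors=FALSE so that result stays a character column. The do.call(rbind, ...) step is only a base-R stand-in assumed here for illustration, so the block runs without dplyr; the guarded part repeats the pipeline from the question when dplyr is installed. In more recent releases of the tibble package, data_frame() is deprecated in favour of tibble(), which likewise keeps strings as character.

```r
# Sketch: same foo() as in the question, but strings are kept as character.
foo <- function() {
  if (runif(1) < 0.5) {
    data.frame(result = "low", stringsAsFactors = FALSE)
  } else {
    data.frame(result = "high", stringsAsFactors = FALSE)
  }
}

# Base-R stand-in for the group-wise call (an assumption for illustration):
# call foo() six times and stack the one-row pieces.
pieces   <- lapply(1:6, function(i) foo())
combined <- do.call(rbind, pieces)
is.character(combined$result)  # character column, no factor levels to reconcile

# The pipeline from the question, run only if dplyr is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))
  df  <- data.frame(val = c(1, 2, 3, 4, 5, 6))
  out <- df %>% group_by(val) %>% do(foo())
  is.character(out$result)
}
```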