When filtering with dplyr in R, why are the levels of filtered-out values retained in the filtered data?

Question

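I'm trying to filter out a bunch of data using the filter command from the dplyr package. Everything looks the way I'd hoped, but when I try to draw charts from the newly filtered data, all of the levels I filtered out still show up (albeit with no values). The fact that they are still there throws off my horizontal axis.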

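So, two questions:

1) Why are these filtered-out levels still in the data?

2) How can I filter so that they are no longer present?

Here is a small example you can run to see what I'm talking about: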

library(dplyr)
library(ggvis)

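# small example frame
# (stringsAsFactors = TRUE is needed on R >= 4.0.0, where strings are
#  no longer converted to factors by default)
data <- data.frame(
  x = 1:10,
  y = rep(c("yes", "no"), 5),
  stringsAsFactors = TRUE
)

# filtering to only include data with "yes" in y variable
new_data <- data %>%
  filter(y == "yes")

levels(new_data$y) ## Why is "no" showing up as a level for this if I've filtered that out?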

# Illustration of the filtered values still showing up on axis
new_data %>%
  ggvis(~y, ~x) %>%
  layer_bars()

Thanks for your help.

Answer 1

Ben*_*ker 5

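Factors in R do not automatically drop levels when they are filtered. You may think this is a silly default (I do), but it is easy to deal with: just use the droplevels function on the result.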

new_data <- data %>%
  filter(y == "yes") %>%
  droplevels
levels(new_data$y)
## [1] "yes"
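
As for why the level survives: a factor stores its set of levels as an attribute, separate from the values, and subsetting leaves that attribute alone. A minimal base-R sketch of this (the names `f`, `kept`, and `dropped` are made up for illustration):

```r
# A factor stores its level set as an attribute, separate from the values.
f <- factor(c("yes", "no", "yes", "no"))

# Subsetting keeps the full level set, even for levels with no remaining values.
kept <- f[f == "yes"]
levels(kept)   # "no" is still listed as a level
table(kept)    # "no" appears with a count of 0

# droplevels() rebuilds the factor using only the levels that actually occur.
dropped <- droplevels(kept)
levels(dropped)
```

Plotting functions that build a categorical axis from the level set will therefore reserve a slot for "no" until the unused level is dropped.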

If you do this all the time, you could define a new function:

dfilter <- function(...) droplevels(filter(...))
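
A short usage sketch of that wrapper, assuming dplyr is installed. The example data frame mirrors the one in the question, with `y` made a factor explicitly so the result does not depend on the R version:

```r
library(dplyr)

# Wrapper from the answer: filter, then drop unused factor levels.
dfilter <- function(...) droplevels(filter(...))

# Example data as in the question; y is created as a factor explicitly.
data <- data.frame(
  x = 1:10,
  y = factor(rep(c("yes", "no"), 5))
)

# The data frame is piped in as the first argument, the condition as the second.
new_data <- data %>% dfilter(y == "yes")
levels(new_data$y)
```

Note that droplevels applied to a data frame drops unused levels from every factor column, not only the one used in the filter condition. If some columns should keep their full level set, its `except` argument can exclude them.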
