Difference between tbl_df and data.frame when looping

pet*_*r_w 5 r dplyr

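I've been looping over the values in a dplyr tbl_df, trying to print the unique combinations of two columns. After a lot of trial and error, I could only get the desired output by converting the tbl_df back to a standard data.frame. I know the main differences between the two structures, but I still can't understand the different output I see for each.

For example, with this data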

hospital <- rep(c("Hospital 1", "Hospital 2", "Hospital 3"), 3)
ward <- LETTERS[1:2]
hospitals <- data.frame(cbind(hospital, ward))
hospitals[order(hospitals$hospital, hospitals$ward), ]

#     hospital ward
# 1 Hospital 1    A
# 7 Hospital 1    A
# 4 Hospital 1    B
# 5 Hospital 2    A
# 2 Hospital 2    B
# 8 Hospital 2    B
# 3 Hospital 3    A
# 9 Hospital 3    A
# 6 Hospital 3    B

and the following loop

for(hosp in unique(hospitals$hospital)){
  for(wa in unique(hospitals[hospitals$hospital==hosp, "ward"])){
    print(paste(hosp, wa, sep=" "))
    }
  }

I get the output I want

#[1] "Hospital 1 A"
#[1] "Hospital 1 B"
#[1] "Hospital 2 B"
#[1] "Hospital 2 A"
#[1] "Hospital 3 A"
#[1] "Hospital 3 B"

But with a tbl_df of the same data, I get different output

library(dplyr)
hospitals2 <- tbl_df(hospitals)

for(hosp in unique(hospitals2$hospital)){
  for(wa in unique(hospitals2[hospitals2$hospital==hosp, "ward"])){
    print(paste(hosp, wa, sep=" "))
    }
  }


#[1] "Hospital 1 A" "Hospital 1 B"
#[1] "Hospital 2 B" "Hospital 2 A"
#[1] "Hospital 3 A" "Hospital 3 B"

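This is more than a printing difference: the result appears to be three two-element vectors rather than six one-element vectors, and my subsequent code only works as expected when the loop runs over a regular data frame.

Can anyone explain why I'm seeing these differences?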

Kha*_*haa 6

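You can't run this for loop over a tbl_df that has been subset with [. The documentation says it all:

[ Never simplifies (drops), so always returns data.frame.

As you can see, hospitals2[hospitals2$hospital==hosp, "ward"] returns a data.frame. A for loop over a data.frame iterates over its columns, so wa ends up holding the whole ward column rather than a single value: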

hospitals2[hospitals2$hospital==hosp, "ward"]
#Source: local data frame [3 x 1]

#  ward
#1    A
#2    B
#3    A

hospitals[hospitals$hospital==hosp, "ward"]
#[1] A B A
#Levels: A B
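To make the mechanism concrete, here is a minimal sketch in base R only. It assumes that passing drop = FALSE to a plain data.frame is a fair stand-in for how [ behaves on a tbl_df; the names df, as_vector, as_frame, and the counters are made up for this illustration.

```r
# Base R only: drop = FALSE is used here to imitate the non-simplifying `[`.
df <- data.frame(
  hospital = c("Hospital 1", "Hospital 1", "Hospital 1"),
  ward     = c("A", "B", "A"),
  stringsAsFactors = FALSE
)

as_vector <- df[df$hospital == "Hospital 1", "ward"]                # simplified to a vector
as_frame  <- df[df$hospital == "Hospital 1", "ward", drop = FALSE]  # stays a data.frame

# A for loop over a vector visits each element ...
n_vector_iterations <- 0
for (wa in unique(as_vector)) n_vector_iterations <- n_vector_iterations + 1

# ... but a for loop over a data.frame visits each column,
# because a data.frame is a list of columns.
n_frame_iterations <- 0
for (wa in unique(as_frame)) {
  n_frame_iterations <- n_frame_iterations + 1
  last_wa <- wa  # the whole "ward" column of the de-duplicated frame
}

print(n_vector_iterations)  # 2
print(n_frame_iterations)   # 1
print(last_wa)              # "A" "B"
```

The single iteration with a two-element wa is what turns each print(paste(...)) call into one two-element vector per hospital.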

Use [[ to extract the column as a vector, e.g.

for(hosp in unique(hospitals2$hospital)){
    # Subset the rows of the tbl_df, then pull out "ward" as a plain vector with [[
    for(wa in unique(hospitals2[hospitals2$hospital==hosp,][["ward"]])){
        print(paste(hosp, wa, sep=" "))
    }
}
#[1] "Hospital 1 A"
#[1] "Hospital 1 B"
#[1] "Hospital 2 B"
#[1] "Hospital 2 A"
#[1] "Hospital 3 A"
#[1] "Hospital 3 B"
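The same pattern can be tried without dplyr installed. The sketch below rebuilds the question's data as a plain data.frame, keeps the row subset as a data.frame with drop = FALSE (assumed here to approximate the tbl_df case), and only then extracts the column with [[. The names rows and combos are hypothetical helpers, and the results are collected into a vector instead of printed one at a time so they are easy to inspect.

```r
# Rebuild the question's data, recycling `ward` explicitly to avoid the cbind warning.
hospital  <- rep(c("Hospital 1", "Hospital 2", "Hospital 3"), 3)
ward      <- rep(LETTERS[1:2], length.out = length(hospital))
hospitals <- data.frame(hospital, ward, stringsAsFactors = FALSE)

combos <- character(0)
for (hosp in unique(hospitals$hospital)) {
  # Subset the rows without simplifying, then pull the column out with [[ .
  rows <- hospitals[hospitals$hospital == hosp, , drop = FALSE]
  for (wa in unique(rows[["ward"]])) {
    combos <- c(combos, paste(hosp, wa, sep = " "))
  }
}
print(combos)
```

Because [[ returns the column itself whether or not the preceding [ simplified its result, the inner loop sees one ward per iteration and produces six single combinations.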

  • Another piece of evidence: adding `drop = FALSE` to the first loop gives you the same behavior as with the `tbl_df` object. `for(wa in unique(hospitals[hospitals$hospital == hosp, "ward", drop = FALSE])){` (2 upvotes)