SparklyR: removing a table from the Spark context

eye*_*orm 6 r rstudio apache-spark sparklyr

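I want to remove a single data table from the Spark context ('sc'). I know that a single cached table can be uncached, but as far as I can tell that is not the same as removing the object from sc.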

library(sparklyr)
library(dplyr)
library(titanic)
library(Lahman)

spark_install(version = "2.0.0")
sc <- spark_connect(master = "local")

batting_tbl <- copy_to(sc, Lahman::Batting, "batting")
titanic_tbl <- copy_to(sc, titanic_train, "titanic", overwrite = TRUE)
src_tbls(sc) 
# [1] "batting" "titanic"

tbl_cache(sc, "batting") # Speeds up computations -- loaded into memory
src_tbls(sc) 
# [1] "batting" "titanic"

tbl_uncache(sc, "batting")
src_tbls(sc) 
# [1] "batting" "titanic"

To disconnect the whole sc I would use spark_disconnect(sc), but in this example that would destroy both the "titanic" and "batting" tables stored in sc.

Instead, I would like to remove only "batting", with something like spark_disconnect(sc, tableToRemove = "batting"), but this does not seem to be possible.

小智 14

dplyr::db_drop_table(sc, "batting")

I tried this function and it seems to work.

  • This no longer works; see /sf/ask/4949729011/ (2 upvotes)

Ric*_*ton 8

A slightly lower-level alternative is

tbl_name <- "batting"
DBI::dbGetQuery(sc, paste("DROP TABLE", tbl_name))
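If you drop tables like this often, the DROP TABLE approach can be wrapped in a small helper. This is only a sketch: drop_table_sql and drop_spark_table are hypothetical names I made up, and it assumes Spark SQL's `DROP TABLE IF EXISTS` form and backtick-quoted identifiers, so that a missing table or an unusual table name does not raise an error.

```r
# Hypothetical helper: build a DROP TABLE statement for Spark SQL.
# Assumption: identifiers are quoted with backticks, and an embedded
# backtick is escaped by doubling it.
drop_table_sql <- function(tbl_name) {
  stopifnot(is.character(tbl_name), length(tbl_name) == 1, nzchar(tbl_name))
  quoted <- paste0("`", gsub("`", "``", tbl_name, fixed = TRUE), "`")
  paste("DROP TABLE IF EXISTS", quoted)
}

# Hypothetical helper: run the statement through the sparklyr connection,
# using the same DBI::dbGetQuery call as in the answer above.
drop_spark_table <- function(sc, tbl_name) {
  DBI::dbGetQuery(sc, drop_table_sql(tbl_name))
  invisible(tbl_name)
}

# Usage (needs a live connection, so it is left commented out):
# drop_spark_table(sc, "batting")
# src_tbls(sc)  # "batting" should no longer be listed
```

The other tables registered in sc, such as "titanic" in the question, should be left alone, since only the named table is dropped.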