How to repartition a data frame in sparklyr

Question

For some reason this is proving hard to find. I can easily find a repartition function in pyspark and in sparkr, but no such function seems to exist in sparklyr.

Does anyone know how to do this in sparklyr?

Answer 1

You can now use sdf_repartition(), for example:

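# In current sparklyr releases this argument is named partition_by;
# the name columns used when this answer was written may not be accepted.
iris_tbl %>%
  sdf_repartition(5L, partition_by = c("Species", "Petal_Width")) %>%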
  spark_dataframe() %>%
  invoke("queryExecution") %>%
  invoke("optimizedPlan") 
# <jobj[139]>
#   class org.apache.spark.sql.catalyst.plans.logical.RepartitionByExpression
# RepartitionByExpression [Species#14, Petal_Width#13], 5
#                          +- InMemoryRelation [Sepal_Length#10, Sepal_Width#11, Petal_Length#12, Petal_Width#13, Species#14], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas), `iris`
#                                               +- *FileScan csv [Sepal_Length#10,Sepal_Width#11,Petal_Length#12,Petal_Width#13,Species#14] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/var/folders/ry/_l__tbl57d940bk2kgj8q2nj3s_d9b/T/Rtmpjgtnl6/spark_serializ..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<Sepal_Length:double,Sepal_Width:double,Petal_Length:double,Petal_Width:double,Species:string>
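
To confirm that the repartitioning took effect without reading the query plan, the partition count can be checked directly. The following is a minimal sketch, assuming a local Spark installation reachable through spark_connect(master = "local"); the names sc, repartitioned and n_parts are illustrative and not part of the original answer.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (e.g. via spark_install()).
sc <- spark_connect(master = "local")

# copy_to() replaces the dots in iris column names with underscores.
iris_tbl <- copy_to(sc, iris, "iris", overwrite = TRUE)

# Repartition by count only; no partitioning columns are given here.
repartitioned <- sdf_repartition(iris_tbl, partitions = 5L)

# sdf_num_partitions() reports how many partitions the Spark DataFrame has.
n_parts <- sdf_num_partitions(repartitioned)
print(n_parts)

spark_disconnect(sc)
```

When the goal is only to reduce the number of partitions, sdf_coalesce() is an alternative that avoids a full shuffle.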

Archived:	8 years, 5 months ago
Views:	2063
Last recorded:	8 years, 5 months ago