Dav*_*aid · tags: r, apache-spark, sparklyr
For some reason, this is proving hard to find. I can easily find the repartition function in pyspark, and likewise in sparkr, but no such function seems to exist in sparklyr.

Does anyone know how to repartition a Spark DataFrame in sparklyr?
You can now use sdf_repartition(), for example:
iris_tbl %>%
  sdf_repartition(5L, columns = c("Species", "Petal_Width")) %>%
  spark_dataframe() %>%
  invoke("queryExecution") %>%
  invoke("optimizedPlan")
# <jobj[139]>
# class org.apache.spark.sql.catalyst.plans.logical.RepartitionByExpression
# RepartitionByExpression [Species#14, Petal_Width#13], 5
# +- InMemoryRelation [Sepal_Length#10, Sepal_Width#11, Petal_Length#12, Petal_Width#13, Species#14], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas), `iris`
# +- *FileScan csv [Sepal_Length#10,Sepal_Width#11,Petal_Length#12,Petal_Width#13,Species#14] Batched: false, Format: CSV, Location: InMemoryFileIndex[file:/var/folders/ry/_l__tbl57d940bk2kgj8q2nj3s_d9b/T/Rtmpjgtnl6/spark_serializ..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<Sepal_Length:double,Sepal_Width:double,Petal_Length:double,Petal_Width:double,Species:string>
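Beyond inspecting the optimized plan, you can verify that the repartitioning took effect by checking the partition count of the underlying RDD. A minimal sketch, assuming a sparklyr version (0.7+) that provides the sdf_num_partitions() helper; on older versions you can invoke getNumPartitions on the RDD directly:

iris_tbl %>%
  sdf_repartition(5L, columns = c("Species", "Petal_Width")) %>%
  sdf_num_partitions()
# 5

# Equivalent lower-level call via the invoke API:
iris_tbl %>%
  sdf_repartition(5L) %>%
  spark_dataframe() %>%
  invoke("rdd") %>%
  invoke("getNumPartitions")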