How do I create a partitioned table using the "spark.catalog.createTable" function?

Question

There is an options parameter, but I could not find any example that uses it to pass the partition columns.

Answer 1

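I believe you don't need to specify the partition columns if you don't provide a schema. In that case, Spark infers both the schema and the partitioning from the location automatically. With the current implementation, however, it is not possible to provide a schema and partition columns together. Fortunately, all of the underlying implementation code is open, so I ended up writing the method below to create an external Hive table.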
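The schema-less case described above can be sketched as follows. This is an illustration rather than part of the original answer: createTableWithInferredPartitions is a hypothetical helper name, and the sketch assumes Spark 2.2 or later, where spark.catalog.createTable(tableName, path, source) is available (earlier 2.x releases expose it as createExternalTable). It also assumes the data under the location is laid out in key=value partition directories.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper (not from the original answer): registers an external
// table over existing files and lets Spark infer the schema and the
// partition columns from the directory layout.
def createTableWithInferredPartitions(
    spark: SparkSession,
    tableName: String,
    location: String,
    source: String): DataFrame = {
  // No schema is passed, so Spark reads it from the files at `location`,
  // including any key=value partition directories.
  spark.catalog.createTable(tableName, location, source)
  // Existing partitions of an external table are not registered
  // automatically; this scans the location and adds them to the catalog.
  spark.catalog.recoverPartitions(tableName)
  spark.table(tableName)
}
```

Calling spark.catalog.listColumns(tableName) afterwards shows which columns Spark treated as partition columns (the isPartition flag).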

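  import org.apache.spark.sql.SaveMode
  import org.apache.spark.sql.catalyst.TableIdentifier
  import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType}
  import org.apache.spark.sql.execution.datasources.{CreateTable, DataSource}
  import org.apache.spark.sql.types.StructType

  // Relies on internal Spark classes, which may change between releases.
  // `spark` is the active SparkSession.
  private def createExternalTable(tableName: String, location: String,
      schema: StructType, partitionCols: Seq[String], source: String): Unit = {
    val tableIdent = TableIdentifier(tableName)
    // Point the table's storage at the existing data directory.
    val storage = DataSource.buildStorageFormatFromOptions(Map("path" -> location))
    val tableDesc = CatalogTable(
      identifier = tableIdent,
      tableType = CatalogTableType.EXTERNAL,
      storage = storage,
      schema = schema,
      partitionColumnNames = partitionCols,
      provider = Some(source)
    )
    // Build the CREATE TABLE plan and execute it; fails if the table already exists.
    val plan = CreateTable(tableDesc, SaveMode.ErrorIfExists, None)
    spark.sessionState.executePlan(plan).toRdd
  }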
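For readers who would rather not depend on Spark internals, roughly the same table can be declared through SQL DDL. The sketch below is not from the original answer: externalTableDdl and its parameters are hypothetical names, and it assumes a Spark version whose CREATE TABLE ... USING syntax accepts a LOCATION clause (2.2 or later, to my knowledge). It only builds the statement, so the result can be inspected before it is handed to spark.sql.

```scala
// Hypothetical helper: builds a CREATE TABLE statement for an external,
// partitioned data source table. `columns` holds (name, SQL type) pairs and
// must include the partition columns.
def externalTableDdl(
    tableName: String,
    location: String,
    columns: Seq[(String, String)],
    partitionCols: Seq[String],
    source: String): String = {
  // Every partition column has to be declared in the column list.
  val missing = partitionCols.filterNot(p => columns.exists(_._1 == p))
  require(missing.isEmpty, s"partition columns not in schema: ${missing.mkString(", ")}")
  val columnList =
    columns.map { case (name, sqlType) => s"`$name` $sqlType" }.mkString(", ")
  val partitionClause =
    if (partitionCols.isEmpty) ""
    else partitionCols.map(c => s"`$c`").mkString(" PARTITIONED BY (", ", ", ")")
  // Specifying LOCATION makes the table external (unmanaged).
  s"CREATE TABLE $tableName ($columnList) USING $source$partitionClause LOCATION '$location'"
}

// Example values are made up for illustration.
val ddl = externalTableDdl(
  "events", "/data/events",
  Seq("id" -> "BIGINT", "name" -> "STRING", "dt" -> "STRING"),
  Seq("dt"), "parquet")
```

The resulting string would be run with spark.sql(ddl). Whichever way the table is created, partitions that already exist under the location are not registered automatically, so a follow-up MSCK REPAIR TABLE events (or spark.catalog.recoverPartitions("events")) is needed before queries return data.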

