Unable to access a Databricks cluster with databricks-connect "V2" v13.2

zez*_*zar 4 python azure databricks azure-databricks databricks-connect

When I try to run local Spark code with databricks-connect 13.2.0, it does not work.

I get the following problem:

Error:

  • details = "INVALID_STATE: cluster xxxxx is not Shared or Single User Cluster. (requestId=05bc3105-4828-46d4-a381-7580f3b55416)"
  • debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"INVALID_STATE: cluster 0711-122239-bb999j6u is not Shared or Single User Cluster. (requestId=05bc3105-4828-46d4-a381-7580f3b55416)", grpc_status:9, created_time:"2023-07-11T15:26:08.9729+02:00"}"

The cluster is shared, and I have tried several cluster configurations, but it does not work! The cluster runtime version is 13.2.

I am also using:

  • Python 3.10
  • openjdk version "1.8.0_292"
  • Azure Databricks

Has anyone run into a similar problem with the new Databricks Connect?

Thanks for any help!

I tried the following code:

from databricks.connect import DatabricksSession
from pyspark.sql.types import *

from delta.tables import DeltaTable
from datetime import date


if __name__ == "__main__":
    spark = DatabricksSession.builder.getOrCreate()

    # Create a Spark DataFrame consisting of high and low temperatures
    # by airport code and date.
    schema = StructType([
        StructField('AirportCode', StringType(), False),
        StructField('Date', DateType(), False),
        StructField('TempHighF', IntegerType(), False),
        StructField('TempLowF', IntegerType(), False)
    ])

    data = [
        [ 'BLI', date(2021, 4, 3), 52, 43],
        [ 'BLI', date(2021, 4, 2), 50, 38],
        [ 'BLI', date(2021, 4, 1), 52, 41],
        [ 'PDX', date(2021, 4, 3), 64, 45],
        [ 'PDX', date(2021, 4, 2), 61, 41],
        [ 'PDX', date(2021, 4, 1), 66, 39],
        [ 'SEA', date(2021, 4, 3), 57, 43],
        [ 'SEA', date(2021, 4, 2), 54, 39],
        [ 'SEA', date(2021, 4, 1), 56, 41]
    ]

    temps = spark.createDataFrame(data, schema)

    print(temps)

I expected the DataFrame to be displayed in my local terminal via remote Spark execution.

Ale*_*Ott 5

Databricks Connect V2 requires a Unity Catalog-enabled cluster - this is explicitly stated in the requirements. It looks like you are either using the "No isolation shared" data access mode, or you don't have Unity Catalog at all. If you do have Unity Catalog, make sure you have selected either Single User or Shared in the "Access mode" dropdown.
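As a quick way to confirm which access mode the cluster is actually configured with, you can read it back through the Databricks SDK for Python. A minimal sketch, assuming the databricks-sdk package is installed and the same workspace credentials that databricks-connect uses are available; the cluster ID below is just a placeholder taken from the error message:

from databricks.sdk import WorkspaceClient

# Reuses the standard Databricks authentication (DATABRICKS_HOST / DATABRICKS_TOKEN
# environment variables or a configured profile).
w = WorkspaceClient()

# Placeholder cluster ID taken from the error message above.
cluster = w.clusters.get(cluster_id="0711-122239-bb999j6u")

# Databricks Connect V2 needs SINGLE_USER or USER_ISOLATION (the API name for the
# "Shared" access mode); NONE corresponds to "No isolation shared" and produces
# exactly the INVALID_STATE error shown above.
print(cluster.data_security_mode)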

(Screenshot: the cluster configuration page with the "Access mode" dropdown.)
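Once the cluster runs in one of those two access modes, the original script should work unchanged. If you prefer to be explicit about which cluster the session targets instead of relying on the local Databricks configuration, Databricks Connect V2 also accepts the connection details directly on the builder. A minimal sketch with placeholder host, token, and cluster ID values:

from databricks.connect import DatabricksSession

# Placeholder values - replace with your workspace URL, a personal access token,
# and the ID of a Single User or Shared (Unity Catalog) cluster.
spark = DatabricksSession.builder.remote(
    host="https://adb-1234567890123456.7.azuredatabricks.net",
    token="dapiXXXXXXXXXXXXXXXX",
    cluster_id="0711-122239-bb999j6u",
).getOrCreate()

# Simple smoke test: executes on the remote cluster and prints locally.
print(spark.range(5).collect())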