zez*_*zar · tags: python, azure, databricks, azure-databricks, databricks-connect
When I try to run local Spark code with databricks-connect 13.2.0, it does not work.
I get the following problem:
Error:

    INVALID_STATE: cluster 0711-122239-bb999j6u is not Shared or Single User Cluster. (requestId=05bc3105-4828-46d4-a381-7580f3b55416)
    UNKNOWN: Error received from peer {grpc_message: "INVALID_STATE: cluster 0711-122239-bb999j6u is not Shared or Single User Cluster. (requestId=05bc3105-4828-46d4-a381-7580f3b55416)", grpc_status: 9, created_time: "2023-07-11T15:26:08.9729+02:00"}

The cluster is shared, and I have tried several cluster configurations, but it still does not work. The cluster runtime version is 13.2.
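For context, Databricks Connect for Runtime 13.x can only attach to clusters whose access mode is Shared or Single User (Assigned), which requires Unity Catalog; a cluster created in the legacy "No isolation shared" mode raises exactly this INVALID_STATE error even though it is nominally "shared". Assuming that is the cause, a minimal connection setup via the standard Databricks environment variables would look like this (host and token values are placeholders, not taken from the question):

```shell
# Hypothetical values -- replace with your own workspace URL and personal access token.
export DATABRICKS_HOST="https://adb-1234567890123456.7.azuredatabricks.net"
export DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXX"
# Must point at a Unity Catalog cluster in Shared or Single User access mode.
export DATABRICKS_CLUSTER_ID="0711-122239-bb999j6u"
```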
In addition, I am using:
Has anyone run into a similar problem with the new Databricks Connect?
Thanks for any help!

I tried the following code:
from databricks.connect import DatabricksSession
from pyspark.sql.types import *
from delta.tables import DeltaTable
from datetime import date

if __name__ == "__main__":
    spark = DatabricksSession.builder.getOrCreate()

    # Create a Spark DataFrame consisting of high and low temperatures
    # by airport code and date.
    schema = StructType([
        StructField('AirportCode', StringType(), False),
        StructField('Date', DateType(), False),
        StructField('TempHighF', IntegerType(), False),
        StructField('TempLowF', IntegerType(), False)
    ])
    data = [
        ['BLI', date(2021, 4, 3), 52, 43],
        ['BLI', date(2021, 4, 2), 50, 38],
        ['BLI', date(2021, 4, 1), 52, 41],
        ['PDX', date(2021, 4, 3), 64, 45],
        ['PDX', date(2021, 4, 2), 61, 41],
        ['PDX', date(2021, 4, 1), 66, 39],
        ['SEA', date(2021, 4, 3), 57, 43],
        ['SEA', date(2021, 4, 2), 54, 39],
        ['SEA', date(2021, 4, 1), 56, 41]
    ]
    temps = spark.createDataFrame(data, schema)
    print(temps)
I expected the DataFrame to be displayed in my local terminal through remote Spark execution.
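If the session keeps attaching to the wrong cluster, the target cluster can also be pinned explicitly when building the session instead of relying on ambient configuration. A sketch assuming databricks-connect 13.x; the host and token values are placeholders, not taken from the question:

```python
from databricks.connect import DatabricksSession

# Hypothetical credentials -- replace with your workspace URL, token, and
# the ID of a Shared or Single User (Unity Catalog) cluster.
spark = DatabricksSession.builder.remote(
    host="https://adb-1234567890123456.7.azuredatabricks.net",
    token="dapiXXXXXXXXXXXXXXXX",
    cluster_id="0711-122239-bb999j6u",
).getOrCreate()

# Quick connectivity check: runs on the remote cluster, prints locally.
spark.range(3).show()
```

Note that this snippet only runs against a live workspace, so it is a configuration sketch rather than a locally testable example.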
Views: 2023