Invalid Spark URL in local Spark session

Lor*_*uer 11 apache-spark

Since updating to Spark 2.3.0, tests running in my CI (Semaphore) fail with an allegedly invalid Spark URL when the (local) Spark context is created:

18/03/07 03:07:11 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver@LXC_trusty_1802-d57a40eb:44610
    at org.apache.spark.rpc.RpcEndpointAddress$.apply(RpcEndpointAddress.scala:66)
    at org.apache.spark.rpc.netty.NettyRpcEnv.asyncSetupEndpointRefByURI(NettyRpcEnv.scala:134)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
    at org.apache.spark.util.RpcUtils$.makeDriverRef(RpcUtils.scala:32)
    at org.apache.spark.executor.Executor.<init>(Executor.scala:155)
    at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:59)
    at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:126)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)

The Spark session is created as follows:

val sparkSession: SparkSession = SparkSession
  .builder
  .appName("LocalTestSparkSession")
  .config("spark.broadcast.compress", "false")
  .config("spark.shuffle.compress", "false")
  .config("spark.shuffle.spill.compress", "false")
  .master("local[3]")
  .getOrCreate

Before the update to Spark 2.3.0, no problems were encountered with versions 2.2.1 and 2.1.0. Moreover, running the tests locally works fine.

Pra*_*rai 15

Set SPARK_LOCAL_HOSTNAME to localhost and try again. The failure is most likely due to the CI machine's hostname (LXC_trusty_1802-d57a40eb in the stack trace) containing underscores, which are not valid in URI host names; overriding the hostname sidesteps Spark's stricter URL validation.

export SPARK_LOCAL_HOSTNAME=localhost
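If the tests run under sbt (the build tool is not stated in the question), the variable can also be set for the test JVM from the build file. A minimal sketch, assuming forked tests; envVars only takes effect when the test JVM is forked:

// build.sbt
Test / fork := true
Test / envVars += ("SPARK_LOCAL_HOSTNAME" -> "localhost")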

  • On the Windows platform you must use **SET SPARK_LOCAL_HOSTNAME=localhost** instead (2 upvotes)

小智 9

If you don't want to change the environment variable, you can instead add the configuration to the SparkSession builder in code (as Hanisha said above).

In PySpark:

spark = SparkSession.builder.config("spark.driver.host", "localhost").getOrCreate()


Nag*_*sha 8

This was resolved by setting the SparkSession config "spark.driver.host" to an IP address.

It seems this change became necessary as of Spark 2.3.
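A minimal sketch of what that looks like for the Scala session from the question; the loopback address 127.0.0.1 is an assumption, so substitute whatever address is reachable in your environment:

val sparkSession: SparkSession = SparkSession
  .builder
  .appName("LocalTestSparkSession")
  // Pin the driver host so Spark does not derive it from the machine's
  // hostname, which may contain characters that are invalid in a URI.
  .config("spark.driver.host", "127.0.0.1")
  .master("local[3]")
  .getOrCreate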