Sharing a config file with spark-submit in cluster mode

Question


Che*_*eko · score 6 · tags: hadoop-yarn, apache-spark, spark-streaming

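During development I have been running my Spark jobs in "client" mode. I use "--files" to share the config file with the executors, and the driver reads the config file locally. Now I want to deploy the job in "cluster" mode, and I am having a hard time sharing the config file with the driver.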

For example, I pass the config file name to both the driver and the executors through extraJavaOptions, and I read the file with SparkFiles.get():

  val configFile = org.apache.spark.SparkFiles.get(System.getProperty("config.file.name"))


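This works fine on the executors but fails on the driver. I think the files are shared only with the executors, not with the container that runs the driver. One option is to keep the config file in S3, but I would like to know whether this can be done with spark-submit alone.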

  spark-submit --deploy-mode cluster --master yarn --driver-cores 2 \
    --driver-memory 4g --num-executors 4 --executor-cores 4 --executor-memory 10g \
    --files /home/hadoop/Streaming.conf,/home/hadoop/log4j.properties \
    --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j.properties -Dconfig.file.name=Streaming.conf" \
    --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties -Dconfig.file.name=Streaming.conf" \
    --class ....
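The quoting is the fragile part of that command: each extraJavaOptions value contains a space but must reach spark-submit as a single argument. A minimal Python sketch that builds the same call as an argument list; build_submit_cmd is a hypothetical helper, and main_class and app_jar are placeholders because the original elides them.

```python
def build_submit_cmd(main_class, app_jar, conf_name="Streaming.conf"):
    """Assemble the question's spark-submit call as an argv list.

    main_class and app_jar are placeholders (elided in the original).
    """
    java_opts = ("-Dlog4j.configuration=log4j.properties "
                 f"-Dconfig.file.name={conf_name}")
    return [
        "spark-submit", "--deploy-mode", "cluster", "--master", "yarn",
        "--driver-cores", "2", "--driver-memory", "4g",
        "--num-executors", "4", "--executor-cores", "4",
        "--executor-memory", "10g",
        "--files", f"/home/hadoop/{conf_name},/home/hadoop/log4j.properties",
        # Each --conf value is ONE list element even though it contains a space.
        "--conf", f"spark.driver.extraJavaOptions={java_opts}",
        "--conf", f"spark.executor.extraJavaOptions={java_opts}",
        "--class", main_class, app_jar,
    ]
```

With an argument list, subprocess.run(cmd, check=True) needs no shell quoting at all, and shlex.join(cmd) prints the equivalent quoted shell command.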


Answer 1

Anonymous user · score 5

I found the solution to this problem in this thread.

You can give a file submitted with --files an alias by appending "#alias" to its path. With this trick, you should be able to access the file by its alias.
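The naming rule can be sketched in Python, assuming the file ends up in the container's working directory under its alias, or under its base name when no alias is given. Both helper names are hypothetical and this only illustrates the convention; it is not Spark's own code.

```python
import os


def localized_name(spec):
    """Name a --files entry is expected to have inside a YARN container.

    Illustrative only: 'dir/test.conf#testFile.conf' -> 'testFile.conf';
    without '#alias' the base file name is kept.
    """
    path, sep, alias = spec.partition("#")
    return alias if sep and alias else os.path.basename(path)


def parse_files_arg(value):
    """Map each localized name to its source path for a comma-separated --files value."""
    result = {}
    for spec in value.split(","):
        spec = spec.strip()
        if spec:
            result[localized_name(spec)] = spec.partition("#")[0]
    return result
```

For the question's --files value, this predicts that Streaming.conf and log4j.properties keep their own names, so code on the driver could open them by bare name.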

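For example, the following runs without errors (on Spark 2.0 and later, the deprecated "--master yarn-cluster" is written as "--master yarn --deploy-mode cluster"):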

spark-submit --master yarn-cluster --files test.conf#testFile.conf test.py


where test.py is:

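# 'testFile.conf' is the alias given after '#' in --files
path_f = 'testFile.conf'
try:
    with open(path_f, 'r') as f:  # the with-block closes the file
        contents = f.read()
except OSError as e:
    raise Exception('File not opened', path_f) from e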


and test.conf is an empty file.
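Applied to the question's setup, the same idea suggests opening the file by its bare name instead of calling SparkFiles.get() on the driver. This assumes that on YARN, --files entries are localized into each container's working directory, including the container that runs the driver in cluster mode. A minimal Python sketch; resolve_config_path and fallback_dirs are hypothetical names, and the fallback list is only there for setups where the file lives elsewhere.

```python
import os


def resolve_config_path(name, fallback_dirs=()):
    """Hypothetical helper: locate a file shipped with spark-submit --files.

    Assumption: on YARN the file (or its '#alias' name) is localized into
    the container's working directory, so that is checked first;
    fallback_dirs are searched afterwards, in order.
    """
    candidates = [os.path.join(os.getcwd(), name)]
    candidates += [os.path.join(d, name) for d in fallback_dirs]
    for path in candidates:
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(f"{name!r} not found in any of: {candidates}")
```

In test.py this would be called as resolve_config_path('testFile.conf'); in the Scala job the equivalent is to open the name passed in -Dconfig.file.name relative to the working directory.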

Answer 2

Sha*_*kar · score 4

Try the --properties-file option of the spark-submit command.

For example, a properties file with these contents:

spark.key1=value1
spark.key2=value2


All keys must be prefixed with "spark.".
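The prefix matters because, as far as I know, spark-submit drops keys that do not start with "spark." (printing a warning) instead of passing them on to the application. A simplified Python sketch of that filtering. The name load_spark_properties is hypothetical, and the sketch handles only key=value lines, while the real Java properties format also accepts ':' and whitespace as separators.

```python
def load_spark_properties(text):
    """Parse simple key=value lines, keeping only keys that start with 'spark.'.

    Returns (kept_properties, ignored_keys). Simplified: only '=' separators.
    """
    props, ignored = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):  # blank line or comment
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key.startswith("spark."):
            props[key] = value
        else:
            ignored.append(key)  # would not reach sc.getConf
    return props, ignored
```

So a Typesafe-style key such as "myapp.timeout" would be silently lost unless renamed to something like "spark.myapp.timeout".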

Then pass the properties file with a spark-submit command like this:

bin/spark-submit --properties-file  propertiesfile.properties


Then, in your code, you can read the keys with the SparkContext method getConf:

sc.getConf.get("spark.key1")  // returns value1


Once you have the value, you can use it anywhere.

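Thanks for the reply! I already have config files in another format (Typesafe Config). Keeping the configuration organized this way has several advantages over putting it in a file as key-value pairs. Is there a way to share my config file with the driver as well? (2 upvotes)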

Archived: 8 years, 10 months ago
Views: 4,423
Last activity: 7 years, 4 months ago