Posted by r_g*_*_s_

Databricks PySpark with PEX: how do I configure a PySpark job on Databricks that uses a PEX file for its dependencies?

I tried to create a PySpark job through the Databricks UI (using spark-submit) with the spark-submit parameters below (the dependencies are packaged in a PEX file), but I get an exception saying the PEX file does not exist. As I understand it, the --files option places the file in the working directory of the driver and of every executor, so I'm confused about why I'm running into this problem.

Configuration

[
"--files","s3://some_path/my_pex.pex",
"--conf","spark.pyspark.python=./my_pex.pex",
"s3://some_path/main.py",
"--some_arg","2022-08-01"
]
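For comparison, this is a minimal sketch of the equivalent standalone spark-submit invocation, following the PEX pattern from the Spark "Python Package Management" documentation. The paths and file names are taken from the job configuration above; everything else (the `yarn` master, cluster deploy mode, and the executor environment variable) is the documented YARN pattern, not something Databricks-specific, and may not map directly onto how Databricks launches the driver:

```shell
# Sketch of the documented spark-submit + PEX pattern (YARN cluster mode).
# --files ships my_pex.pex into each container's working directory,
# and the relative path ./my_pex.pex is then resolvable there.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files s3://some_path/my_pex.pex \
  --conf spark.pyspark.python=./my_pex.pex \
  s3://some_path/main.py \
  --some_arg 2022-08-01
```

Note that the stack trace below fails in the driver's `PythonRunner`, i.e. the driver itself cannot launch `./my_pex.pex`; whether the relative path is valid on the driver depends on the deploy mode and working directory, which is exactly what the question is about.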

Stderr

OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
Warning: Ignoring non-Spark config property: libraryDownload.sleepIntervalSeconds
Warning: Ignoring non-Spark config property: libraryDownload.timeoutSeconds
Warning: Ignoring non-Spark config property: eventLog.rolloverIntervalSeconds
Exception in thread "main" java.io.IOException: Cannot run program "./my_pex.pex": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
    at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:97)
    at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at …

pex apache-spark pyspark databricks spark-submit

5 votes · 0 solutions · 413 views
