PySpark dependency modules in spark-submit

dat*_*ict 5 apache-spark boto3 pyspark spark-submit

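I'm trying to run a spark-submit (PySpark) command. As part of the spark-submit, I need to provide boto3 as a dependency, because my code imports it. I'm running the following command and getting a "No module named 'boto3'" error: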

bin/spark-submit --master=local --py-files /home/user/boto3-develop.zip /home/user/py_script.py

Traceback (most recent call last):
  File "/home/user/py_script.py", line 16, in <module>
    import boto3
ModuleNotFoundError: No module named 'boto3'
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 23, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "/home/user/py_script.py", line 16, in <module>
    import boto3
ModuleNotFoundError: No module named 'boto3'
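One possible cause is worth checking. When a zip file is on `sys.path`, Python can import a package from it only if the package directory (for example `boto3/` containing `__init__.py`) sits at the root of the archive. If `boto3-develop.zip` is a GitHub-style source download, which is an assumption based on its name, its contents are probably wrapped in a top-level `boto3-develop/` folder. boto3 also imports `botocore` at runtime, so that package must be available as well. The sketch below, using the hypothetical helper names `package_roots` and `check_py_files_zip`, reports where each package's `__init__.py` appears inside the archive:

```python
import zipfile


def package_roots(entry_names, package):
    """Return the directory prefixes under which `package/__init__.py` appears.

    An empty-string prefix ("") means the package sits at the archive root,
    which is where Python's zip import looks when the zip itself is on sys.path.
    """
    marker = f"{package}/__init__.py"
    roots = []
    for name in entry_names:
        if name == marker:
            roots.append("")  # package is at the root: importable
        elif name.endswith("/" + marker):
            roots.append(name[: -len(marker)])  # nested under another folder
    return sorted(roots)


def check_py_files_zip(zip_path, packages=("boto3", "botocore")):
    """Print, per package, whether `import <package>` could resolve from zip_path."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    for pkg in packages:
        roots = package_roots(names, pkg)
        if "" in roots:
            print(f"{pkg}: found at archive root, importable")
        elif roots:
            print(f"{pkg}: only found under {roots}, NOT importable as-is")
        else:
            print(f"{pkg}: not found in archive")
```

For example, `check_py_files_zip("/home/user/boto3-develop.zip")` would flag a nested `boto3-develop/` prefix, and report a missing `botocore` if it is absent from the archive. Treat the result only as a hint about the zip's layout, not a full diagnosis of the spark-submit setup.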

I'm not sure where I'm going wrong.