We have a PySpark-based application that we launch with spark-submit as shown below. The application works as expected, but we see a strange warning message. Is there any way to handle this, and why does it occur?
Note: the cluster is an Azure HDI cluster.
spark-submit --master yarn --deploy-mode cluster --jars file:/<localpath>/* --py-files pyFiles/__init__.py,pyFiles/<abc>.py,pyFiles/<abd>.py --files files/<env>.properties,files/<config>.json main.py
The warning seen is:
/usr/hdp/current/spark3-client/python/pyspark/context.py:256: RuntimeWarning: Failed to add file [file:///home/sshuser/project/pyFiles/abc.py] specified in 'spark.submit.pyFiles' to Python path:
  /mnt/resource/hadoop/yarn/local/usercache/sshuser/filecache/929
  warnings.warn(
The warning above is raised for every file passed to --py-files (abc.py, abd.py, etc.).
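In cluster mode on YARN the driver runs inside the application master, and the client-side paths recorded in spark.submit.pyFiles may no longer resolve there even though YARN has already shipped the files to its file cache, which is why PySpark logs this warning; it is usually benign if the application itself works. One common alternative, sketched below under the assumption that the project layout matches the command above, is to ship the package as a single zip archive, which --py-files accepts alongside .py and .egg files:

# Sketch: package pyFiles/ as one archive and pass that to --py-files.
# The <localpath>, <env>, and <config> placeholders are from the command above.
cd /home/sshuser/project
zip -r pyFiles.zip pyFiles/
spark-submit --master yarn --deploy-mode cluster \
  --jars file:/<localpath>/* \
  --py-files pyFiles.zip \
  --files files/<env>.properties,files/<config>.json \
  main.py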
Separately, I tried to install pyarrow, but it failed with the error below. I also tried the --no-binary :all: option, but hit the same problem. Any help resolving this would be greatly appreciated.
Python version: 3.7. Linux image: python:3.7-alpine. The stack trace from the install is below.
sudo pip install pyarrow --proxy=x.x.x.x
Looking in indexes: https://x.x.x.x/api/pypi/python/simple/
Collecting pyarrow
Downloading https://repo.lab.pl.alcatel-lucent.com/api/pypi/python/packages/packages/fd/b7/78115614c4b227796cc87fff907930f6ae6dd999c5000d3d6ae5c2e54582/pyarrow-2.0.0.tar.gz (58.9 MB)
|████████████████████████████████| 58.9 MB 55 kB/s
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing wheel metadata ... done
Requirement already satisfied: numpy>=1.14 in /usr/local/lib/python3.7/site-packages (from pyarrow) (1.19.4)
Building wheels for collected packages: pyarrow
Building wheel for pyarrow (PEP 517) ... error
ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python …
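For what it's worth, pyarrow publishes prebuilt manylinux wheels for CPython 3.7, but those target glibc; on python:3.7-alpine (musl libc) pip finds no matching wheel and falls back to building from source, which requires CMake and the Arrow C++ libraries and typically fails in a bare Alpine image. A minimal sketch of the usual workaround, assuming a Docker-based setup, is to use a glibc image such as python:3.7-slim so the wheel installs directly:

# Sketch: a glibc-based image lets pip install the prebuilt pyarrow wheel
# instead of compiling Arrow C++ from source. The image tag is an assumption.
docker run --rm python:3.7-slim \
  sh -c "pip install pyarrow==2.0.0 && python -c 'import pyarrow; print(pyarrow.__version__)'"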