We've started a new Spark cluster on EMR running Spark 2.3.0 and are trying to run the same command as on a cluster running Spark 2.2.0, but we get the traceback:
java.io.IOException: Cannot run program "./venv/bin/python": error=2, No such file or directory
The command that we are running is:
PYSPARK_PYTHON=./venv/bin/python PYSPARK_DRIVER_PYTHON=python \
$SPARK_HOME/bin/spark-submit \
  --py-files=dist/project_main-1.0.0-py2.7.egg \
  --master=yarn --deploy-mode=client \
  --archives=venv.zip#venv \
  --packages org.apache.derby:derbytools:10.14.1.0,org.apache.derby:derbyclient:10.14.1.0,com.github.databricks:spark-avro:204864b6cf,com.databricks:spark-redshift_2.11:3.0.0-preview1,com.databricks:spark-csv_2.11:1.5.0,com.amazon.redshift:redshift-jdbc42:1.2.12.1017 \
  --repositories https://jitpack.io,http://redshift-maven-repository.s3-website-us-east-1.amazonaws.com/release \
  --executor-memory 4g \
  project_main/main.py
We've ensured that the virtualenv is relocatable and have tried different combinations of PYSPARK_PYTHON, --archives, and --files in the spark-submit command. We also tried omitting PYSPARK_PYTHON, but then the job was not running inside the virtualenv and our packages and libraries were missing.
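For reference, this is roughly how the venv.zip archive we pass to --archives is produced (a minimal sketch; the exact commands, the requirements.txt name, and the package list are assumptions, not taken verbatim from our build):

# Hypothetical build of a relocatable virtualenv archive for --archives=venv.zip#venv
virtualenv venv                            # create the Python 2.7 environment
venv/bin/pip install -r requirements.txt   # install project dependencies (file name assumed)
virtualenv --relocatable venv              # rewrite activation scripts so the env can move hosts
(cd venv && zip -rq ../venv.zip .)         # zip from inside the dir so bin/python sits at the archive root

YARN should unpack venv.zip into a directory aliased "venv" in each container's working directory, which is why the executors are pointed at ./venv/bin/python.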
Has the behavior of PYSPARK_PYTHON, PYSPARK_DRIVER_PYTHON, or --archives changed from 2.2 to 2.3?