Exp*_*rer 5 amazon-emr conda apache-zeppelin
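I am trying to install conda on EMR using the bootstrap script below. conda appears to be installed, but it is not added to the PATH environment variable. When I manually update the $PATH variable on the EMR master node, conda is recognized. I want to use conda in Zeppelin.
I also tried adding conda to the configuration when launching the EMR instance, as shown below, but I still get the error mentioned below.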
"classification": "spark-env",
"properties": {
"conda": "/home/hadoop/conda/bin"
}
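For comparison, EMR's `spark-env` classification normally takes environment variables through a nested `export` classification rather than as direct properties. The sketch below writes such a configuration file; the helper name `write_spark_env_config` and the output file are hypothetical, and the interpreter path is the one used in the bootstrap script, so treat it as an assumption about this cluster.

```shell
# Hypothetical helper: write an EMR configuration file that exports
# PYSPARK_PYTHON through the nested "export" classification of spark-env.
# Usage: write_spark_env_config <output-file> <python-path>
write_spark_env_config() {
    local out="$1" python_path="$2"
    cat > "$out" <<EOF
[
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "PYSPARK_PYTHON": "${python_path}"
        }
      }
    ]
  }
]
EOF
}

# Example (assumed path, taken from the bootstrap script):
# write_spark_env_config spark-env.json /home/hadoop/conda/bin/python3.5
```

The resulting file could then be supplied at launch, for example with `aws emr create-cluster --configurations file://spark-env.json`. Whether this alone makes conda visible to Zeppelin is not confirmed here.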
[hadoop@ip-172-30-5-150 ~]$ PATH=/home/hadoop/conda/bin:$PATH
[hadoop@ip-172-30-5-150 ~]$ conda
usage: conda [-h] [-V] command ...
conda is a tool for managing and deploying applications, environments and packages.
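The session above changes PATH only for the current shell. A bootstrap script runs in its own shell process, so anything it exports or sources disappears when that process exits. A minimal sketch of persisting the change instead, assuming the login shell reads `~/.bashrc` (the helper name `persist_conda_path` is hypothetical):

```shell
# Append a PATH export to a profile file only if it is not already present,
# so repeated bootstrap runs do not duplicate the line.
# Usage: persist_conda_path <profile-file> <conda-bin-dir>
persist_conda_path() {
    local profile="$1" bin_dir="$2"
    # \$PATH stays literal so it is expanded when the profile is read.
    local line="export PATH=${bin_dir}:\$PATH"
    touch "$profile"
    grep -qxF "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}

# Example (paths taken from the question):
# persist_conda_path "$HOME/.bashrc" /home/hadoop/conda/bin
```

This only helps processes that actually read that profile. Services such as Zeppelin may run as a different user and never source it, which is an assumption worth checking on the cluster.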
#!/usr/bin/env bash
# Install conda
wget https://repo.continuum.io/miniconda/Miniconda3-4.2.12-Linux-x86_64.sh -O /home/hadoop/miniconda.sh \
&& /bin/bash ~/miniconda.sh -b -p $HOME/conda
conda config --set always_yes yes --set changeps1 no
conda install conda=4.2.13
conda config -f --add channels conda-forge
rm ~/miniconda.sh
echo bootstrap_conda.sh completed. PATH now: $PATH
export PYSPARK_PYTHON="/home/hadoop/conda/bin/python3.5"
echo -e '\nexport PATH=$HOME/conda/bin:$PATH' >> $HOME/.bashrc && source $HOME/.bashrc
conda create -n zoo python=3.7 # "zoo" is conda environment name, you can use any name you like.
conda activate zoo
sudo pip3 install tensorflow
sudo pip3 install boto3
sudo pip3 install botocore
sudo pip3 install numpy
sudo pip3 install pandas
sudo pip3 install scipy
sudo pip3 install s3fs
sudo pip3 install matplotlib
sudo pip3 install -U tqdm
sudo pip3 install -U scikit-learn
sudo pip3 install -U scikit-multilearn
sudo pip3 install xlutils
sudo pip3 install natsort
sudo pip3 install pydot
sudo pip3 install python-pydot
sudo pip3 install python-pydot-ng
sudo pip3 install pydotplus
sudo pip3 install h5py
sudo pip3 install graphviz
sudo pip3 install recmetrics
sudo pip3 install openpyxl
sudo pip3 install xlrd
sudo pip3 install xlwt
sudo pip3 install tensorflow.io
sudo pip3 install Cython
sudo pip3 install ray
sudo pip3 install zoo
sudo pip3 install analytics-zoo
sudo pip3 install analytics-zoo[ray]
#sudo /usr/bin/pip-3.6 install -U imbalanced-learn
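In the script above, `conda` is called before PATH has been updated in the running shell, and `sudo pip3` likely resolves to the system Python rather than the conda environment. One way to sidestep both, sketched under the assumption that Miniconda was installed to `$HOME/conda`, is to call the binaries by absolute path. The helper names `conda_cmd`, `configure_conda` and `env_pip` are hypothetical.

```shell
# Install prefix used by the bootstrap script; override if it differs.
CONDA_HOME="${CONDA_HOME:-$HOME/conda}"

# Call conda by absolute path so the script does not depend on PATH.
conda_cmd() {
    "$CONDA_HOME/bin/conda" "$@"
}

# Configure conda and create the "zoo" environment from the question.
configure_conda() {
    conda_cmd config --set always_yes yes --set changeps1 no
    conda_cmd config --add channels conda-forge
    conda_cmd create -n zoo python=3.7
}

# Run pip with the environment's own interpreter, so packages land in
# the conda environment instead of the system site-packages.
env_pip() {
    "$CONDA_HOME/envs/zoo/bin/python" -m pip "$@"
}

# Example:
# configure_conda
# env_pip install numpy pandas
```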
I got conda working by modifying the script as follows; the EMR Python version was conflicting with the conda version:
wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.9.2-Linux-x86_64.sh -O /home/hadoop/miniconda.sh \
    && /bin/bash ~/miniconda.sh -b -p $HOME/conda

echo -e '\n export PATH=$HOME/conda/bin:$PATH' >> $HOME/.bashrc && source $HOME/.bashrc

conda config --set always_yes yes --set changeps1 no
conda config -f --add channels conda-forge

conda create -n zoo python=3.7 # "zoo" is conda environment name
conda init bash
source activate zoo
conda install python=3.7.0 -c conda-forge orca
sudo /home/hadoop/conda/envs/zoo/bin/python3.7 -m pip install virtualenv

I then set the Zeppelin python and pyspark parameters to:
"spark.pyspark.python": "/home/hadoop/conda/envs/zoo/bin/python3",
"spark.pyspark.virtualenv.enabled": "true",
"spark.pyspark.virtualenv.type": "native",
"spark.pyspark.virtualenv.bin.path": "/home/hadoop/conda/envs/zoo/bin/",
"zeppelin.pyspark.python": "/home/hadoop/conda/bin/python",
"zeppelin.python": "/home/hadoop/conda/bin/python"

Orca only supports TF up to 1.5, so it did not work for me because I am using TF2.
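Since these settings point at fixed interpreter paths, it may help to confirm that each path exists and is executable on the node before restarting the interpreter. A small sketch, with the hypothetical helper name `check_interpreters`:

```shell
# Report whether each given path is an executable file.
# Returns 0 if all are present, 1 if any is missing.
check_interpreters() {
    local missing=0 path
    for path in "$@"; do
        if [ -x "$path" ]; then
            echo "ok: $path"
        else
            echo "missing or not executable: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (paths taken from the settings above):
# check_interpreters /home/hadoop/conda/envs/zoo/bin/python3 \
#                    /home/hadoop/conda/bin/python
```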