mts*_*tsz · score 10 · tags: proxy, dependencies, ivy, apache-spark
I want to run spark-shell behind a corporate proxy with external packages. Unfortunately, external packages passed via the --packages option are not resolved.

For example, when running
bin/spark-shell --packages datastax:spark-cassandra-connector:1.5.0-s_2.10
the Cassandra connector package is not resolved (it hangs on the last line):
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
datastax#spark-cassandra-connector added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
After a while the connection times out with error messages like:
:::: ERRORS
Server access error at url https://repo1.maven.org/maven2/datastax/spark-cassandra-connector/1.5.0-s_2.10/spark-cassandra-connector-1.5.0-s_2.10.pom (java.net.ConnectException: Connection timed out)
When I deactivate the VPN to the corporate proxy, the package is resolved and downloaded immediately.

What I have tried so far:

Exposing the proxy via environment variables:
export http_proxy=<proxyHost>:<proxyPort>
export https_proxy=<proxyHost>:<proxyPort>
export JAVA_OPTS="-Dhttp.proxyHost=<proxyHost> -Dhttp.proxyPort=<proxyPort>"
export ANT_OPTS="-Dhttp.proxyHost=<proxyHost> -Dhttp.proxyPort=<proxyPort>"
Running spark-shell with extra Java options:
bin/spark-shell --conf "spark.driver.extraJavaOptions=-Dhttp.proxyHost=<proxyHost> -Dhttp.proxyPort=<proxyPort>" --conf "spark.executor.extraJavaOptions=-Dhttp.proxyHost=<proxyHost> -Dhttp.proxyPort=<proxyPort>" --packages datastax:spark-cassandra-connector:1.6.0-M1-s_2.10
Is there some other configuration possibility I am missing?
Answer by mts*_*tsz · score 19
Found the correct settings:
bin/spark-shell --conf "spark.driver.extraJavaOptions=-Dhttp.proxyHost=<proxyHost> -Dhttp.proxyPort=<proxyPort> -Dhttps.proxyHost=<proxyHost> -Dhttps.proxyPort=<proxyPort>" --packages <somePackage>
Both the http and the https proxy have to be set as extra driver options. JAVA_OPTS did not seem to do anything.
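If you do not want to pass these flags on every invocation, the same driver property can also be set once in conf/spark-defaults.conf, which spark-shell reads on startup. A sketch, with <proxyHost> and <proxyPort> as placeholders as above:

```
# conf/spark-defaults.conf — set the proxy system properties for the driver JVM
spark.driver.extraJavaOptions  -Dhttp.proxyHost=<proxyHost> -Dhttp.proxyPort=<proxyPort> -Dhttps.proxyHost=<proxyHost> -Dhttps.proxyPort=<proxyPort>
```

With this in place, a plain `bin/spark-shell --packages <somePackage>` should resolve dependencies through the proxy without extra --conf arguments.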
If the proxy is correctly configured on your operating system, you can use the Java property java.net.useSystemProxies:
--conf "spark.driver.extraJavaOptions=-Djava.net.useSystemProxies=true"
This way the proxy host/port and the no-proxy hosts are picked up from the system configuration.
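Combined with --packages, an invocation could then look like this (using the connector coordinates from the question):

```shell
# Let the JVM pick up the OS-level proxy settings, then resolve the package
bin/spark-shell \
  --conf "spark.driver.extraJavaOptions=-Djava.net.useSystemProxies=true" \
  --packages datastax:spark-cassandra-connector:1.5.0-s_2.10
```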
This worked for me in Spark 1.6.1:
bin\spark-shell --driver-java-options "-Dhttp.proxyHost=<proxyHost> -Dhttp.proxyPort=<proxyPort> -Dhttps.proxyHost=<proxyHost> -Dhttps.proxyPort=<proxyPort>" --packages <package>