I am trying to install Spark 2.0.1 on Windows Server 2012 to test Zeppelin 0.6.2.
I started the Spark master and tested the Spark shell. Then I configured the following in the conf\zeppelin-env.cmd file:
set SPARK_HOME=C:\spark-2.0.1-bin-hadoop2.7
set MASTER=spark://100.79.240.26:7077
I did not set HADOOP_CONF_DIR or SPARK_SUBMIT_OPTIONS (both optional according to the documentation).
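For reference, my understanding is that if those two variables were needed, they would go in the same conf\zeppelin-env.cmd file as the others; the values below are only hypothetical placeholders, not part of my actual setup:

REM Hypothetical values, shown only to illustrate where the optional settings would live:
set HADOOP_CONF_DIR=C:\hadoop\etc\hadoop
set SPARK_SUBMIT_OPTIONS=--driver-memory 2g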
I checked the values on the Interpreter configuration page, and the Spark master setting is OK.
When I run the Zeppelin Tutorial -> "Load data into table" note, I get a connection refused error. Here is part of the message from the error log:
INFO [2016-11-17 21:58:12,518] ({pool-1-thread-11} Paragraph.java[jobRun]:252) - run paragraph 20150210-015259_1403135953 using null org.apache.zeppelin.interpreter.LazyOpenInterpreter@8bbfd7
INFO [2016-11-17 21:58:12,518] ({pool-1-thread-11} RemoteInterpreterProcess.java[reference]:148) - Run interpreter process [C:\zeppelin-0.6.2-bin-all\bin\interpreter.cmd, -d, C:\zeppelin-0.6.2-bin-all\interpreter\spark, -p, 50163, -l, C:\zeppelin-0.6.2-bin-all/local-repo/2C3FBS414]
INFO [2016-11-17 21:58:12,614] ({Exec Default Executor} RemoteInterpreterProcess.java[onProcessFailed]:288) - Interpreter process failed {}
org.apache.commons.exec.ExecuteException: Process exited with an error: 255 (Exit value: 255)
    at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
    at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:48)
    at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:200)
    at java.lang.Thread.run(Thread.java:745)
ERROR [2016-11-17 21:58:43,846] ({Thread-49} RemoteScheduler.java[getStatus]:255) - Can't get status information
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: connect
    at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:53)
    at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
    at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
    at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:189)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.getStatus(RemoteScheduler.java:253)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.run(RemoteScheduler.java:211)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: connect
    at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
    at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
    ... 8 more
Caused by: java.net.ConnectException: Connection refused: connect
    at java.net.DualStackPlainSocketImpl.connect0(Native Method)
    at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:79)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
    ... 9 more
ERROR [2016-11-17 …
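Since the log shows interpreter.cmd exiting with status 255 before anything ever listens on the Thrift port, one way to narrow this down is to run the exact launcher command from the log by hand and watch its real output (the port 50163 below is just the value from this run; Zeppelin picks a new one each time):

C:\zeppelin-0.6.2-bin-all\bin\interpreter.cmd -d C:\zeppelin-0.6.2-bin-all\interpreter\spark -p 50163 -l C:\zeppelin-0.6.2-bin-all\local-repo\2C3FBS414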
I have the following situation: I want to use Anaconda3 together with Zeppelin and Spark.

I have already installed the following components:

Basically, I configured the Python interpreter to point at my Anaconda installation, in my case /opt/anaconda3/bin/python, and that works. I also edited the zeppelin.sh script with the following lines:
export PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip"
export SPARK_YARN_USER_ENV="PYTHONPATH=${PYTHONPATH}"
export PYSPARK_DRIVER_PYTHON="/var/opt/teradata/anaconda3/envs/py35/bin/ipython"
export PYSPARK_PYTHON="/var/opt/teradata/anaconda3/envs/py35/bin/python"
export PYLIB="/var/opt/teradata/anaconda3/envs/py35/lib"

Up to this point, everything is fine.
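A quick way to sanity-check those settings (just a sketch using the same paths as above) is to ask each binary for its version before restarting Zeppelin:

# Both should report a Python 3.5.x version if the paths are correct:
/var/opt/teradata/anaconda3/envs/py35/bin/python --version
/var/opt/teradata/anaconda3/envs/py35/bin/ipython --version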
When I try the %python.conda and %python.sql interpreters, they fail because the conda command cannot be found, and pandas is not found either. I added the Anaconda library location to the $PATH environment variable, and Zeppelin was then able to find those commands, but as a side effect the default Python version for the whole environment became 3.5 instead of 2.7, and I started getting another nice error like this:
…