I installed Zeppelin 0.7.1. When I try to execute the sample Spark program (available in the Zeppelin Tutorial notebook), I get the following error:
java.lang.NullPointerException
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_2(SparkInterpreter.java:391)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:380)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:828)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:483)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I have also set up the configuration file (zeppelin-env.sh) to point to my Spark installation and Hadoop configuration directory:
export SPARK_HOME="/${homedir}/sk"
export HADOOP_CONF_DIR="/${homedir}/hp/etc/hadoop"
The Spark version I am using is 2.1.0 and Hadoop is 2.7.3.
Also, I am using the default Spark interpreter configuration (so Spark is set to run in local mode).
Am I missing something here?
PS: I can connect to Spark from a terminal using spark-shell.
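For what it is worth, a minimal sanity check (assuming the default %spark interpreter, where sc is the SparkContext Zeppelin injects) is to run a trivial paragraph first:

// If this paragraph fails with the same NullPointerException, the interpreter
// cannot create a SparkContext at all; if it succeeds, it prints the Spark
// version Zeppelin actually bound to (expected 2.1.0 here).
sc.version

If even this fails, the interpreter log under Zeppelin's logs/ directory usually contains the underlying exception.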
Does Apache Zeppelin have IntelliSense/auto-completion support in its notebooks?
If so, how do I use it?
I tried searching the internet but was unsuccessful in finding a working source. This https://github.com/NFLabs/zeppelin/issues/248 says to use ctrl-, but it does not work. I wonder whether IntelliSense support was removed after the project moved to Apache.
I am trying to add more visualization options to Apache Zeppelin by integrating it with d3.js.
I found an example here where someone did this with leaflet.js, and tried to do something similar - unfortunately I am not very familiar with AngularJS (which Zeppelin uses for its front end). I also do not have streaming data. Below is my code, which just uses a simple tutorial example from d3.js:
%angular
<div>
<svg class="chart"></svg>
</div>
<script>
function useD3() {
  // Bar-chart example from the d3 tutorial: one <g> per datum, each
  // translated down by its index, holding a rect scaled to the value.
  var data = [4, 8, 15, 16, 23, 42];
  var width = 420,
      barHeight = 20;
  // d3 v3 API (d3 v4+ renamed this to d3.scaleLinear()).
  var x = d3.scale.linear()
      .domain([0, d3.max(data)])
      .range([0, width]);
  var chart = d3.select(".chart")
      .attr("width", width)
      .attr("height", barHeight * data.length);
  var bar = chart.selectAll("g")
      .data(data)
      .enter().append("g")
      .attr("transform", function(d, i) { return "translate(0," + i * barHeight + ")"; });
  bar.append("rect")
      .attr("width", x) // x is a scale function, so each rect's width is x(d)
      .attr("height", barHeight - 1);
}
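// Note: the (truncated) guard below checks whether d3 is already loaded on
// the page; when window.d3 is undefined, a common pattern in %angular
// paragraphs is to inject a <script> tag for d3 and call useD3() from its
// onload handler.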
if (window.d3) { …

I just installed Apache Zeppelin (built from the latest source in the git repo) and saw it start up and run successfully on port 10008. I created a new notebook with a single line of code:
val a = "Hello World!"
and ran this paragraph, and saw the following error:
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:139)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:137)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:257)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:197)
at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:304)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Any clues?
My backend is Spark 1.5, and I verified through the interpreter's web UI that Zeppelin points to the correct version of Spark and the right spark.home.
If I have a Scala paragraph with a DataFrame, can I share it with Python and use it there? (As far as I understand, pyspark uses py4j.)
I tried this:
Scala paragraph:
x.printSchema
z.put("xtable", x )
Python paragraph:
%pyspark
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
the_data = z.get("xtable")
print the_data
sns.set()
g = sns.PairGrid(data=the_data,
x_vars=dependent_var,
y_vars=sensor_measure_columns_names + operational_settings_columns_names,
hue="UnitNumber", size=3, aspect=2.5)
g = g.map(plt.plot, alpha=0.5)
g = g.set(xlim=(300,0))
g = g.add_legend()
Error:
Traceback (most recent call last):
File "/tmp/zeppelin_pyspark.py", line 222, in <module>
eval(compiledCode)
File "<string>", line 15, in <module>
File "/usr/local/lib/python2.7/dist-packages/seaborn/axisgrid.py", line 1223, in __init__
hue_names = …
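For reference, the workaround commonly suggested for this (a sketch assuming the Spark 1.5-era API, untested here) is to share the DataFrame through a temp table rather than z.put, since z.get hands pyspark a py4j handle to the JVM object rather than something pandas/seaborn can plot:

// Hypothetical sketch: expose the DataFrame through the shared SQL catalog
// instead of the ZeppelinContext.
x.registerTempTable("xtable")

On the pyspark side, the_data = sqlContext.table("xtable").toPandas() should then give a real pandas DataFrame (again an assumption based on the standard Spark API, not verified in this notebook).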
I am trying to build a model in Spark ML with Zeppelin. I am new to this area and would like some help. I think I need to set the correct data types on the columns and set the first column as the label. Any help is greatly appreciated, thanks.

val training = sc.textFile("hdfs:///ford/fordTrain.csv")
val header = training.first
val inferSchema = true
val df = training.toDF
val lr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8)
val lrModel = lr.fit(df)
// Print the coefficients and intercept for multinomial logistic regression
println(s"Coefficients: \n${lrModel.coefficientMatrix}")
println(s"Intercepts: ${lrModel.interceptVector}")
A snippet of the csv file I am using is:
IsAlert,P1,P2,P3,P4,P5,P6,P7,P8,E1,E2
0,34.7406,9.84593,1400,42.8571,0.290601,572,104.895,0,0,0,
Run Code Online (Sandbox Code Playgroud) 我使用的是HDP-2.6.0.3,但我需要Zeppelin 0.8,所以我已将其作为独立服务安装.当我跑:
I am using HDP-2.6.0.3, but I need Zeppelin 0.8, so I have installed it as a standalone service. When I run:

%sql
show tables
I get nothing back, and when I run Spark2 SQL commands I get 'table not found'. The tables can be seen in the 0.7 Zeppelin that ships as part of HDP.
Can anyone tell me what I am missing, to get Zeppelin/Spark to see Hive?
The steps I performed to create zep0.8 are as follows:
mvn clean package -DskipTests -Pspark-2.1 -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Ppyspark -Psparkr -Pr -Pscala-2.11
Copied zeppelin-site.xml and shiro.ini from /usr/hdp/2.6.0.3-8/zeppelin/conf to /home/ed/zeppelin/conf.
Created /home/ed/zeppelin/conf/zeppelin-env.sh, in which I put the following:
export JAVA_HOME=/usr/jdk64/jdk1.8.0_112
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.6.0.3-8"
Copied /etc/hive/conf/hive-site.xml to /home/ed/zeppelin/conf.
EDIT: I also tried:
import org.apache.spark.sql.SparkSession
val spark = SparkSession
  .builder()
  .appName("interfacing spark sql to hive metastore without configuration file")
  .config("hive.metastore.uris", "thrift://s2.royble.co.uk:9083") // replace with your hivemetastore service's thrift url
  .config("url", "jdbc:hive2://s2.royble.co.uk:10000/default")
  .config("UID", "admin")
  .config("PWD", "admin")
  .config("driver", "org.apache.hive.jdbc.HiveDriver")
  .enableHiveSupport() // don't forget to enable hive support
  .getOrCreate() …
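As a quick diagnostic, a sketch (assuming a Spark 2.x session named spark, as in the snippet above) to confirm whether the session is actually using the Hive catalog:

// Should print "hive"; "in-memory" would mean hive-site.xml was not picked
// up and the session cannot see the Hive metastore.
println(spark.conf.get("spark.sql.catalogImplementation"))
spark.sql("show databases").show()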
When I execute this query in apache-zeppelin, I only get 100 results, with a 'Results are limited by 100.' message:

%sql
SELECT ip
FROM log
Run Code Online (Sandbox Code Playgroud)
So I appended 'LIMIT 10000' to the SQL query, but it still returns only 100 results.
%sql
SELECT ip
FROM log
LIMIT 10000
So, how can I get more than 100 SQL results in Zeppelin?
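For reference: the 100-row cap is a display limit of the Spark interpreter, not of the query itself, so a LIMIT clause cannot raise it. The setting worth checking (an assumption based on the standard interpreter configuration) is zeppelin.spark.maxResult on the Interpreter settings page, e.g.:

zeppelin.spark.maxResult = 10000

The interpreter has to be restarted for the new value to take effect.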
As we often hear about Apache Zeppelin, a few questions come to mind: