I want to connect a locally installed Zeppelin 0.10.0 to a locally installed Spark 3.2.0 (I tried the same procedure with Spark 2.3.0 and it worked). But it looks like Zeppelin ships with its own embedded Spark, and it uses that embedded Spark every time I try. I have gone through the Spark interpreter setup, but it made no difference. I just want to know whether it is possible to change the default embedded Spark that Zeppelin uses and point it to the Spark 3.2.0 installation I want to use.
I entered the suggested parameters SPARK_HOME and spark.master local[*], and received the following error:
org.apache.zeppelin.interpreter.InterpreterException: java.lang.NoSuchMethodError: scala.tools.nsc.Settings.usejavacp()Lscala/tools/nsc/settings/AbsSettings$AbsSetting;
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:76)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:833)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:741)
at org.apache.zeppelin.scheduler.Job.run(Job.java:172)
at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:132)
at org.apache.zeppelin.scheduler.FIFOScheduler.lambda$runJobInScheduler$0(FIFOScheduler.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: scala.tools.nsc.Settings.usejavacp()Lscala/tools/nsc/settings/AbsSettings$AbsSetting;
at org.apache.zeppelin.spark.SparkScala212Interpreter.open(SparkScala212Interpreter.scala:66)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:121)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
... 8 more
org.apache.zeppelin.interpreter.InterpreterException: java.lang.NoSuchMethodError: scala.tools.nsc.Settings.usejavacp()Lscala/tools/nsc/settings/AbsSettings$AbsSetting;
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:76)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:833)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:741)
at org.apache.zeppelin.scheduler.Job.run(Job.java:172)
at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:132)
at org.apache.zeppelin.scheduler.FIFOScheduler.lambda$runJobInScheduler$0(FIFOScheduler.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: scala.tools.nsc.Settings.usejavacp()Lscala/tools/nsc/settings/AbsSettings$AbsSetting;
at …

This is the error I get when I try to save a dataframe to text:
org.apache.spark.sql.AnalysisException: Text data source supports only a single column, and you have 8 columns
Here is the code:
df.write.text("/tmp/wt")
What am I doing wrong?
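For what it is worth, one common workaround is sketched below (assuming a Scala DataFrame named df and a tab separator, both of which are assumptions here): the text source writes exactly one string column, so concatenate the columns into a single column before writing.

import org.apache.spark.sql.functions.{col, concat_ws}

// Collapse all columns into one string column; the text source accepts exactly one column.
val singleColumn = df.select(concat_ws("\t", df.columns.map(col): _*).as("value"))
singleColumn.write.text("/tmp/wt")

Alternatively, a format such as df.write.csv keeps the columns separate.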
Is there any way to transpose dataframe rows into columns? I have the following structure as input:
val inputDF = Seq(("pid1","enc1", "bat"),
("pid1","enc2", ""),
("pid1","enc3", ""),
("pid3","enc1", "cat"),
("pid3","enc2", "")
).toDF("MemberID", "EncounterID", "entry" )
inputDF.show:
+--------+-----------+-----+
|MemberID|EncounterID|entry|
+--------+-----------+-----+
| pid1| enc1| bat|
| pid1| enc2| |
| pid1| enc3| |
| pid3| enc1| cat|
| pid3| enc2| |
+--------+-----------+-----+
expected result:
+--------+----------+----------+----------+-----+
|MemberID|Encounter1|Encounter2|Encounter3|entry|
+--------+----------+----------+----------+-----+
| pid1| enc1| enc2| enc3| bat|
| pid3| enc1| enc2| null| cat|
+--------+----------+----------+----------+-----+
Please suggest whether there is an optimized, direct API available for transposing rows into columns. My input data is very large, so operations like collect are not an option for me, since they would pull all the data onto the driver. I am using Spark 2.x.
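One possible approach that avoids collecting to the driver is sketched below (the column names Encounter1..Encounter3 and the rule "take the first non-empty entry per member" are assumptions read off the expected output above): number the encounters per member with a window function, pivot that number into columns, and join the non-empty entry back on.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Number each encounter within a member (1, 2, 3, ...).
val w = Window.partitionBy("MemberID").orderBy("EncounterID")
val numbered = inputDF.withColumn("rn", row_number().over(w))

// Pivot the row number into columns; each cell holds that member's n-th EncounterID.
val encounters = numbered
  .groupBy("MemberID")
  .pivot("rn")
  .agg(first("EncounterID"))
  .withColumnRenamed("1", "Encounter1")
  .withColumnRenamed("2", "Encounter2")
  .withColumnRenamed("3", "Encounter3")

// Keep the non-empty entry per member and join it back on.
val entries = inputDF.filter(col("entry") =!= "").select("MemberID", "entry")
val result = encounters.join(entries, Seq("MemberID"), "left")
result.show()

Passing the expected values explicitly, e.g. pivot("rn", Seq(1, 2, 3)), saves Spark an extra pass over the data to discover the distinct pivot values.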
I have a dataframe like this:
data = [(("ID1", {'A': 1, 'B': 2}))]
df = spark.createDataFrame(data, ["ID", "Coll"])
df.show()
+---+----------------+
| ID| Coll|
+---+----------------+
|ID1|[A -> 1, B -> 2]|
+---+----------------+
df.printSchema()
root
|-- ID: string (nullable = true)
|-- Coll: map (nullable = true)
| |-- key: string
| |-- value: long (valueContainsNull = true)
I want to explode the "Coll" column so that the result looks like this:
+---+---+-----+
| ID|Key|Value|
+---+---+-----+
|ID1|  A|    1|
|ID1|  B|    2|
+---+---+-----+
I am trying to do this in PySpark.
It works when I select just that one column, but I want the ID column as well:
df.select(explode("Coll").alias("x", "y")).show()
+---+---+
| x| y|
+---+---+
| A| …

I have this table:
> equiposcount
MOVIL PILA PORTATIL
138 1 13
and I want to build a string like this:
"138 MOVIL, 1 PILA, 13 PORTATIL"
I am a bit lost here, because
> names(equiposcount)
[1] "MOVIL" "PILA" "PORTATIL"
is of class character, not a vector. Can anyone help me with this? Thanks in advance.
I am new to Scala and Spark, and I need to build a graph from a dataframe. This is the structure of my dataframe, where S and O are nodes and column P represents edges.
+---------------------------+---------------------+----------------------------+
|S |P |O |
+---------------------------+---------------------+----------------------------+
|http://website/Jimmy_Carter|http://web/name |James Earl Carter |
|http://website/Jimmy_Car |http://web/country |http://website/United_States|
|http://website/Jimmy_Car |http://web/birthPlace|http://web/Georgia_(US) |
+---------------------------+---------------------+----------------------------+
Here is the code that builds the dataframe. I want to create a graph from the dataframe "dfA" (a possible sketch follows the code below).
val test = sc
.textFile("testfile.ttl")
.map(_.split(" "))
.map(p => Triple(Try(p(0).toString()).toOption,
Try(p(1).toString()).toOption,
Try(p(2).toString()).toOption))
.toDF()
val url_regex = """^(?:"|<{1}\s?)(.*)(?:>(?:\s\.)?|,\s.*)$"""
val dfA = test
.withColumn("Subject", regexp_extract($"Subject", url_regex, 1))
.withColumn("Predicate", regexp_extract($"Predicate", url_regex, 1))
.withColumn("Object", regexp_extract($"Object", url_regex, 1))
In Scala, the following works:
1 max 2
but the following does not:
1 Math.pow 2
or
import Math.pow
1 pow 2
Can you explain why?
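For context, a small sketch of the difference as I read it (not from the original post): infix notation such as 1 max 2 only works when the method is defined on the left-hand operand, which max is for Int, whereas pow lives on the Math object rather than on Int.

// `max` is available on Int values, so infix syntax is just sugar for a method call.
val a = 1 max 2          // same as 1.max(2), yields 2

// `pow` is a method of the Math object taking two Doubles; Int has no `pow` method,
// so `1 Math.pow 2` and `1 pow 2` are not valid infix calls. Call it directly instead:
val b = Math.pow(1, 2)   // 1.0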
I have a loop that creates a list of tasks. The size of this list is static.
for counter, account_id in enumerate(ACCOUNT_LIST):
task_id = f"bash_task_{counter}"
if account_id:
trigger_task = BashOperator(
task_id=task_id,
bash_command="echo hello there",
dag=dag)
else:
trigger_task = BashOperator(
task_id=task_id,
bash_command="echo hello there",
dag=dag)
trigger_task.status = SKIPPED # is there a way to somehow set the status of this to skipped instead of having a branch operator?
trigger_task
I tried this manually, but could not get the task to skip:
start = DummyOperator(task_id='start')
task1 = DummyOperator(task_id='task_1')
task2 = DummyOperator(task_id='task_2')
task3 = DummyOperator(task_id='task_3')
task4 = DummyOperator(task_id='task_4')
start >> task1
start >> task2
try:
start >> task3
raise AirflowSkipException
except AirflowSkipException …

Suppose I have a dataframe like this:
val customer = Seq(
("C1", "Jackie Chan", 50, "Dayton", "M"),
("C2", "Harry Smith", 30, "Beavercreek", "M"),
("C3", "Ellen Smith", 28, "Beavercreek", "F"),
("C4", "John Chan", 26, "Dayton","M")
).toDF("cid","name","age","city","sex")
How can I get the cid value in one column and the remaining values as an array<struct<column_name, column_value>> in Spark?
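One possible approach is sketched below (the output column names details, column_name and column_value are my own choices, and each value is cast to string so that all structs share a single type): build one struct per remaining column and wrap them in an array.

import org.apache.spark.sql.functions.{array, col, lit, struct}

// Every column except cid becomes a (column_name, column_value) struct inside one array.
val otherCols = customer.columns.filterNot(_ == "cid")
val result = customer.select(
  col("cid"),
  array(otherCols.map(c =>
    struct(lit(c).as("column_name"), col(c).cast("string").as("column_value"))
  ): _*).as("details")
)
result.show(false)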