How to import libraries in Spark Notebook

Cur*_*ong 8 scala magellan apache-spark spark-notebook

I'm having trouble using magellan-1.0.4-s_2.11 in Spark Notebook. I downloaded the JAR from https://spark-packages.org/package/harsha2010/magellan and tried placing SPARK_HOME/bin/spark-shell --packages harsha2010:magellan:1.0.4-s_2.11 in the "Start of Customized Settings" section of the spark-notebook file in the bin folder.

Here are my imports:

import magellan.{Point, Polygon, PolyLine}
import magellan.coord.NAD83
import org.apache.spark.sql.magellan.MagellanContext
import org.apache.spark.sql.magellan.dsl.expressions._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

And my errors:

<console>:71: error: object Point is not a member of package org.apache.spark.sql.magellan
       import magellan.{Point, Polygon, PolyLine}
              ^
<console>:72: error: object coord is not a member of package org.apache.spark.sql.magellan
       import magellan.coord.NAD83
                       ^
<console>:73: error: object MagellanContext is not a member of package org.apache.spark.sql.magellan
       import org.apache.spark.sql.magellan.MagellanContext

I then tried to import the new library by adding it to the main script, the same way the other libraries are listed there:

$lib_dir/magellan-1.0.4-s_2.11.jar"

That didn't work, and I've been scratching my head wondering what I'm doing wrong. How do I import libraries such as magellan into Spark Notebook?

Mat*_*zok 1

Try evaluating something like:

:dp "harsha2010" % "magellan" % "1.0.4-s_2.11"

This loads the library into Spark so that it can be imported, assuming it is available from a Maven repository. In my case, it failed with this message:

failed to load 'harsha2010:magellan:jar:1.0.4-s_2.11 (runtime)' from ["Maven2 local (file:/home/dev/.m2/repository/, releases+snapshots) without authentication", "maven-central (http://repo1.maven.org/maven2/, releases+snapshots) without authentication", "spark-packages (http://dl.bintray.com/spark-packages/maven/, releases+snapshots) without authentication", "oss-sonatype (https://oss.sonatype.org/content/repositories/releases/, releases+snapshots) without authentication"] into /tmp/spark-notebook/aether/b2c7d8c5-1f56-4460-ad39-24c4e93a9786

I think the file is too large and the connection was interrupted before the whole file had been downloaded.

Workaround

So I manually downloaded the JAR from:

http://dl.bintray.com/spark-packages/maven/harsha2010/magellan/1.0.4-s_2.11/

and copied it to:

/tmp/spark-notebook/aether/b2c7d8c5-1f56-4460-ad39-24c4e93a9786/harsha2010/magellan/1.0.4-s_2.11

After that, the :dp command worked. Try calling it first; if it fails, copy the JAR to the correct path to get things working.
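
The copy step can be sketched as a small shell function. This is only an illustration, assuming a POSIX shell: the name place_jar is hypothetical, and the session directory under /tmp/spark-notebook/aether is generated per run, so take it from your own error message rather than from the one quoted here.

```shell
# Copy a manually downloaded JAR into a Maven-style directory layout:
#   <repo_root>/<group path>/<artifact>/<version>/<file>.jar
# place_jar is a hypothetical helper; every value is supplied by the caller.
place_jar() {
  jar="$1"; repo_root="$2"; group="$3"; artifact="$4"; version="$5"
  # Maven layout turns dots in the group id into directory separators.
  group_path=$(printf '%s' "$group" | tr . /)
  dest="$repo_root/$group_path/$artifact/$version"
  mkdir -p "$dest" && cp "$jar" "$dest/"
}

# Example call (commented out; replace the session directory with your own):
# place_jar ./magellan-1.0.4-s_2.11.jar \
#   /tmp/spark-notebook/aether/b2c7d8c5-1f56-4460-ad39-24c4e93a9786 \
#   harsha2010 magellan 1.0.4-s_2.11
```

The JAR keeps its original file name, which should follow the artifact-version.jar pattern (here magellan-1.0.4-s_2.11.jar) for the resolver to find it.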

Better solution

I should have first investigated why the download failed and fixed that, or placed the library in my local M2 repository. But this should get you going.
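
One way to put the library in the local M2 repository is Maven's install:install-file goal. The following is a sketch, not something I ran for this answer. It assumes mvn is on the PATH and that the JAR has already been downloaded; the function name install_to_m2 and the JAR path in the example call are hypothetical.

```shell
# Install a local JAR into ~/.m2/repository via the Maven install plugin.
# install_to_m2 is a hypothetical wrapper; it assumes mvn is on the PATH.
install_to_m2() {
  jar="$1"; group="$2"; artifact="$3"; version="$4"
  mvn install:install-file \
    -Dfile="$jar" \
    -DgroupId="$group" \
    -DartifactId="$artifact" \
    -Dversion="$version" \
    -Dpackaging=jar
}

# Example call (commented out; the JAR path is an assumption):
# install_to_m2 ./magellan-1.0.4-s_2.11.jar harsha2010 magellan 1.0.4-s_2.11
```

Since the failure message lists "Maven2 local" first among the repositories it searched, :dp may then pick the library up from there without downloading it.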