Tag: spark-packages

Including a Spark Package JAR file in an SBT-generated fat JAR

The spark-daria project is uploaded to Spark Packages, and I'm using the sbt-spark-package plugin to access the spark-daria code in another SBT project.

I can include spark-daria in the fat JAR file generated by sbt assembly with the following code in my build.sbt file.

spDependencies += "mrpowers/spark-daria:0.3.0"

// Exclude every JAR on the assembly classpath except spark-daria,
// so only spark-daria ends up in the fat JAR.
val requiredJars = List("spark-daria-0.3.0.jar")
assemblyExcludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp filter { f =>
    !requiredJars.contains(f.data.getName)
  }
}

This code feels like a hack. Is there a better way to include spark-daria in the fat JAR file?

N.B. I'd like to build a semi-fat JAR file here. I want spark-daria to be included in the JAR file, but I don't want all of Spark in the JAR file!
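One commonly used alternative, sketched here under stated assumptions rather than quoted from an answer: resolve spark-daria as an ordinary library dependency from the Spark Packages Maven repository, and mark Spark itself as "provided", which sbt-assembly leaves out of the assembly by default. The groupId mrpowers (inferred by analogy with the JohnSnowLabs/spark-nlp path in the Ivy log further down the page) and the Spark version shown are assumptions.

// build.sbt — a minimal sketch of the "provided" approach
resolvers += "Spark Packages" at "https://dl.bintray.com/spark-packages/maven/"

// Assumed coordinates: Spark Packages publishes under the GitHub user
// as the groupId, e.g. mrpowers/spark-daria -> "mrpowers" % "spark-daria".
libraryDependencies += "mrpowers" % "spark-daria" % "0.3.0"

// "provided" dependencies are excluded from the sbt-assembly output,
// so the JAR stays semi-fat: spark-daria in, Spark out.
// (The Spark version here is a placeholder.)
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"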

scala sbt sbt-assembly apache-spark spark-packages

10 votes · 1 answer · 533 views

Cannot import sparknlp after installing sparknlp

The following runs successfully on a Cloudera CDSW cluster gateway.

import pyspark
from pyspark.sql import SparkSession
spark = (SparkSession
            .builder
            .config("spark.jars.packages","JohnSnowLabs:spark-nlp:1.2.3")
            .getOrCreate()
         )

It produces this output.

Ivy Default Cache set to: /home/cdsw/.ivy2/cache
The jars for the packages stored in: /home/cdsw/.ivy2/jars
:: loading settings :: url = jar:file:/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/lib/spark2/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
JohnSnowLabs#spark-nlp added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found JohnSnowLabs#spark-nlp;1.2.3 in spark-packages
    found com.typesafe#config;1.3.0 in central
    found org.fusesource.leveldbjni#leveldbjni-all;1.8 in central
downloading http://dl.bintray.com/spark-packages/maven/JohnSnowLabs/spark-nlp/1.2.3/spark-nlp-1.2.3.jar ...
    [SUCCESSFUL ] JohnSnowLabs#spark-nlp;1.2.3!spark-nlp.jar (3357ms)
downloading https://repo1.maven.org/maven2/com/typesafe/config/1.3.0/config-1.3.0.jar ...
    [SUCCESSFUL ] com.typesafe#config;1.3.0!config.jar(bundle) (348ms)
downloading https://repo1.maven.org/maven2/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.jar ... …
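Setting spark.jars.packages makes Spark resolve the coordinate through Ivy when the session starts, which is what the log above shows. Per the title, the import still fails afterwards; since the question text is cut off here, the snippet below is a hypothetical reproduction, not a quote from the original post.

# Hypothetical next step: the Ivy resolution above only puts JARs on the
# JVM classpath, not on Python's sys.path, so the Python module can still
# be missing unless it is installed separately (e.g. via pip).
import sparknlp  # per the title, this raises an import error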

apache-spark pyspark apache-spark-mllib spark-packages johnsnowlabs-spark-nlp

5 votes · 2 answers · 4297 views