Shading with sbt-assembly to create a fat jar that runs on Spark

Kum*_*bhu 11 sbt guava sbt-assembly apache-spark grpc

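I'm using sbt-assembly to create a fat jar that can run on Spark. The project depends on grpc-netty. The Guava version on Spark is older than the one grpc-netty requires, and I run into this error: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument. I was able to work around it by setting userClassPathFirst to true on Spark, but that leads to other errors.

Correct me if I'm wrong, but as I understand it, I shouldn't have to set userClassPathFirst to true if I do the shading correctly. This is how I do the shading now: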

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.guava.**" -> "my_conf.@1")
    .inLibrary("com.google.guava" % "guava" % "20.0")
    .inLibrary("io.grpc" % "grpc-netty" % "1.1.2")
)

libraryDependencies ++= Seq(
  "org.scalaj" %% "scalaj-http" % "2.3.0",
  "org.json4s" %% "json4s-native" % "3.2.11",
  "org.json4s" %% "json4s-jackson" % "3.2.11",
  "org.apache.spark" %% "spark-core" % "2.2.0" % "provided",
  "org.apache.spark" % "spark-sql_2.11" % "2.2.0" % "provided",
  "org.clapper" %% "argot" % "1.0.3",
  "com.typesafe" % "config" % "1.3.1",
  "com.databricks" %% "spark-csv" % "1.5.0",
  "org.apache.spark" % "spark-mllib_2.11" % "2.2.0" % "provided",
  "io.grpc" % "grpc-netty" % "1.1.2",
  "com.google.guava" % "guava" % "20.0"
)

What am I doing wrong here, and how do I fix it?

Nik*_*iev 5

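You are almost there. What ShadeRule does is rename class names, not library names:

The main ShadeRule.rename rule is used to rename classes. All references to the renamed classes will also be updated.

In fact, com.google.guava:guava contains no classes in the package com.google.guava: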

$ jar tf ~/Downloads/guava-20.0.jar  | sed -e 's:/[^/]*$::' | sort | uniq
META-INF
META-INF/maven
META-INF/maven/com.google.guava
META-INF/maven/com.google.guava/guava
com
com/google
com/google/common
com/google/common/annotations
com/google/common/base
com/google/common/base/internal
com/google/common/cache
com/google/common/collect
com/google/common/escape
com/google/common/eventbus
com/google/common/graph
com/google/common/hash
com/google/common/html
com/google/common/io
com/google/common/math
com/google/common/net
com/google/common/primitives
com/google/common/reflect
com/google/common/util
com/google/common/util/concurrent
com/google/common/xml
com/google/thirdparty
com/google/thirdparty/publicsuffix
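To make the mismatch concrete, here is a rough sketch of how a pattern of the form "prefix.**" -> "target.@1" treats a fully qualified class name. The `rename` helper is hypothetical and is not sbt-assembly's implementation; it only assumes that `**` matches the rest of the qualified name and that `@1` is replaced by whatever `**` matched.

```scala
object ShadePatternSketch {
  // Hypothetical helper, not part of sbt-assembly: a simplified stand-in for a
  // rename rule of the form "<prefix>.**" -> "<target>.@1".
  def rename(className: String, prefix: String, target: String): String =
    if (className.startsWith(prefix + "."))
      target + "." + className.stripPrefix(prefix + ".") // @1 = whatever ** matched
    else
      className // no match: the class keeps its original name

  def main(args: Array[String]): Unit = {
    val cls = "com.google.common.base.Preconditions"
    // The pattern from the question never matches, so nothing is shaded.
    println(rename(cls, "com.google.guava", "my_conf"))  // prints com.google.common.base.Preconditions
    // A pattern built on the real package name relocates the class.
    println(rename(cls, "com.google.common", "my_conf")) // prints my_conf.base.Preconditions
  }
}
```

The Maven coordinates (com.google.guava % guava) only select which jars a rule applies to through inLibrary; the pattern itself is compared against package and class names.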

Changing the shading rule to this should be enough:

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "my_conf.@1")
    .inLibrary("com.google.guava" % "guava" % "20.0")
    .inLibrary("io.grpc" % "grpc-netty" % "1.1.2")
)

So you don't need to change userClassPathFirst.

Moreover, you can simplify your shading rule like this:

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "my_conf.@1").inAll
)

Since the org.apache.spark dependencies are provided, they will not be included in your jar and will not be shaded (so Spark will use its own unshaded version of Guava that it has on the cluster).
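
On newer sbt releases (1.1 and later) the `in` scoping syntax is being phased out in favour of the slash syntax. As a sketch, assuming the sbt-assembly plugin is already enabled in project/plugins.sbt and that your plugin version supports shading, the simplified rule could be written like this:

```scala
// build.sbt (sketch; assumes sbt 1.1+ and an sbt-assembly version with shading support)
assembly / assemblyShadeRules := Seq(
  // Relocate every class under com.google.common into the my_conf package,
  // in all jars that end up in the assembly, and rewrite references to them.
  ShadeRule.rename("com.google.common.**" -> "my_conf.@1").inAll
)
```

The behaviour is meant to be the same as the `assemblyShadeRules in assembly` form above; only the way the setting is scoped changes.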