I am trying to create an R package so I can use the Stanford CoreNLP wrapper for Apache Spark (from databricks) from R. I am using the sparklyr package to connect to my local Spark instance. I created a package with the following dependency function:
spark_dependencies <- function(spark_version, scala_version, ...) {
  sparklyr::spark_dependency(
    # local CoreNLP jars shipped in the package's inst/ directory
    jars = c(
      system.file(
        "stanford-corenlp-full/stanford-corenlp-3.6.0.jar",
        package = "sparkNLP"
      ),
      system.file(
        "stanford-corenlp-full/stanford-corenlp-3.6.0-models.jar",
        package = "sparkNLP"
      ),
      system.file(
        "stanford-corenlp-full/stanford-english-corenlp-2016-01-10-models.jar",
        package = "sparkNLP"
      )
    ),
    # the databricks wrapper, resolved from the spark-packages repository
    packages = sprintf("databricks:spark-corenlp:0.2.0-s_%s", scala_version)
  )
}
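For the dependency function to be picked up, the package also has to register itself as a sparklyr extension when it is loaded. A minimal sketch of the standard registration hook (conventionally placed in zzz.R), assuming no other load-time setup is needed:

.onLoad <- function(libname, pkgname) {
  # registering the extension makes sparklyr call spark_dependencies()
  # when a new connection is opened with spark_connect()
  sparklyr::register_extension(pkgname)
}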
In the logs I can see that both the databricks package and the relevant jars are loaded. I extracted all of CoreNLP into the stanford-corenlp-full folder, so all the dependencies should load correctly.
Ivy Default Cache set to: /Users/Bob/.ivy2/cache
The jars for the packages stored in: /Users/Bob/.ivy2/jars
:: loading settings :: url = jar:file:/Users/Bob/Library/Caches/spark/spark-2.0.0-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.11 added as a dependency
com.amazonaws#aws-java-sdk-pom added as a dependency
databricks#spark-corenlp added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0 …
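For reference, the connection is made roughly like this (a sketch; the local master and Spark 2.0.0 match the setup shown in the log above):

library(sparklyr)
library(sparkNLP)  # loading the package registers the extension

# spark_dependencies() runs during spark_connect(), adding the local
# jars and the databricks:spark-corenlp package to the new session
sc <- spark_connect(master = "local", version = "2.0.0")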