Din*_*ius 5 scala amazon-s3 amazon-web-services maven apache-spark
我的代码应该访问存储在 S3 上的一些文件(此代码在一台机器上运行良好,而在另一台机器上失败;基本上,当它从 Intellij IDEA 本地(而不是在集群上)执行时,它会失败):
sc.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxx")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxx")
val sqlContext = new SQLContext(sc)
var df = sqlContext.read.json("s3n://myPath/*.json")
Run Code Online (Sandbox Code Playgroud)
我在该行收到以下错误var df = sqlContext.read.json("s3n://myPath/*.json"):
Caused by: java.lang.ClassNotFoundException: org.jets3t.service.ServiceException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Run Code Online (Sandbox Code Playgroud)
我阅读了有关此问题的类似帖子,其中提到如果使用 Spark 1.6.2,解决方案是使用org.apache.hadoop hadoop-aws 2.6.0. 就我而言,它还没有解决问题。
我的pom.xml(摘录):
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
<java.version>1.8</java.version>
<scala.version>2.10.6</scala.version>
<spark.version>1.6.2</spark.version>
<jackson.version>2.8.3</jackson.version>
</properties>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<!--<scope>provided</scope>-->
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<!--<scope>provided</scope>-->
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.10</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.module</groupId>
<artifactId>jackson-module-scala_2.10</artifactId>
<version>${jackson.version}</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>${jackson.version}</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>${jackson.version}</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>${jackson.version}</version>
</dependency>
<dependency>
<groupId>com.lambdaworks</groupId>
<artifactId>jacks_2.10</artifactId>
<version>2.3.3</version>
</dependency>
<dependency>
<groupId>com.typesafe</groupId>
<artifactId>config</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-s3</artifactId>
<version>1.11.53</version>
</dependency>
<dependency>
<groupId>net.debasishg</groupId>
<artifactId>redisclient_2.10</artifactId>
<version>3.3</version>
</dependency>
</dependencies>
Run Code Online (Sandbox Code Playgroud)
在 中添加以下内容dependency应该可以解决问题
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.6.0</version>
</dependency>
Run Code Online (Sandbox Code Playgroud)
我希望这有帮助
| 归档时间: |
|
| 查看次数: |
9124 次 |
| 最近记录: |