Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found (Spark 1.6 on Windows)

Han*_*art · 3 · Tags: windows, amazon-s3, apache-spark, windows-10, pyspark

I am trying to access an S3 file from a local Spark context using pySpark. I keep getting:

    File "C:\Spark\python\lib\py4j-0.9-src.zip\py4j\protocol.py", line 308, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o20.parquet.
    : java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found

I set os.environ['AWS_ACCESS_KEY_ID'] and os.environ['AWS_SECRET_ACCESS_KEY'] before calling df = sqc.read.parquet(input_path). I also added these lines:

    hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    hadoopConf.set("fs.s3.awsSecretAccessKey", os.environ["AWS_SECRET_ACCESS_KEY"])
    hadoopConf.set("fs.s3.awsAccessKeyId", os.environ["AWS_ACCESS_KEY_ID"])

I also tried changing s3 to s3n and s3a. None of them worked.
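For reference, the configuration keys above differ per URI scheme, which is easy to get wrong. The sketch below (a plain-Python illustration, not tied to a live SparkContext; the `s3_conf` helper is hypothetical) shows the implementation class and credential keys that the classic s3/s3n schemes expect in Hadoop 2.x; note that s3a uses differently named credential keys (fs.s3a.access.key / fs.s3a.secret.key) and requires the hadoop-aws jar on the classpath:

```python
import os

# FileSystem implementation classes for the classic S3 schemes (Hadoop 2.x).
S3_IMPLS = {
    "s3": "org.apache.hadoop.fs.s3.S3FileSystem",
    "s3n": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
}

def s3_conf(scheme, access_key, secret_key):
    """Return the Hadoop configuration entries for an s3/s3n URI scheme."""
    return {
        "fs.%s.impl" % scheme: S3_IMPLS[scheme],
        "fs.%s.awsAccessKeyId" % scheme: access_key,
        "fs.%s.awsSecretAccessKey" % scheme: secret_key,
    }

# Applying it to a live SparkContext would look roughly like this (not run here):
# hadoopConf = sc._jsc.hadoopConfiguration()
# for key, value in s3_conf("s3n",
#                           os.environ["AWS_ACCESS_KEY_ID"],
#                           os.environ["AWS_SECRET_ACCESS_KEY"]).items():
#     hadoopConf.set(key, value)
```

Setting the keys alone does not help if the implementation class itself is missing from the classpath, which is what the ClassNotFoundException indicates.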

Any idea how to get this working? I'm on Windows 10, with pySpark, Spark 1.6.1 built for Hadoop 2.6.0.

Fra*_*nzi · 7

I run pyspark with the hadoop-aws library attached.

You need to use s3n in your input path. I ran this from macOS, so I'm not sure whether it works on Windows.

$SPARK_HOME/bin/pyspark --packages org.apache.hadoop:hadoop-aws:2.7.1
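With hadoop-aws on the classpath as above, the remaining pitfall is the URI scheme in the read path. A minimal sketch (the helper and bucket name are hypothetical) of normalizing a path to the s3n:// scheme that NativeS3FileSystem serves:

```python
def to_s3n(path):
    """Rewrite an s3:// or s3a:// URI to the s3n:// scheme."""
    for scheme in ("s3://", "s3a://"):
        if path.startswith(scheme):
            return "s3n://" + path[len(scheme):]
    return path

# Inside a pyspark session this would be used as (not run here):
# df = sqc.read.parquet(to_s3n("s3://my-bucket/data.parquet"))
```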