TypeError: 'JavaPackage' object is not callable (AWS Glue PySpark)

rps*_*pta 10 java pyspark aws-glue

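I am trying to set up an AWS Glue environment on my Ubuntu virtual machine by following the AWS documentation.

I have completed the required steps, such as downloading the awsglue libs and the Spark package and setting SPARK_HOME as suggested. However, I cannot initialize the GlueContext and get the error below.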

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())
# or, with an existing SparkContext named sc:
# glueContext = GlueContext(sc)

Error:

TypeError          Traceback (most recent call last)
<ipython-input-15-0798793d4033> in <module>
----> 1 glueContext = GlueContext(SparkContext.getOrCreate())

~/aws-glue-libs-glue-1.0/PyGlue.zip/awsglue/context.py in __init__(self, sparkContext, **options)
     43         super(GlueContext, self).__init__(sparkContext)
     44         register(sparkContext)
---> 45         self._glue_scala_context = self._get_glue_scala_context(**options)
     46         self.create_dynamic_frame = DynamicFrameReader(self)
     47         self.write_dynamic_frame = DynamicFrameWriter(self)

~/aws-glue-libs-glue-1.0/PyGlue.zip/awsglue/context.py in _get_glue_scala_context(self, **options)
     64 
     65         if min_partitions is None:
---> 66             return self._jvm.GlueContext(self._jsc.sc())
     67         else:
     68             return self._jvm.GlueContext(self._jsc.sc(), min_partitions, target_partitions)

TypeError: 'JavaPackage' object is not callable

小智 1

After following the instructions at https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html, check that the Spark properties spark.executor.extraClassPath and spark.driver.extraClassPath are both set to {user_path}\\aws-glue-libs-glue-{1.0/master}\\jarsv1\\* (on Linux, write the path with forward slashes).
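As a rough sketch of what those two settings could look like on the asker's Ubuntu machine, the snippet below formats them as lines for $SPARK_HOME/conf/spark-defaults.conf. The helper names and the install path /home/ubuntu/aws-glue-libs-glue-1.0 are assumptions made for illustration; only the two property names and the jarsv1 directory come from this answer.

```python
import posixpath

CLASSPATH_KEYS = ("spark.driver.extraClassPath", "spark.executor.extraClassPath")


def glue_classpath_settings(jars_dir):
    """Return both classpath properties pointing at every jar in jars_dir."""
    wildcard = posixpath.join(jars_dir, "*")
    return {key: wildcard for key in CLASSPATH_KEYS}


def to_spark_defaults(settings):
    """Format the settings as 'key value' lines for spark-defaults.conf."""
    return "\n".join(f"{key} {value}" for key, value in sorted(settings.items()))


# Hypothetical install location; adjust to wherever aws-glue-libs was unpacked.
settings = glue_classpath_settings("/home/ubuntu/aws-glue-libs-glue-1.0/jarsv1")
print(to_spark_defaults(settings))
```

Putting the values in spark-defaults.conf (or passing --driver-class-path) rather than setting them on a SparkConf inside the notebook matters for the driver: in client mode the driver JVM has already started by the time application code runs, so a driver classpath set there has no effect.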

To verify the classpath, run the following code:

from pyspark.context import SparkContext

# Reuse the running context if there is one; SparkContext() raises if one already exists.
sc = SparkContext.getOrCreate()
for key, value in sc.getConf().getAll():
    print(key, value)

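The error is most likely a classpath problem: Spark cannot find the AWS Glue jar files. When the JVM cannot locate the GlueContext class, Py4J resolves the name to a JavaPackage instead of a class, and calling it raises this TypeError.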
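One way to narrow this down is to look only at the two classpath properties and count how many jar files each one actually matches on disk. The following is a minimal sketch, not part of the AWS Glue libraries: classpath_report is a hypothetical helper, and the sample input stands in for the list of pairs that sc.getConf().getAll() returns.

```python
import glob
import os

CLASSPATH_KEYS = ("spark.driver.extraClassPath", "spark.executor.extraClassPath")


def classpath_report(conf_pairs):
    """Map each classpath key to (configured value, number of paths it matches)."""
    conf = dict(conf_pairs)
    report = {}
    for key in CLASSPATH_KEYS:
        value = conf.get(key)
        if value is None:
            report[key] = (None, 0)
            continue
        matches = 0
        for entry in value.split(os.pathsep):
            # A trailing "*" is the JVM wildcard for every jar in that directory.
            pattern = entry + ".jar" if entry.endswith("*") else entry
            matches += len(glob.glob(pattern))
        report[key] = (value, matches)
    return report


# Sample input shaped like sc.getConf().getAll(); the path is hypothetical.
sample = [
    ("spark.app.name", "pyspark-shell"),
    ("spark.driver.extraClassPath", "/no/such/dir/jarsv1/*"),
]
for key, (value, matches) in classpath_report(sample).items():
    print(key, value, matches)
```

In a live session, pass sc.getConf().getAll() instead of the sample list. A value of None means the property is not set at all, and a count of 0 means it is set but points at a location with no jars, which are the two situations this answer suggests checking for.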