Spark:执行 python kinesis 流示例

Ris*_*cha 5 apache-spark amazon-kinesis spark-streaming pyspark

我是(非常)新来的火花,如果这是一个愚蠢的问题,我很抱歉。

我正在尝试执行 spark (2.2.0) python spark 流示例,但是我一直遇到以下问题:

Traceback (most recent call last):
  File "/Users/rmanoch/Downloads/spark-2.2.0-bin-hadoop2.7/kinesis_wordcount_asl.py", line 76, in <module>
    ssc, appName, streamName, endpointUrl, regionName, InitialPositionInStream.LATEST, 2)
  File "/Users/rmanoch/Downloads/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/streaming/kinesis.py", line 92, in createStream
  File "/Users/rmanoch/Downloads/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/Users/rmanoch/Downloads/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 323, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o27.createStream. Trace:
py4j.Py4JException: Method createStream([class org.apache.spark.streaming.api.java.JavaStreamingContext, class java.lang.String, class java.lang.String, class java.lang.String, class java.lang.String, class java.lang.Integer, class org.apache.spark.streaming.Duration, class org.apache.spark.storage.StorageLevel, null, null, null, null, null]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
        at py4j.Gateway.invoke(Gateway.java:272)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:745)
Run Code Online (Sandbox Code Playgroud)

我从 spark 网站下载的 tarball 中没有包含外部文件夹(似乎有一些许可证问题),所以这是我一直在尝试执行的命令(kinesis_wordcount_asl.pygithub下载后)

bin/spark-submit --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.2.0 kinesis_wordcount_asl.py sparkEnrichedDev relay-enriched-dev https://kinesis.us-west-2.amazonaws.com us-west-2
Run Code Online (Sandbox Code Playgroud)

如果需要,很乐意提供任何其他详细信息。

hi-*_*zir 3

根据异常,核心 Spark/Spark 流和spark-kinesis. API 在 Spark 2.1 和 2.2 ( SPARK-19405 ) 之间发生了变化,版本不匹配会导致类似的错误。

这让我认为您正在使用不正确的二进制文件提交(只是猜测) -如果您使用模式PATHPYTHONPATH则可能会出现问题。因为您得到签名不匹配,我们可以假设它已正确加载并且存在于.SPARK_HOMElocalspark-kinesisorg.apache.spark.streaming.kinesis.KinesisUtilsPythonHelperCLASSPATH