Posts by Ale*_*hev

Spark-submit: undefined function parse_url

The function parse_url always works fine when we call it through spark-sql via the SQL client (through the Thrift server), IPython, or the pyspark shell, but it does not work in spark-submit mode:

/opt/spark/bin/spark-submit --driver-memory 4G --executor-memory 8G main.py

The error is:

Traceback (most recent call last):
  File "/home/spark/***/main.py", line 167, in <module>
    )v on registrations.ga = v.ga and reg_path = oldtrack_page and registration_day = day_cl_log  and date_cl_log <= registration_date""")
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 552, in sql
  File "/opt/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 40, in deco
pyspark.sql.utils.AnalysisException: undefined function parse_url;
Build step 'Execute shell' marked build as failure
Finished: FAILURE

So, for now we use this workaround:

def python_parse_url(url, que, key): …
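The body of `python_parse_url` is elided above, so the following is only a hypothetical illustration, not the author's code. It is a minimal pure-Python sketch, assuming the goal is to approximate Hive's `parse_url(url, partToExtract[, key])`. The name `parse_url_sketch`, the subset of supported parts, and returning `None` for missing or empty values are all assumptions. (Also an assumption worth checking: in Spark 1.x, `parse_url` comes from Hive rather than being a Spark SQL built-in, so it resolves only through a `HiveContext`. That may explain why the interactive shells, which typically create a `HiveContext`, behave differently from a script that builds a plain `SQLContext`.)

```python
import re
from urllib.parse import urlsplit


def parse_url_sketch(url, part, key=None):
    """Hypothetical stand-in approximating Hive's parse_url(url, part[, key]).

    Supports HOST, PATH, QUERY (optionally with a key), REF and PROTOCOL.
    Unknown parts, malformed URLs and missing/empty values return None,
    mimicking Hive's NULL result.
    """
    if url is None or part is None:
        return None
    # A key only makes sense together with the QUERY part.
    if key is not None and part != "QUERY":
        return None
    try:
        parts = urlsplit(url)
    except ValueError:
        return None

    if part == "HOST":
        value = parts.hostname  # note: urlsplit lower-cases the host
    elif part == "PATH":
        value = parts.path
    elif part == "QUERY":
        value = parts.query
        if value and key is not None:
            # Return the raw (not percent-decoded) value of the first "key=..." pair.
            match = re.search(r"(?:^|&)" + re.escape(key) + r"=([^&]*)", value)
            value = match.group(1) if match else None
    elif part == "REF":
        value = parts.fragment
    elif part == "PROTOCOL":
        value = parts.scheme
    else:
        return None
    return value or None


url = "http://example.com/reg/path?utm_source=google&ga=123#top"
print(parse_url_sketch(url, "QUERY", "ga"))  # 123
```

In PySpark 1.x, a function like this could then be registered under a SQL-callable name with `sqlContext.registerFunction(...)`, at the cost of running row-by-row in Python instead of in the JVM.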

python apache-spark apache-spark-sql pyspark pyspark-sql
