I'm trying to use pyspark (on a cluster) with a Jupyter notebook to connect to PostgreSQL. Strangely, running pyspark from the console works fine, but from Jupyter I get the error below. Any idea why?
Here is my script; it's very simple:
import findspark
findspark.init()
import pyspark
from pyspark import SparkContext, SparkConf
from pyspark.sql import DataFrameReader, SQLContext
sc = pyspark.SparkContext(master='spark://172.17.0.3:7077', appName='app10')
sqlContext = pyspark.SQLContext(sc)
url = 'jdbc:postgresql://192.168.1.126:5432/myDB'
properties = {'user':'postgres', 'password':'postgres'}
df = DataFrameReader(sqlContext).jdbc(url=url, table='(select * from my_table limit 1) as tb', properties=properties)
df.printSchema()
df.show()  # <----- this is the line that raises the error
The error from Jupyter:
Py4JJavaError: An error occurred while calling o117.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task …
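In case it helps narrow things down: one possible cause of this console-vs-Jupyter difference is that the PostgreSQL JDBC driver jar is on the classpath when pyspark is launched from the console, but not on the classpath of the JVM that the notebook kernel starts. A minimal sketch of what I believe would work around that, assuming a hypothetical jar location (`/opt/jars/postgresql-42.2.5.jar` is a placeholder, not my actual path):

```python
import os

# Assumption: setting PYSPARK_SUBMIT_ARGS *before* findspark.init() makes the
# JDBC driver jar visible to the JVM that the notebook kernel launches.
# The jar path is hypothetical -- point it at the actual PostgreSQL driver jar.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars /opt/jars/postgresql-42.2.5.jar pyspark-shell"
)

# findspark.init() and the SparkContext creation would then follow,
# exactly as in the script above.
```

I haven't confirmed this is the cause in my setup; it's just the first thing I'd check given that the same script succeeds outside the notebook.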