I'm trying to read a Parquet file in PySpark, but I get a Py4JJavaError. I even tried reading the same file from spark-shell and was able to do so. Given that it works in Scala but not through PySpark's Python API, I can't figure out what I'm doing wrong.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").appName("test-read").getOrCreate()
sdf = spark.read.parquet("game_logs.parquet")
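One quick sanity check before digging into the JVM stack trace: verify that `game_logs.parquet` is actually a valid Parquet file. Every Parquet file begins and ends with the magic bytes `PAR1`, so a stdlib-only check (a minimal sketch; the helper name `looks_like_parquet` is my own, not part of any API) can rule out a truncated or mislabeled file:

```python
import os

def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet
    magic bytes b'PAR1' (a cheap validity check, not a full parse)."""
    size = os.path.getsize(path)
    if size < 12:  # minimum size: magic + 4-byte footer length + magic
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == b"PAR1" and tail == b"PAR1"

# Hypothetical usage on the file from the question:
# looks_like_parquet("game_logs.parquet")
```

If this returns False, the error is in the file itself rather than in PySpark; if it returns True, the mismatch between spark-shell and PySpark points elsewhere (for example, the two shells picking up different Spark installations or working directories).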
Stack trace:
Py4JJavaError Traceback (most recent call last)
<timed exec> in <module>()
~/pyenv/pyenv/lib/python3.6/site-packages/pyspark/sql/readwriter.py in parquet(self, *paths)
301 [('name', 'string'), ('year', 'int'), ('month', 'int'), ('day', 'int')]
302 """
--> 303 return self._df(self._jreader.parquet(_to_seq(self._spark._sc, paths)))
304
305 @ignore_unicode_prefix
~/pyenv/pyenv/lib/python3.6/site-packages/py4j/java_gateway.py in __call__(self, *args)
1255 answer = self.gateway_client.send_command(command)
1256 return_value = get_return_value(
-> 1257 answer, self.gateway_client, self.target_id, self.name)
1258
1259 for temp_arg in temp_args:
~/pyenv/pyenv/lib/python3.6/site-packages/pyspark/sql/utils.py in deco(*a, **kw)
61 def deco(*a, **kw): …