我正在使用带有两个节点的apache spark版本2.0.0的独立集群,我还没有安装hive.我在创建数据帧时遇到以下错误.
from pyspark import SparkContext
from pyspark import SQLContext
sqlContext = SQLContext(sc)
l = [('Alice', 1)]
sqlContext.createDataFrame(l).collect()
---------------------------------------------------------------------------
IllegalArgumentException Traceback (most recent call last)
<ipython-input-9-63bc4f21f23e> in <module>()
----> 1 sqlContext.createDataFrame(l).collect()
/home/mok/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/context.pyc in createDataFrame(self, data, schema, samplingRatio)
297 Py4JJavaError: ...
298 """
--> 299 return self.sparkSession.createDataFrame(data, schema, samplingRatio)
300
301 @since(1.3)
/home/mok/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/session.pyc in createDataFrame(self, data, schema, samplingRatio)
522 rdd, schema = self._createFromLocal(map(prepare, data), schema)
523 jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
--> 524 jdf = self._jsparkSession.applySchemaToPythonRDD(jrdd.rdd(), schema.json())
525 df = DataFrame(jdf, self._wrapped)
526 …
Run Code Online (Sandbox Code Playgroud)