use*_*768 · 19 · tags: apache-spark, pyspark
I am trying to create an empty DataFrame in Spark (PySpark).
I used a method similar to the one discussed here: [enter link description here], but it does not work.
Here is my code:
df = sqlContext.createDataFrame(sc.emptyRDD(), schema)
And here is the error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 404, in createDataFrame
    rdd, schema = self._createFromRDD(data, schema, samplingRatio)
  File "/Users/Me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 285, in _createFromRDD
    struct = self._inferSchema(rdd, samplingRatio)
  File "/Users/Me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 229, in _inferSchema
    first = rdd.first()
  File "/Users/Me/Desktop/spark-1.5.1-bin-hadoop2.6/python/pyspark/rdd.py", line 1320, in first
    raise ValueError("RDD is empty")
ValueError: RDD is empty
Ton*_*res 31
Expanding on Joe Widen's answer, you can actually create a schema with no fields at all:
schema = StructType([])
So when you create a DataFrame using that as your schema, you end up with a DataFrame[].
>>> empty = sqlContext.createDataFrame(sc.emptyRDD(), schema)
>>> empty
DataFrame[]
>>> empty.schema
StructType(List())
In Scala, if you use sqlContext.emptyDataFrame and check its schema, it returns StructType().
scala> val empty = sqlContext.emptyDataFrame
empty: org.apache.spark.sql.DataFrame = []
scala> empty.schema
res2: org.apache.spark.sql.types.StructType = StructType()
Joe*_*den 10
At the time of writing this answer, it looks like you need some kind of schema:
from pyspark.sql.types import *
field = [StructField("field1", StringType(), True)]
schema = StructType(field)
sqlContext.createDataFrame(sc.emptyRDD(), schema)
This works with Spark 2.0.0 or later:
from pyspark.sql import SQLContext
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

sc = spark.sparkContext  # `spark` is the SparkSession created by the shell
sqlContext = SQLContext(sc)
schema = StructType([StructField('col1', StringType(), False),
                     StructField('col2', IntegerType(), True)])
sqlContext.createDataFrame(sc.emptyRDD(), schema)
Viewed: 63,523 times