__all__ = [
    "DataType", "NullType", "StringType", "BinaryType", "BooleanType", "DateType",
    "TimestampType", "DecimalType", "DoubleType", "FloatType", "ByteType", "IntegerType",
    "LongType", "ShortType", "ArrayType", "MapType", "StructField", "StructType"]
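I have to write a UDF (in pyspark) that returns an array of tuples. What should I pass as its second argument, the return type of the udf method? It would be something like ArrayType(TupleType())......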
I want to generate my DataFrame schema dynamically, but I get the following error:
    assert isinstance(dataType, DataType), "dataType should be DataType"
AssertionError: dataType should be DataType
Code:
filteredSchema = []
for line in correctSchema:
    fieldName = line.split(',')
    if fieldName[1] == "decimal":
        filteredSchema.append([fieldName[0], "DecimalType()"])
    elif fieldName[1] == "string":
        filteredSchema.append([fieldName[0], "StringType()"])
    elif fieldName[1] == "integer":
        filteredSchema.append([fieldName[0], "IntegerType()"])
    elif fieldName[1] == "date":
        filteredSchema.append([fieldName[0], "DateType()"])
sample1 = [(line[0], line[1], True) for line in filteredSchema]
print sample1
fields = [StructField(line[0], line[1], True) for line in filteredSchema]
If I use this instead:
fields = [StructField(line[0], StringType(), True) for line in filteredSchema]
it works,
but …