Related solutions (0)

How to return a "tuple type" from a UDF in PySpark?

All the data types in pyspark.sql.types are:

__all__ = [
    "DataType", "NullType", "StringType", "BinaryType", "BooleanType", "DateType",
    "TimestampType", "DecimalType", "DoubleType", "FloatType", "ByteType", "IntegerType",
    "LongType", "ShortType", "ArrayType", "MapType", "StructField", "StructType"]


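I have to write a UDF (in PySpark) that returns an array of tuples. What do I pass as the second argument of the udf method, which is its return type? It would be something along the lines of ArrayType(TupleType())...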

python dataframe apache-spark apache-spark-sql pyspark

kam*_*nga

2016 04-25

12 votes

2 answers

20k views

pySpark Data Frames "assert isinstance(dataType, DataType), "dataType should be DataType"

I want to generate my DataFrame schema dynamically, but I get the following error:

   assert isinstance(dataType, DataType), "dataType should be DataType"
AssertionError: dataType should be DataType

Code:

filteredSchema = []
for line in correctSchema:
    fieldName = line.split(',')
    if fieldName[1] == "decimal":
        filteredSchema.append([fieldName[0], "DecimalType()"])
    elif fieldName[1] == "string":
        filteredSchema.append([fieldName[0], "StringType()"])
    elif fieldName[1] == "integer":
        filteredSchema.append([fieldName[0], "IntegerType()"])
    elif fieldName[1] == "date":
        filteredSchema.append([fieldName[0], "DateType()"])


sample1 = [(line[0], line[1], True) for line in filteredSchema]
print sample1

fields = [StructField(line[0], line[1], True) for line in filteredSchema]

If I use this instead:

fields = [StructField(line[0], StringType(), True) for line in filteredSchema]

it works,

but …

dataframe apache-spark pyspark

the*_*ing

2019 10-25

3 votes

1 answer

9037 views

Tag statistics

apache-spark ×2

dataframe ×2

pyspark ×2

apache-spark-sql ×1

python ×1
