I need to extract some data from a PipelinedRDD, but when converting it to a DataFrame I get the following error:
Traceback (most recent call last):
  File "/home/karan/Desktop/meds.py", line 42, in <module>
    relevantToSymEntered(newrdd)
  File "/home/karan/Desktop/meds.py", line 26, in relevantToSymEntered
    mat = spark.createDataFrame(self, StructType([StructField("Prescribed medicine", StringType), StructField(["Disease","ID","Symptoms Recorded","Severeness"], ArrayType)]))
  File "/home/karan/Downloads/spark-2.4.2-bin-hadoop2.7/python/pyspark/sql/types.py", line 409, in __init__
    "dataType %s should be an instance of %s" % (dataType, DataType)
AssertionError: dataType <class 'pyspark.sql.types.StringType'> should be an instance of <class 'pyspark.sql.types.DataType'>
1. My error is of a different kind: that one is a TypeError, whereas what I am running into is an AssertionError.
I have already tried toDF(), but it changes the column names to something undesirable.
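For reference, RDD.toDF also accepts an explicit list of column names, which avoids the default _1, _2, ... naming. A minimal sketch, assuming a SparkSession named spark and sample rows stood in for the asker's newrdd:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("todf-example").getOrCreate()

# Hypothetical rows; the real data comes from the asker's pipeline.
rdd = spark.sparkContext.parallelize([("Flu", "p001", "fever", 2)])

# Passing column names to toDF keeps control over the naming.
df = rdd.toDF(["Disease", "ID", "Symptoms Recorded", "Severeness"])
df.show()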
Replace:
StructType([StructField("Prescribed medicine", StringType), StructField(["Disease","ID","Symptoms Recorded","Severeness"], ArrayType)])
with:
StructType([StructField("Prescribed medicine", StringType()), StructField(["Disease","ID","Symptoms Recorded","Severeness"], ArrayType())])
You need to instantiate the type classes instead of passing the classes themselves.
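That resolves the AssertionError, but note that the schema above will still not build as written: StructField expects a single string column name, and ArrayType requires an element type. A minimal sketch of a schema that does construct, with column names and types assumed from the question rather than taken from the asker's actual data:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, ArrayType, IntegerType

spark = SparkSession.builder.appName("schema-example").getOrCreate()

# One StructField per column, each type instantiated; ArrayType is given
# an element type. Names and types here are assumptions for illustration.
schema = StructType([
    StructField("Prescribed medicine", StringType()),
    StructField("Disease", StringType()),
    StructField("ID", StringType()),
    StructField("Symptoms Recorded", ArrayType(StringType())),
    StructField("Severeness", IntegerType()),
])

rows = [("Paracetamol", "Flu", "p001", ["fever", "headache"], 2)]
df = spark.createDataFrame(rows, schema)
df.show()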