I need to extract some data from a PipelinedRDD, but when converting it to a DataFrame I get the following error:
Traceback (most recent call last):
  File "/home/karan/Desktop/meds.py", line 42, in <module>
    relevantToSymEntered(newrdd)
  File "/home/karan/Desktop/meds.py", line 26, in relevantToSymEntered
    mat = spark.createDataFrame(self, StructType([StructField("Prescribed medicine", StringType), StructField(["Disease","ID","Symptoms Recorded","Severeness"], ArrayType)]))
  File "/home/karan/Downloads/spark-2.4.2-bin-hadoop2.7/python/pyspark/sql/types.py", line 409, in __init__
    "dataType %s should be an instance of %s" % (dataType, DataType)
AssertionError: dataType <class 'pyspark.sql.types.StringType'> should be an instance of <class 'pyspark.sql.types.DataType'>
1. My error is of a different kind: that one is a TypeError, whereas what I am running into is an AssertionError.
I have already tried toDF(), but it changes the column names to something undesirable.
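For reference, RDD.toDF also accepts an explicit list of column names, which avoids the default _1, _2, ... naming. A minimal sketch, assuming a SparkSession named spark and sample rows stood in for the asker's newrdd:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("todf-example").getOrCreate()

# Hypothetical rows; the real data comes from the asker's pipeline.
rdd = spark.sparkContext.parallelize([("Flu", "p001", "fever", 2)])

# Passing column names to toDF keeps control over the naming.
df = rdd.toDF(["Disease", "ID", "Symptoms Recorded", "Severeness"])
df.show()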
Replace:
StructType([StructField("Prescribed medicine", StringType), StructField(["Disease","ID","Symptoms Recorded","Severeness"], ArrayType)])
with:
StructType([StructField("Prescribed medicine", StringType()), StructField(["Disease","ID","Symptoms Recorded","Severeness"], ArrayType())])
You need to instantiate the type classes instead of passing the classes themselves.
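That resolves the AssertionError, but note that the schema above will still not build as written: StructField expects a single string column name, and ArrayType requires an element type. A minimal sketch of a schema that does construct, with column names and types assumed from the question rather than taken from the asker's actual data:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, ArrayType, IntegerType

spark = SparkSession.builder.appName("schema-example").getOrCreate()

# One StructField per column, each type instantiated; ArrayType is given
# an element type. Names and types here are assumptions for illustration.
schema = StructType([
    StructField("Prescribed medicine", StringType()),
    StructField("Disease", StringType()),
    StructField("ID", StringType()),
    StructField("Symptoms Recorded", ArrayType(StringType())),
    StructField("Severeness", IntegerType()),
])

rows = [("Paracetamol", "Flu", "p001", ["fever", "headache"], 2)]
df = spark.createDataFrame(rows, schema)
df.show()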