Posts by Mr.*_*mia

TypeError: element in array field Category: Can not merge type <class 'pyspark.sql.types.StringType'> and <class 'pyspark.sql.types.DoubleType'>

I am reading a CSV file with Pandas into a two-column DataFrame, which I am then trying to convert to a Spark DataFrame. The code is:

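from pyspark.sql import SQLContext

# sc is an existing SparkContext; df is the pandas DataFrame read from the CSV
sqlCtx = SQLContext(sc)
sdf = sqlCtx.createDataFrame(df)  # this call raises the TypeError in the title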

The DataFrame:

print(df) 

prints this:

   Name                                    Category
0  EDSJOBLIST apply at www.edsjoblist.com  ['biotechnology', 'clinical', 'diagnostic', 'd...
1  Power Direct Marketing                  ['advertising', 'analytics', 'brand positionin...
2  CHA Hollywood Medical Center, L.P.      ['general medical and surgical hospital', 'hea...
3  JING JING GOURMET                       [nan]
4  TRUE LIFE KINGDOM MINISTRIES            ['religious organization']
5  fasterproms                             ['microsoft .net']
6  STEREO ZONE                             ['accessory', 'audio', 'car audio', 'chrome', ...
7  SAN FRANCISCO NEUROLOGICAL …
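
A possible reading of the error, judging from the `[nan]` row in the output: if `Category` holds Python lists, most of them contain strings, but at least one contains a float `NaN`. When Spark infers a schema for the array elements it then sees both `StringType` and `DoubleType` and cannot merge them. The sketch below is one way to work around that, not a confirmed fix for this dataset. It assumes the column really contains lists (not list-like strings), and the helper names `clean_categories` and `to_spark` are made up for illustration. The cleaning step is plain Python; the Spark step passes an explicit schema so nothing has to be inferred.

```python
import math


def clean_categories(values):
    """Return only the usable entries of one Category cell, as strings.

    Drops None and float NaN; anything that is not a list/tuple becomes [].
    """
    if not isinstance(values, (list, tuple)):
        return []
    return [
        str(v)
        for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]


def to_spark(pdf, spark):
    """Convert the pandas DataFrame using an explicit schema.

    Hypothetical helper: `pdf` is the two-column pandas DataFrame and
    `spark` is an existing SparkSession (or SQLContext). pyspark is
    imported here so the cleaning code above runs without it.
    """
    from pyspark.sql.types import ArrayType, StringType, StructField, StructType

    schema = StructType([
        StructField("Name", StringType(), True),
        StructField("Category", ArrayType(StringType()), True),
    ])
    pdf = pdf.copy()
    pdf["Category"] = pdf["Category"].apply(clean_categories)
    return spark.createDataFrame(pdf, schema=schema)


# Cells modelled on rows 3-5 of the printed DataFrame
rows = [
    [float("nan")],
    ["religious organization"],
    ["microsoft .net"],
]
cleaned = [clean_categories(r) for r in rows]
print(cleaned)  # [[], ['religious organization'], ['microsoft .net']]
```

With this in place, `sdf = to_spark(df, sqlCtx)` would replace the `createDataFrame(df)` call. Turning `[nan]` into an empty list is a design choice; keeping `None` for the whole cell would also fit the nullable schema.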

python dataframe pandas apache-spark-sql pyspark

Score: 1, answers: 1, views: 3603
