I am reading a CSV file with Pandas into a two-column DataFrame, and then trying to convert it to a Spark DataFrame. The code is as follows:
from pyspark.sql import SQLContext
# sc is an existing SparkContext
sqlCtx = SQLContext(sc)
# df is the Pandas DataFrame read from the CSV file
sdf = sqlCtx.createDataFrame(df)
The DataFrame:
print(df)
gives this:
   Name                                    Category
0  EDSJOBLIST apply at www.edsjoblist.com  ['biotechnology', 'clinical', 'diagnostic', 'd...
1  Power Direct Marketing                  ['advertising', 'analytics', 'brand positionin...
2  CHA Hollywood Medical Center, L.P.      ['general medical and surgical hospital', 'hea...
3  JING JING GOURMET                       [nan]
4  TRUE LIFE KINGDOM MINISTRIES            ['religious organization']
5  fasterproms                             ['microsoft .net']
6  STEREO ZONE                             ['accessory', 'audio', 'car audio', 'chrome', ...
7  SAN FRANCISCO NEUROLOGICAL …