Asked by mbl*_*ume · 12 votes · Tags: python, rows, apache-spark, apache-spark-sql, pyspark
I have the following list of Rows that I want to convert to a PySpark DataFrame:
data= [Row(id=u'1', probability=0.0, thresh=10, prob_opt=0.45),
Row(id=u'2', probability=0.4444444444444444, thresh=60, prob_opt=0.45),
Row(id=u'3', probability=0.0, thresh=10, prob_opt=0.45),
Row(id=u'80000000808', probability=0.0, thresh=100, prob_opt=0.45)]
I need to convert it to a PySpark DataFrame.
I tried data.toDF() and got:
AttributeError: 'list' object has no attribute 'toDF'
Answered by Zyg*_*ygD · 12 votes
This seems to work:
spark.createDataFrame(data)
Test result:
from pyspark.sql import SparkSession, Row
spark = SparkSession.builder.getOrCreate()
data = [Row(id=u'1', probability=0.0, thresh=10, prob_opt=0.45),
Row(id=u'2', probability=0.4444444444444444, thresh=60, prob_opt=0.45),
Row(id=u'3', probability=0.0, thresh=10, prob_opt=0.45),
Row(id=u'80000000808', probability=0.0, thresh=100, prob_opt=0.45)]
df = spark.createDataFrame(data)
df.show()
# +-----------+------------------+------+--------+
# | id| probability|thresh|prob_opt|
# +-----------+------------------+------+--------+
# | 1| 0.0| 10| 0.45|
# | 2|0.4444444444444444| 60| 0.45|
# | 3| 0.0| 10| 0.45|
# |80000000808| 0.0| 100| 0.45|
# +-----------+------------------+------+--------+
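By default createDataFrame infers the column types from the data (here id becomes a string and thresh a long). If you need specific types, you can pass an explicit schema instead. A minimal sketch, assuming Spark 3.0+ (where Row preserves keyword-argument field order) and the spark session and data defined above; the chosen types are illustrative:

from pyspark.sql.types import StructType, StructField, StringType, DoubleType, IntegerType

# Explicit schema matching the Row fields above.
schema = StructType([
    StructField("id", StringType(), True),
    StructField("probability", DoubleType(), True),
    StructField("thresh", IntegerType(), True),
    StructField("prob_opt", DoubleType(), True),
])

df = spark.createDataFrame(data, schema=schema)
df.printSchema()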
Answered by 小智 · 5 votes
You can try the following code:
from pyspark.sql import Row

sc = spark.sparkContext  # `sc` is the SparkContext; in the pyspark shell it is predefined
rdd = sc.parallelize(data)
df = rdd.toDF()
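toDF is defined on RDDs once a SparkSession exists, not on plain Python lists, which is why data.toDF() raised the AttributeError above. RDD.toDF also accepts a list of column names if you want to rename the fields on the way in. A minimal sketch, reusing spark and data from above:

rdd = spark.sparkContext.parallelize(data)

# Optional column names; these rename the Row fields positionally.
df = rdd.toDF(['id', 'probability', 'thresh', 'prob_opt'])
df.show()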
Found the answer!
rdd = sc.parallelize(data)
df = spark.createDataFrame(rdd, ['id', 'probability', 'thresh', 'prob_opt'])
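The RDD detour is not strictly necessary here: createDataFrame also accepts the list of Rows together with the column names directly. A minimal sketch, reusing spark and data from above:

df = spark.createDataFrame(data, ['id', 'probability', 'thresh', 'prob_opt'])
df.show()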