The following example code tries to put some case objects into a DataFrame. The code includes the definition of a case object hierarchy and a case class that uses this trait:
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.SQLContext

sealed trait Some
case object AType extends Some
case object BType extends Some

case class Data(name: String, t: Some)

object Example {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Example")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df = sc.parallelize(Seq(Data("a", AType), Data("b", BType)), 4).toDF()
    df.show()
  }
}
When executing the code, I unfortunately encounter the following exception:
java.lang.UnsupportedOperationException: Schema for type …
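For context: Spark derives DataFrame schemas by reflecting over the row type, and its reflection has no rule for a sealed trait hierarchy such as Some, so any case class that embeds it triggers exactly this exception. One common workaround is to keep the ADT out of the row type and store a plain String tag instead. Below is a minimal sketch of that idea; DataRow, encode, and decode are hypothetical names introduced here for illustration, not part of the original code:

case class DataRow(name: String, t: String)

// case objects print as their name, e.g. "AType"
def encode(t: Some): String = t.toString

// rebuild the case object from its tag when reading rows back
def decode(tag: String): Some = tag match {
  case "AType" => AType
  case "BType" => BType
}

val df = sc.parallelize(
  Seq(DataRow("a", encode(AType)), DataRow("b", encode(BType))), 4
).toDF()
df.show()

With this encoding the schema is simply (name: string, t: string), which Spark can derive without complaint.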
I have some code written in PySpark that I am busy converting to Scala. It has been going well, except that I am now struggling with user-defined functions in Scala.

from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
from pyspark.sql.types import *
from pyspark.sql import functions as F

spark = SparkSession.builder.master('local[*]').getOrCreate()

# build a test frame: an index column plus three literal columns a1..a3,
# then collapse a1..a3 into a single struct column named "a"
a = (spark.sparkContext
     .parallelize([(1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,), (10,)])
     .toDF(["index"])
     .withColumn("a1", F.lit(1))
     .withColumn("a2", F.lit(2))
     .withColumn("a3", F.lit(3)))
a = a.select("index", F.struct(*('a' + str(c) for c in range(1, 4))).alias('a'))
a.show()
def a_to_b(a):
    # 1. check if a technical cure exists
    b = {}
    for i in range(1, 4):
        b.update({'b' + str(i): a[i - 1] ** …
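On the Scala side, the usual pattern for a UDF over a struct column is: the struct arrives in the function as an org.apache.spark.sql.Row, and returning a case class makes Spark map the result back into a struct column. Here is a minimal sketch of that pattern mirroring the setup above; B and aToB are hypothetical names, and because the exponent in the Python snippet above is cut off, the sketch squares each field purely for illustration:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions._

// hypothetical result type; Spark maps it back to a struct column
case class B(b1: Double, b2: Double, b3: Double)

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// same test frame as the PySpark code above
val a = (1 to 10).toDF("index")
  .withColumn("a1", lit(1))
  .withColumn("a2", lit(2))
  .withColumn("a3", lit(3))
  .select($"index", struct($"a1", $"a2", $"a3").as("a"))

// struct columns reach a Scala UDF as Row
val aToB = udf { r: Row =>
  // illustrative exponent only: the original value is truncated above
  B(math.pow(r.getAs[Int]("a1"), 2),
    math.pow(r.getAs[Int]("a2"), 2),
    math.pow(r.getAs[Int]("a3"), 2))
}

a.withColumn("b", aToB($"a")).show()

Returning a case class also spares you from spelling out a StructType for the UDF's return value by hand.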