我试图测试如何使用Spark 2.1在HDFS 2.7中写入数据.我的数据是一个简单的虚拟值序列,输出应该由属性:id和key分区.
// Simple case class to cast the data
case class SimpleTest(id:String, value1:Int, value2:Float, key:Int)
// Actual data to be stored
val testData = Seq(
SimpleTest("test", 12, 13.5.toFloat, 1),
SimpleTest("test", 12, 13.5.toFloat, 2),
SimpleTest("test", 12, 13.5.toFloat, 3),
SimpleTest("simple", 12, 13.5.toFloat, 1),
SimpleTest("simple", 12, 13.5.toFloat, 2),
SimpleTest("simple", 12, 13.5.toFloat, 3)
)
// Spark's workflow to distribute, partition and store
// sc and sql are the SparkContext and SparkSession, respectively
val testDataP = sc.parallelize(testData, 6)
val …Run Code Online (Sandbox Code Playgroud)