Sri*_*Sri 6 uuid apache-spark apache-spark-sql
I want to add a new column to a DataFrame (a UUID generator).
A UUID value looks like 21534cf7-cff9-482a-a3a8-9e7244240da7
My research:
I have tried Spark's withColumn method:
val DF2 = DF1.withColumn("newcolname", DF1("existingcolname") + 1)
This gives DF2 an extra column, newcolname, with 1 added to the existing column in every row.
For my requirement, I want the new column to generate a UUID instead.
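For reference, the value shown above is the canonical text form of a random (version 4) UUID, which any JVM language can produce directly with java.util.UUID; a minimal plain-Scala sketch (no Spark needed, object name is illustrative):

```scala
import java.util.UUID

object UuidFormatDemo extends App {
  // Generate a random (version 4) UUID, e.g. 21534cf7-cff9-482a-a3a8-9e7244240da7
  val id: String = UUID.randomUUID().toString

  // The canonical text form is 36 characters: 8-4-4-4-12 lowercase hex digits
  assert(id.length == 36)
  assert(id.matches("[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"))

  println(id)
}
```

This is the same generator the accepted answer wraps in a Spark UDF.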
Paw*_*nko 16
You should try something like this:
import java.util.UUID

val sc: SparkContext = ...
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val generateUUID = udf(() => UUID.randomUUID().toString)

val df1 = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
val df2 = df1.withColumn("UUID", generateUUID())

df1.show()
df2.show()
The output will be:
+---+-----+
| id|value|
+---+-----+
|id1| 1|
|id2| 4|
|id3| 5|
+---+-----+
+---+-----+--------------------+
| id|value| UUID|
+---+-----+--------------------+
|id1| 1|f0cfd0e2-fbbe-40f...|
|id2| 4|ec8db8b9-70db-46f...|
|id3| 5|e0e91292-1d90-45a...|
+---+-----+--------------------+
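One caveat worth noting (general Spark behavior, not stated in the answer above): a UDF may be re-evaluated whenever the plan is recomputed, so the generated UUIDs are not stable across repeated actions unless you persist the result and, on Spark 2.3+, mark the UDF as nondeterministic. A hedged sketch of that variant (object and column names are illustrative):

```scala
import java.util.UUID

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object StableUuidSketch extends App {
  val spark = SparkSession.builder
    .master("local[*]")
    .appName("uuid_sketch")
    .getOrCreate()
  import spark.implicits._

  // asNondeterministic() (Spark 2.3+) tells the optimizer not to assume
  // repeated evaluations of this UDF return the same value.
  val generateUUID = udf(() => UUID.randomUUID().toString).asNondeterministic()

  val df = Seq("id1", "id2", "id3").toDF("id")
    .withColumn("UUID", generateUUID())
    .cache() // persist so the UUIDs do not change between subsequent actions

  df.show(false)
  spark.stop()
}
```

Without the cache(), calling df.show() twice may print two different sets of UUIDs, because each action can re-run the UDF.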
Max*_*ind 12
You can use the built-in Spark SQL uuid function:
.withColumn("uuid", expr("uuid()"))
A complete example in Scala:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object CreateDf extends App {
  val spark = SparkSession.builder
    .master("local[*]")
    .appName("spark_local")
    .getOrCreate()

  import spark.implicits._

  Seq(1, 2, 3).toDF("col1")
    .withColumn("uuid", expr("uuid()"))
    .show(false)
}
Output:
+----+------------------------------------+
|col1|uuid |
+----+------------------------------------+
|1 |24181c68-51b7-42ea-a9fd-f88dcfa10062|
|2 |7cd21b25-017e-4567-bdd3-f33b001ee497|
|3 |1df7cfa8-af8a-4421-834f-5359dc3ae417|
+----+------------------------------------+