Concatenate dataframe columns by passing a list

Tro*_*ump 2 dataframe pyspark

    from pyspark.sql import Row, functions as F

    row = Row("UK_1", "UK_2", "Date", "Cat")
    df = (sc.parallelize([
        row(1, 1, '12/10/2016', 'A'),
        row(1, 2, None, 'A'),
        row(2, 1, '14/10/2016', 'B'),
        row(3, 3, '!~2016/2/276', 'B'),
        row(None, 1, '26/09/2016', 'A'),
        row(1, 1, '12/10/2016', 'A'),
        row(1, 2, None, 'A'),
        row(2, 1, '14/10/2016', 'B'),
        row(None, None, '!~2016/2/276', 'B'),
        row(None, 1, '26/09/2016', 'A')
    ]).toDF())

    pks = ["UK_1", "UK_2"]
    columns = df.columns  # keep all columns in the select below

    df1 = (
        df
        .select(columns)
        # .withColumn('pk', F.concat(pks))  # fails: concat does not accept a list
        .withColumn('pk', F.concat("UK_1", "UK_2"))
    )

    df1.show()

Is there a way to pass a list of columns to concat? I want to use this code in scenarios where the columns can vary, so I'd like to pass them in as a list.

Psi*_*dom 5

Yes, use Python's *args syntax (unpacking a variable number of arguments):

    df.withColumn("pk", F.concat(*pks)).show()

    +----+----+------------+---+----+
    |UK_1|UK_2|        Date|Cat|  pk|
    +----+----+------------+---+----+
    |   1|   1|  12/10/2016|  A|  11|
    |   1|   2|        null|  A|  12|
    |   2|   1|  14/10/2016|  B|  21|
    |   3|   3|!~2016/2/276|  B|  33|
    |null|   1|  26/09/2016|  A|null|
    |   1|   1|  12/10/2016|  A|  11|
    |   1|   2|        null|  A|  12|
    |   2|   1|  14/10/2016|  B|  21|
    |null|null|!~2016/2/276|  B|null|
    |null|   1|  26/09/2016|  A|null|
    +----+----+------------+---+----+
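
Note that concat returns null as soon as any input column is null, which is why the rows with a null UK_1 or UK_2 get a null pk above. If the key should survive nulls, concat_ws skips null inputs instead; a minimal sketch, reusing the df and pks from the question (the dyn_pks name is just for illustration):

    # concat_ws(sep, *cols) skips null inputs rather than nulling the whole result
    df.withColumn("pk", F.concat_ws("_", *pks)).show()

    # the list itself can be built dynamically, e.g. every UK_* column
    dyn_pks = [c for c in df.columns if c.startswith("UK_")]
    df.withColumn("pk", F.concat(*dyn_pks)).show()

With concat_ws, a row whose keys are (null, 1) produces "1" rather than null, so check that this behavior is acceptable for a key column.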