Zyg*_*ygD 1 arrays split apache-spark apache-spark-sql pyspark
How do I split a string column into an array of characters?
Input:
from pyspark.sql import functions as F
df = spark.createDataFrame([('Vilnius',), ('Riga',), ('Tallinn',), ('New York',)], ['col_cities'])
df.show()
# +----------+
# |col_cities|
# +----------+
# |   Vilnius|
# |      Riga|
# |   Tallinn|
# |  New York|
# +----------+
Desired output:
# +----------+------------------------+
# |col_cities|split                   |
# +----------+------------------------+
# |Vilnius   |[V, i, l, n, i, u, s]   |
# |Riga      |[R, i, g, a]            |
# |Tallinn   |[T, a, l, l, i, n, n]   |
# |New York  |[N, e, w,  , Y, o, r, k]|
# +----------+------------------------+
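You can use split with a regular expression pattern that contains a negative lookahead. The pattern (?!$) matches the empty position before each character, but not the position at the end of the string: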
df.withColumn('split', F.split('col_cities', '(?!$)')).show(truncate=False)
+----------+------------------------+
|col_cities|split                   |
+----------+------------------------+
|Vilnius   |[V, i, l, n, i, u, s]   |
|Riga      |[R, i, g, a]            |
|Tallinn   |[T, a, l, l, i, n, n]   |
|New York  |[N, e, w,  , Y, o, r, k]|
+----------+------------------------+
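To see where the (?!$) pattern matches, here is a rough plain-Python sketch that needs no Spark session. It is only an illustration: Spark evaluates the pattern with Java's regex engine, whose split semantics differ from Python's re.split, so the sketch collects the zero-width match positions and slices the string between them rather than calling re.split. The helper name split_chars is made up for this example.

```python
import re

def split_chars(s):
    # Hypothetical helper, for illustration only.
    # Collect the zero-width positions where the negative lookahead (?!$)
    # succeeds: every index except the end of the string.
    cuts = [m.start() for m in re.finditer(r'(?!$)', s)]
    # Slice between consecutive cut positions; the last slice ends at len(s).
    return [s[i:j] for i, j in zip(cuts, cuts[1:] + [len(s)])]

for city in ['Vilnius', 'Riga', 'Tallinn', 'New York']:
    print(city, split_chars(city))
```

Because the end-of-string position is excluded from the cut points, every slice contains exactly one character, including the space in 'New York'.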