Selecting the substring before/after a specific character ("-") in PySpark

Kat*_*ael 0 pyspark

I am using substring to get the first and last values. But how do I find a specific character in a string and get the value before/after it?

the*_*hon 6

Try these; they sound like what you are looking for.

Reference documentation:

https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.substring_index
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.split

df = spark.createDataFrame([('hello-there',)], ['text'])

from pyspark.sql.functions import substring_index
df.select(substring_index(df.text, '-', 1).alias('left')).show() # left of delim
df.select(substring_index(df.text, '-', -1).alias('right')).show() # right of delim

+-----+
| left|
+-----+
|hello|
+-----+

+-----+
|right|
+-----+
|there|
+-----+
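The `count` argument (1 and -1 above) matters once the delimiter appears more than once: a positive count keeps everything to the left of the count-th delimiter, and a negative count keeps everything to the right of the count-th delimiter counting from the end. Below is a rough plain-Python sketch of that behaviour so it can be tried without a Spark session. `substring_index_py` is a hypothetical helper written for illustration, not part of PySpark, and it only approximates how I understand the Spark function to behave.

```python
def substring_index_py(text, delim, count):
    """Plain-Python approximation of substring_index (illustrative only)."""
    # Assumption: a zero count or an empty delimiter yields an empty string.
    if count == 0 or delim == "":
        return ""
    parts = text.split(delim)
    if count > 0:
        # Keep the pieces to the left of the count-th delimiter.
        return delim.join(parts[:count])
    # Negative count: keep the pieces to the right of the
    # count-th delimiter, counting from the end of the string.
    return delim.join(parts[count:])


print(substring_index_py("hello-there", "-", 1))   # hello
print(substring_index_py("hello-there", "-", -1))  # there
print(substring_index_py("a-b-c-d", "-", 2))       # a-b
print(substring_index_py("a-b-c-d", "-", -3))      # b-c-d
print(substring_index_py("nodelim", "-", 1))       # nodelim
```

Note the last case: when the delimiter is absent, the whole string comes back, so it may be worth checking for the delimiter first if that is not what you want.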

from pyspark.sql.functions import split
split_df = df.select(split(df.text, '-').alias('split_text'))
split_df.selectExpr("split_text[0] as left").show() # left of delim
split_df.selectExpr("split_text[1] as right").show() # right of delim

+-----+
| left|
+-----+
|hello|
+-----+

+-----+
|right|
+-----+
|there|
+-----+

from pyspark.sql.functions import substring_index, substring, concat, col, lit

df = spark.createDataFrame([('will-smith',)], ['text'])

df = df\
.withColumn("left", substring_index(df.text, '-', 1))\
.withColumn("right", substring_index(df.text, '-', -1))

df = df\
.withColumn("left_sub", substring(df.left, -2, 2))\
.withColumn("right_sub", substring(df.right, 1, 2))

df = df\
.withColumn("concat_sub", concat(col("left_sub"), lit("-"), col("right_sub")))

df.show()

+----------+----+-----+--------+---------+----------+
|      text|left|right|left_sub|right_sub|concat_sub|
+----------+----+-----+--------+---------+----------+
|will-smith|will|smith|      ll|       sm|     ll-sm|
+----------+----+-----+--------+---------+----------+

