Remove leading zeros in PySpark?

Vig*_*jan -2 pyspark

How do I remove the leading zeros from a column in PySpark?


niu*_*uer 8

Another approach is to use regexp_replace here:

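from pyspark.sql import functions as F

# df is assumed to be an existing DataFrame with the columns shown below
df.show()
# Note: the pattern r'0' removes every '0' in the string, not only the leading ones
df = df.withColumn('subcategory', F.regexp_replace('subcategory', r'0', ''))
df = df.withColumn('subcategory_label', F.regexp_replace('subcategory_label', r'0', ''))
df.show()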

Input DataFrame:

+-----------+-----------------+--------+
|subcategory|subcategory_label|category|
+-----------+-----------------+--------+
|      00EEE|        00EEE FFF|   Drink|
|    0000EEE|        00EEE FFF|   Fruit|
|       0EEE|       000EEE FFF|    Meat|
+-----------+-----------------+--------+

Output DataFrame:

+-----------+-----------------+--------+
|subcategory|subcategory_label|category|
+-----------+-----------------+--------+
|        EEE|          EEE FFF|   Drink|
|        EEE|          EEE FFF|   Fruit|
|        EEE|          EEE FFF|    Meat|
+-----------+-----------------+--------+

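Spark's regexp_replace uses Java regular expressions. For a pattern as simple as r'0', Python's re.sub behaves the same way, so the transformation above can be sanity-checked without starting a Spark session. A minimal sketch, assuming the three input rows from the table above; the helper name strip_zeros_everywhere is hypothetical:

```python
import re

# Rows copied from the input DataFrame above: (subcategory, subcategory_label, category)
rows = [
    ("00EEE", "00EEE FFF", "Drink"),
    ("0000EEE", "00EEE FFF", "Fruit"),
    ("0EEE", "000EEE FFF", "Meat"),
]


def strip_zeros_everywhere(s: str) -> str:
    """Hypothetical helper mirroring F.regexp_replace(col, r'0', ''): deletes every '0'."""
    return re.sub(r"0", "", s)


# Apply the replacement to the two text columns; category is left unchanged
result = [(strip_zeros_everywhere(a), strip_zeros_everywhere(b), c) for a, b, c in rows]
for row in result:
    print(row)
```

On this sample data the result matches the output table, but only because none of the values contains a zero after the first letter.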

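If you only want to remove the 0s at the beginning of the string, use an anchored pattern instead, so that no 0s in the middle are removed: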

df = df.withColumn('subcategory', F.regexp_replace('subcategory', r'^[0]*', ''))
df = df.withColumn('subcategory_label', F.regexp_replace('subcategory_label', r'^[0]*', ''))
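The difference between the two patterns only appears when a zero follows the first non-zero character. A small sketch using Python's re module, whose behavior for these two patterns matches the Java regex engine that regexp_replace relies on; the value "00E0E 0F" is a hypothetical example, not taken from the question:

```python
import re

value = "00E0E 0F"  # hypothetical value with zeros in the middle

# Unanchored: removes every '0', like the first snippet
everywhere = re.sub(r"0", "", value)

# Anchored with '^': removes only the run of zeros at the start of the string
leading_only = re.sub(r"^[0]*", "", value)

print(everywhere)
print(leading_only)
```

Note that a value consisting only of zeros, such as "000", becomes an empty string with either pattern.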