Another approach is to use regexp_replace here:
from pyspark.sql import functions as F
df.show()
df = df.withColumn('subcategory', F.regexp_replace('subcategory', r'0', ''))
df = df.withColumn('subcategory_label', F.regexp_replace('subcategory_label', r'0', ''))
df.show()
Input DataFrame:
+-----------+-----------------+--------+
|subcategory|subcategory_label|category|
+-----------+-----------------+--------+
|      00EEE|        00EEE FFF|   Drink|
|    0000EEE|        00EEE FFF|   Fruit|
|       0EEE|       000EEE FFF|    Meat|
+-----------+-----------------+--------+
Output DataFrame:
+-----------+-----------------+--------+
|subcategory|subcategory_label|category|
+-----------+-----------------+--------+
|        EEE|          EEE FFF|   Drink|
|        EEE|          EEE FFF|   Fruit|
|        EEE|          EEE FFF|    Meat|
+-----------+-----------------+--------+
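If you only need to strip the 0s at the beginning of the string, you can use the following, which ensures that no 0 in the middle of the string is removed: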
df = df.withColumn('subcategory', F.regexp_replace('subcategory', r'^[0]*', ''))
df = df.withColumn('subcategory_label', F.regexp_replace('subcategory_label', r'^[0]*', ''))
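
To see how the two patterns differ without starting a Spark session, here is a minimal sketch using Python's built-in re module. Spark evaluates regexp_replace with Java regular expressions, but for patterns this simple the behavior should be the same. The helper names and the sample values '0E0E0' and '100' are hypothetical and are not part of the original data; the sketch also writes the anchored pattern as ^0+, which strips the same leading zeros as ^[0]*.

```python
import re


def strip_all_zeros(value: str) -> str:
    """Mirror of the r'0' pattern: removes every 0, wherever it appears."""
    return re.sub(r'0', '', value)


def strip_leading_zeros(value: str) -> str:
    """Mirror of the anchored pattern: removes only the 0s at the start."""
    return re.sub(r'^0+', '', value)


# Hypothetical sample values chosen to expose the difference.
samples = ['00EEE', '0E0E0', '100']

for s in samples:
    print(repr(s), repr(strip_all_zeros(s)), repr(strip_leading_zeros(s)))
```

For values such as '00EEE' both patterns give the same result, which is why the output table above looks identical either way; the anchored pattern only matters once a 0 can appear after the first non-zero character.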