sri*_*mar 0 apache-spark apache-spark-sql
I need to sort rows in a case-insensitive way.
I have data like this:
+---+---------------+--------------------+--------------------+------+--------------+
| id| full_name| job_title| email|gender| ip_address|
+---+---------------+--------------------+--------------------+------+--------------+
| 73| Tina Mccoy|Desktop Support T...|tmccoy20@techcrun...|Female| 23.196.170.54|
| 74| Lois Hart| Food Chemist|lhart21@mapquest.com|Female| 145.52.30.236|
| 75| Thomas Hall| Senior Developer| thall22@wired.com| Male|76.255.197.231|
| 76| Ernest Romero| Teacher|eromero23@amazon....| Male| 99.21.57.239|
| 77| Irene Bradley| Assistant Professor|ibradley24@squido...|Female| 16.51.179.230|
| 78|Jacqueline Cruz|account Represent...| jcruz25@cdc.gov|Female| 167.49.98.213|
| 79| Sara Martin| Geologist IV| smartin26@a8.net|Female| 10.145.49.204|
| 80| Johnny Bradley| Executive Secretary|jbradley27@cocolo...| Male| 138.251.4.102|
| 81| Fred Dean|Nuclear Power Eng...|fdean28@kickstart...| Male| 173.10.122.12|
| 82| Ralph Greene| Senior Editor|rgreene29@omnitur...| Male| 57.230.33.105|
+---+---------------+--------------------+--------------------+------+--------------+
When I sort by job_title using df.orderBy('job_title'), this is what I get:
+---+---------------+--------------------+--------------------+------+--------------+
| id| full_name| job_title| email|gender| ip_address|
+---+---------------+--------------------+--------------------+------+--------------+
| 77| Irene Bradley| Assistant Professor|ibradley24@squido...|Female| 16.51.179.230|
| 73| Tina Mccoy|Desktop Support T...|tmccoy20@techcrun...|Female| 23.196.170.54|
| 80| Johnny Bradley| Executive Secretary|jbradley27@cocolo...| Male| 138.251.4.102|
| 74| Lois Hart| Food Chemist|lhart21@mapquest.com|Female| 145.52.30.236|
| 79| Sara Martin| Geologist IV| smartin26@a8.net|Female| 10.145.49.204|
| 81| Fred Dean|Nuclear Power Eng...|fdean28@kickstart...| Male| 173.10.122.12|
| 75| Thomas Hall| Senior Developer| thall22@wired.com| Male|76.255.197.231|
| 82| Ralph Greene| Senior Editor|rgreene29@omnitur...| Male| 57.230.33.105|
| 76| Ernest Romero| Teacher|eromero23@amazon....| Male| 99.21.57.239|
| 78|Jacqueline Cruz|account Represent...| jcruz25@cdc.gov|Female| 167.49.98.213|
+---+---------------+--------------------+--------------------+------+--------------+
But what I need is:
+---+---------------+--------------------+--------------------+------+--------------+
| id| full_name| job_title| email|gender| ip_address|
+---+---------------+--------------------+--------------------+------+--------------+
| 78|Jacqueline Cruz|account Represent...| jcruz25@cdc.gov|Female| 167.49.98.213|
| 77| Irene Bradley| Assistant Professor|ibradley24@squido...|Female| 16.51.179.230|
| 73| Tina Mccoy|Desktop Support T...|tmccoy20@techcrun...|Female| 23.196.170.54|
| 80| Johnny Bradley| Executive Secretary|jbradley27@cocolo...| Male| 138.251.4.102|
| 74| Lois Hart| Food Chemist|lhart21@mapquest.com|Female| 145.52.30.236|
| 79| Sara Martin| Geologist IV| smartin26@a8.net|Female| 10.145.49.204|
| 81| Fred Dean|Nuclear Power Eng...|fdean28@kickstart...| Male| 173.10.122.12|
| 75| Thomas Hall| Senior Developer| thall22@wired.com| Male|76.255.197.231|
| 82| Ralph Greene| Senior Editor|rgreene29@omnitur...| Male| 57.230.33.105|
| 76| Ernest Romero| Teacher|eromero23@amazon....| Male| 99.21.57.239|
+---+---------------+--------------------+--------------------+------+--------------+
You can pass a computed expression as an argument to orderBy, so you can import the lower function:
from pyspark.sql.functions import col, lower
and use it to wrap the column:
df.orderBy(lower(col("job_title")))
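To see why sorting on the lowercased value helps, here is a minimal plain-Python sketch of the same idea. It is an analogy rather than Spark code: it assumes the default ordering compares strings by code point, which is how Python's sorted behaves and, as far as I know, how Spark orders string columns by default. The titles list is copied from the question's output, with truncated values kept exactly as displayed.

```python
# Job titles as displayed in the question (truncated values kept as shown).
titles = [
    "Desktop Support T...",
    "Food Chemist",
    "Senior Developer",
    "Teacher",
    "Assistant Professor",
    "account Represent...",
]

# Default comparison is by code point: "A".."Z" (65-90) come before
# "a".."z" (97-122), so the lowercase "account ..." ends up last.
default_order = sorted(titles)

# Comparing lowercased keys removes the case difference. Only the sort key
# is lowercased; the original strings are returned unchanged.
case_insensitive_order = sorted(titles, key=str.lower)

print(default_order)
print(case_insensitive_order)
```

Like key=str.lower here, lower(col("job_title")) is used only as the sort expression, so the job_title values in the result keep their original casing. For descending order, the same expression can be written as lower(col("job_title")).desc().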