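This is probably a very basic question, since I'm a beginner with pyspark. I read a CSV file and tried to apply some pyspark functions to it, such as filter, split, or replace, but I'm getting an error. Here is my code: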
# Read the CSV, using the first row as column names and letting Spark infer column types
emp_data = spark\
.read\
.format('csv')\
.option("inferSchema","true")\
.option("header","true")\
.load("/FileStore/tables/employee_earnings_report_2016-1.csv")
After reading the file, I applied a filter, and that works fine:
import pyspark.sql.functions as f
# Keep only rows whose POSTAL code is 2148 or 2125, then print the first 5
df = emp_data.filter((f.col("POSTAL") == 2148) | (f.col("POSTAL") == 2125)).show(5)
+-----------------+-----------+-----+-------+----------+-------+------+-------------------------+--------------+------+------+
|             NAME|    REGULAR|RETRO|  OTHER|  OVERTIME|INJURED|DETAIL|QUINN/EDUCATION INCENTIVE|TOTAL EARNINGS|POSTAL|Gender|
+-----------------+-----------+-----+-------+----------+-------+------+-------------------------+--------------+------+------+
|    Abbasi,Sophia| $18,249.83|   NA|     NA|        NA|     NA|    NA|                       NA|    $18,249.83|  2148|     M|
|Abbruzzese,Angela|  $5,000.90|   NA|     NA|        NA|     NA|    NA|                       NA|     $5,000.90|  2125|     M|
| Abbruzzese,Donna|    $621.90|   NA|     NA|        NA|     NA|    NA|                       NA|       $621.90|  2125|     M|
|  Abdelrahim,Maha|  $1,181.60|   NA|     NA|        NA|     NA| …