Nav*_*nth 2 python hadoop mapreduce apache-spark pyspark
My DataFrame looks like this:
ID,FirstName,LastName
1,Navee,Srikanth
2,,Srikanth
3,Naveen,
My problem statement: I need to remove row 2 because its FirstName is empty.
I am using the following PySpark script:
join_Df1= Name.filter(Name.col(FirstName).isnotnull()).show()
I get this error:
File "D:\0\NameValidation.py", line 13, in <module>
join_Df1= filter(Name.FirstName.isnotnull()).show()
TypeError: 'Column' object is not callable
Can anyone help me solve this?
It looks like the FirstName column in your DataFrame holds empty strings rather than Null values. Here are some options you can try:
df = sqlContext.createDataFrame([[1,'Navee','Srikanth'], [2,'','Srikanth'] , [3,'Naveen','']], ['ID','FirstName','LastName'])
df.show()
+---+---------+--------+
| ID|FirstName|LastName|
+---+---------+--------+
| 1| Navee|Srikanth|
| 2| |Srikanth|
| 3| Naveen| |
+---+---------+--------+
df.where(df.FirstName.isNotNull()).show() # This doesn't remove row 2, because FirstName holds an empty string, not null
+---+---------+--------+
| ID|FirstName|LastName|
+---+---------+--------+
| 1| Navee|Srikanth|
| 2| |Srikanth|
| 3| Naveen| |
+---+---------+--------+
df.where(df.FirstName != '').show()
+---+---------+--------+
| ID|FirstName|LastName|
+---+---------+--------+
| 1| Navee|Srikanth|
| 3| Naveen| |
+---+---------+--------+
df.filter(df.FirstName != '').show()
+---+---------+--------+
| ID|FirstName|LastName|
+---+---------+--------+
| 1| Navee|Srikanth|
| 3| Naveen| |
+---+---------+--------+
df.where("FirstName != ''").show()
+---+---------+--------+
| ID|FirstName|LastName|
+---+---------+--------+
| 1| Navee|Srikanth|
| 3| Naveen| |
+---+---------+--------+
You should do it as follows:
join_Df1.filter(join_Df1.FirstName.isNotNull()).show()
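Note that the method name is case-sensitive: it is `isNotNull`, not `isnotnull`. In PySpark, a `Column` resolves unknown attribute names as field access and returns another `Column` instead of raising `AttributeError`, so the misspelling only fails once the result is called. The toy class below is not PySpark code; `ToyColumn` is a hypothetical stand-in that imitates this lookup behaviour so the error can be reproduced without a Spark installation.

```python
class ToyColumn:
    """Hypothetical stand-in for a Spark Column, for illustration only."""

    def __init__(self, name):
        self.name = name

    def isNotNull(self):
        # The correctly spelled method exists and can be called.
        return f"({self.name} IS NOT NULL)"

    def __getattr__(self, item):
        # Python calls this only when normal attribute lookup fails.
        # Like a Column, treat the unknown name as a field access and
        # return another column object instead of raising AttributeError.
        if item.startswith("__"):
            raise AttributeError(item)
        return ToyColumn(f"{self.name}[{item}]")


first_name = ToyColumn("FirstName")

# Correct spelling: a real method, so the call succeeds.
ok = first_name.isNotNull()

# Wrong spelling: the lookup returns a ToyColumn, and calling it fails.
try:
    first_name.isnotnull()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(ok)
print(error_message)
```

With a real `Column` the message names `Column` instead of `ToyColumn`, which matches the `TypeError` shown in the question.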
Hope this helps!