How to remove empty rows from a PySpark RDD

ben*_*ben 4 python apache-spark rdd pyspark

I want to remove several empty rows from an RDD. How can I do that?

I tried the following, but it does not work. I still get empty rows:

json_cp_rdd = xform_rdd.map(lambda (key, value): get_cp_json_with_planid(key, value)).filter(
            lambda x: x is not None).filter(
            lambda x: x is not '')

[u'', u'', u'', ..., u'[{"PLAN_ID":"d2031aed-175f-4346-af31-9d05bfd4ea3a","CostTotalInvEOPAmount":0.0,"StoreCount":0,"WeekEndingData":"2017-07-08","UnitTotalInvBOPQuantity":0.0,"PriceStatus":1,"UnitOnOrderQuantity":null,"CostTotalInvBOPAmount":0.0,"RetailSalesAmount":0.0,"UnitCostAmount":0.0,"CostReceiptAmount":0.0,"CostSalesAmount":0.0,"UnitSalesQuantity":0.0,"UnitReceiptQuantity":0.0,"UnitTotalInvEOPQuantity":0.0,"CostOnOrderAmount":null}]', u'', u'', ...]

use*_*411 11

is checks object identity, not equality. In Python 2.x you can use != instead:

.filter(lambda x: x is not None).filter(lambda x: x != "")
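To make the difference between identity and equality concrete, here is a minimal sketch in plain Python with no Spark involved. The sample values and the helper name is_non_empty are made up for illustration; a list comprehension stands in for the RDD filter.

```python
# Two equal values held in two separate objects.
first = ["", "x"]
second = ["", "x"]

print(first == second)  # True: same contents
print(first is second)  # False: different objects

# Equality-based predicate, mirroring the corrected filter above.
# is_non_empty is a hypothetical helper name.
def is_non_empty(row):
    return row is not None and row != ""

rows = ["", "a", None, "b", ""]  # made-up sample data
kept = [row for row in rows if is_non_empty(row)]
print(kept)  # ['a', 'b']
```

Whether two equal strings happen to be the same object depends on how the interpreter caches them, so an identity test against a string literal is not something to rely on.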

But idiomatically you can use a single filter with the identity function:

.filter(lambda x: x)

Or directly with bool:

.filter(bool)
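As a rough illustration of why one truthiness-based filter covers both checks, the sketch below uses Python's built-in filter on a plain list as a stand-in for the RDD. The sample rows, including the "p1" plan ID, are invented for the example.

```python
# Made-up sample rows: empty strings, None, and two real values.
rows = ["", '[{"PLAN_ID": "p1"}]', None, "", "row"]

# Both None and "" are falsy, so a single predicate removes them.
by_identity_function = list(filter(lambda x: x, rows))
by_bool = list(filter(bool, rows))

print(by_bool)  # ['[{"PLAN_ID": "p1"}]', 'row']
print(by_identity_function == by_bool)  # True

# Caveat: truthiness also drops other falsy values such as 0 or [].
mixed = list(filter(bool, [0, 1, [], "x"]))
print(mixed)  # [1, 'x']
```

If values such as 0 or empty collections are legitimate rows, the explicit comparison against None and "" is the safer choice.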


ben*_*ben 4

Replacing filter(lambda x: x is not '') with filter(lambda x: x is not u'') worked.