ben*_*ben 4 python apache-spark rdd pyspark
I want to remove a few empty rows from an RDD. How can I do that?
I tried the following, but it doesn't work; I still get empty rows:
json_cp_rdd = xform_rdd.map(lambda (key, value): get_cp_json_with_planid(key, value)).filter(
lambda x: x is not None).filter(
lambda x: x is not '')
[u'', u'', u'', ..., u'[{"PLAN_ID":"d2031aed-175f-4346-af31-9d05bfd4ea3a","CostTotalInvEOPAmount":0.0,"StoreCount":0,"WeekEndingData":"2017-07-08","UnitTotalInvBOPQuantity":0.0,"PriceStatus":1,"UnitOnOrderQuantity":null,"CostTotalInvBOPAmount":0.0,"RetailSalesAmount":0.0,"UnitCostAmount":0.0,"CostReceiptAmount":0.0,"CostSalesAmount":0.0,"UnitSalesQuantity":0.0,"UnitReceiptQuantity":0.0,"UnitTotalInvEOPQuantity":0.0,"CostOnOrderAmount":null}]', u'', u'', ..., u'']
use*_*411 11
The is operator checks object identity, not equality. In Python 2.x you can use != instead:
.filter(lambda x: x is not None).filter(lambda x: x != "")
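To make the identity-versus-equality point concrete, here is a minimal plain-Python sketch (no Spark needed). The `rows` data and the `drop_empty` helper are hypothetical, made up for illustration; the predicate is the same one passed to `filter` above.

```python
# Two separately built lists are equal but are not the same object.
a = [1, 2]
b = [1, 2]
print(a == b)   # True: same contents
print(a is b)   # False: two distinct objects

# Whether two equal strings share one object is an implementation detail,
# so an emptiness test should compare by value with != rather than `is not`.
def drop_empty(rows):
    """Keep rows that are neither None nor equal to the empty string."""
    return [x for x in rows if x is not None and x != ""]

# Hypothetical sample data standing in for the RDD contents.
rows = ["", None, '[{"PLAN_ID": "demo"}]', ""]
cleaned = drop_empty(rows)
print(cleaned)
```

`None` is still tested with `is not`, which is fine because `None` is a singleton.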
But idiomatically, you can use just a single filter with the identity function:
.filter(lambda x: x)
or directly with bool:
.filter(bool)
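A short sketch of what a truthiness-based filter keeps and drops, using Python's built-in `filter` on a plain list as a stand-in for the RDD (the sample values are invented for illustration). An RDD's `filter` should behave analogously, since it keeps the elements for which the predicate returns a truthy value.

```python
# Hypothetical sample values: empty strings, None, and a few others.
rows = ["", None, '[{"PLAN_ID": "demo"}]', 0, "0", []]

# bool(x) is False for None, "", 0 and empty containers, True otherwise.
kept = list(filter(bool, rows))
print(kept)

# The identity lambda gives the same result, because filter only looks
# at the truthiness of whatever the predicate returns.
kept_identity = list(filter(lambda x: x, rows))
print(kept_identity)
```

Note that this drops every falsy value, not only `None` and `""`. That is harmless when every row is a JSON string or empty, but if values such as `0` are legitimate data, the explicit `x is not None` and `x != ""` checks are the safer choice.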