This is my first question. I am coding in PySpark. I have an RDD:
['a,b,c,d,e,f']
How do I find the index of the element 'e'?
I tried zipWithIndex but it's not giving me any index.
I saw a similar question, but the solution mentioned there did not return the index either:
rdd.zipWithIndex().filter(lambda key,index : key == 'e') \
.map(lambda key,index : index).collect()
I am getting an error.
Please let me know how to find the index.
Based on the solution provided:
I still have a problem. My rdd is in this format:
['a,b,c,d,e,f']
So when I try:
rdd.zipWithIndex().lookup('e')
I get [ ]
How should I proceed?
Thanks
You are getting an exception because map and filter expect a function of a single argument:
rdd = sc.parallelize(['a', 'b', 'c', 'd', 'e', 'f'])

(rdd
    .zipWithIndex()
    .filter(lambda ki: ki[0] == 'e')
    .map(lambda ki: ki[1])
    .collect())
# [4]
In prehistoric Python versions (tuple unpacking in lambdas was removed in Python 3), tuple unpacking also works:
(rdd
.zipWithIndex()
.filter(lambda (key, index): key == 'e')
.map(lambda (key, index): index))
But I hope you are not using any of those.
Personally, I would just use lookup:
rdd.zipWithIndex().lookup('e')
# [4]
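The follow-up in the question also hints at why lookup('e') returned an empty list there: that RDD holds a single string 'a,b,c,d,e,f', not six separate elements. A minimal sketch, assuming the data really is one comma-separated string (the name chars is just for illustration), would split it first with flatMap:

rdd = sc.parallelize(['a,b,c,d,e,f'])         # one element: the whole string
chars = rdd.flatMap(lambda s: s.split(','))   # six elements: 'a' .. 'f'
chars.zipWithIndex().lookup('e')
# [4]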
Also, keep in mind that the order of values in an RDD may be non-deterministic.
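If a stable, reproducible index matters, one option is to impose an order before zipping. A sketch, assuming the values are sortable:

indexed = rdd.sortBy(lambda x: x).zipWithIndex()   # sort first, then attach indices
indexed.lookup('e')
# [4]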