Suppose this is my data:
‘Maps‘ and ‘Reduces‘ are two phases of solving a query in HDFS.
‘Map’ is responsible to read data from input location.
it will generate a key value pair.
that is, an intermediate output in local machine.
’Reducer’ is responsible to process the intermediate.
output received from the mapper and generate the final output.
I want to add a number to the beginning of each line, like the output below:
1,‘Maps‘ and ‘Reduces‘ are two phases of solving a query in HDFS.
2,‘Map’ is responsible to read data from input location.
3,it will generate a key value pair.
4,that is, an intermediate output in local machine.
5,’Reducer’ is responsible to process the intermediate.
6,output received from the mapper and generate the final output.
and save them to a file.

I tried:
import org.apache.spark.{SparkConf, SparkContext}

object DS_E5 {
  def main(args: Array[String]): Unit = {
    var i = 0
    val conf = new SparkConf().setAppName("prep").setMaster("local")
    val sc = new SparkContext(conf)
    val sample1 = sc.textFile("data.txt")
    for (sample <- sample1) {
      i = i + 1
      val ss = sample.map(l => (i, sample))
      println(ss)
    }
  }
}
But its output looks like this:
Vector((1,‘Maps‘ and ‘Reduces‘ are two phases of solving a query in HDFS.))
...
How can I edit my code to produce my desired output?
zipWithIndex is what you need. It maps an RDD[T] into an RDD[(T, Long)] by adding the index in the second position of the pair.
sample1
.zipWithIndex()
.map { case (line, i) => i.toString + ", " + line }
Or, using string interpolation (see @DanielC.Sobral's comment):
sample1
.zipWithIndex()
.map { case (line, i) => s"$i, $line" }
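Note that zipWithIndex starts counting at 0, while the desired output starts at 1, so add 1 to the index if the numbering matters. Plain Scala collections have the same zipWithIndex method, so the transformation can be sketched locally without a cluster (the sample lines here are placeholders):

```scala
object NumberLines {
  def main(args: Array[String]): Unit = {
    // Stand-in for the lines read from data.txt.
    val lines = Seq(
      "first line of the file",
      "second line of the file",
      "third line of the file"
    )
    // zipWithIndex pairs each element with its 0-based position;
    // adding 1 gives the 1-based numbering shown in the question.
    val numbered = lines.zipWithIndex.map { case (line, i) => s"${i + 1},$line" }
    numbered.foreach(println)
  }
}
```

On the RDD itself, the same expression applies, and the result can then be written out, e.g. with `sample1.zipWithIndex().map { case (line, i) => s"${i + 1},$line" }.saveAsTextFile("output")` (the "output" directory name is an assumption).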