Spark Scala streaming CSV

Lev*_*evi 4 csv scala apache-spark spark-streaming

I am new to Spark/Scala. I know how to load a CSV file:

    sqlContext.read.format("csv")

and how to read a text stream and a file stream:

    scc.textFileStream("""file:///c:\path\filename""");
    scc.fileStream[LongWritable, Text, TextInputFormat](...)

But how do I read text that is in CSV format? Thanks, Levi

Sud*_*yam 6

Here you go:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Build the streaming context with a 5-second batch interval
    val sparkConf = new SparkConf().setAppName("CsvStreaming").setMaster("local[*]")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // Create the FileInputDStream on the directory
    val lines = ssc.textFileStream("file:///C:/foo/bar")

    lines.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        println("RDD row count: " + rdd.count())
        // Now you can convert this RDD to a DataFrame/Dataset and apply your business logic.
      }
    }

    ssc.start()
    ssc.awaitTermination()
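If you want the conversion step from the comment above spelled out, here is a minimal sketch of turning each batch of CSV lines into a DataFrame. The two-column schema and the SparkSession setup are assumptions for illustration, not part of the original answer:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    // Assumed layout of each CSV line: "id,name" -- adjust the schema to your files.
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = true),
      StructField("name", StringType, nullable = true)))

    lines.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        // Reuse (or lazily create) a SparkSession from the streaming context's configuration.
        val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
        import spark.implicits._

        // spark.read.csv(Dataset[String]) (Spark 2.2+) parses the lines, honouring quoting and the schema.
        val df = spark.read
          .schema(schema)
          .option("header", "false")
          .csv(rdd.toDS())

        df.show()
        // ... business logic on df ...
      }
    }

On Spark 2.x you can also skip DStreams entirely and stream the directory with Structured Streaming, e.g. spark.readStream.schema(schema).csv("file:///C:/foo/bar"), which gives you a streaming DataFrame directly.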