Posted by use*_*663

Writing to HBase via Spark: Task not serializable

I'm trying to write some simple data into HBase (0.96.0-hadoop2) using Spark 1.0, but I keep running into serialization problems. Here is the relevant code:

import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.NewHadoopRDD
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.SparkContext
import java.util.Properties
import java.io.FileInputStream
import org.apache.hadoop.hbase.client.Put

object PutRawDataIntoHbase{
  def main(args: Array[String]): Unit = {
    var propFileName = "hbaseConfig.properties"
    if(args.size > 0){
      propFileName = args(0)
    }

    /** Load properties here **/
    val theData = sc.textFile(prop.getProperty("hbase.input.filename"))
      .map(l => l.split("\t"))
      .map(a => Array("%010d".format(a(9).toInt) + "-" + a(0), a(1)))

    val tableName = prop.getProperty("hbase.table.name")
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set("hbase.rootdir", prop.getProperty("hbase.rootdir"))
    hbaseConf.addResource(prop.getProperty("hbase.site.xml"))
    val myTable = new HTable(hbaseConf, tableName)
    theData.foreach(a => {
      var …
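The "Task not serializable" error is the classic symptom of the closure passed to `foreach` capturing `myTable`: `HTable` wraps a live connection and is not serializable, and the Hadoop `Configuration` isn't either, so Spark cannot ship them to the executors. Below is a minimal sketch of the usual workaround, not necessarily the accepted answer here: create the table inside each partition so nothing non-serializable crosses the driver/executor boundary. It reuses `theData` and `tableName` from the snippet above; the column family `"cf"` and qualifier `"col"` are placeholders, since the original snippet is cut off before the `Put` is built.

theData.foreachPartition { partition =>
  // Build the configuration and table on the executor; only tableName
  // (a plain, serializable String) is captured from the driver.
  val conf = HBaseConfiguration.create()
  val table = new HTable(conf, tableName)
  partition.foreach { a =>
    val p = new Put(Bytes.toBytes(a(0)))
    // "cf" and "col" are placeholder family/qualifier names
    p.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(a(1)))
    table.put(p)
  }
  table.close()
}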

hbase scala apache-spark

7 votes · 1 answer · 9559 views
