waz*_*zza 1 hbase apache-spark-sql
我在hbase中有一个名为"sample"的表.我需要使用Apache spark-sql查询查询表.有没有办法使用Apache spark-sql查询读取hbase数据?
Spark SQL是一个内存中的查询引擎,用于在您需要的HBase表之上使用Spark SQL执行某些查询操作
使用Spark从HBase获取数据并创建Spark RDD
SparkConf sparkConf = new SparkConf();
sparkConf.setAppName("SparkApp");
sparkConf.setMaster("local[*]");
JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);
Configuration config = HBaseConfiguration.create();
config.addResource(new Path("/etc/hbase/hbase-site.xml"));
config.addResource(new Path("/etc/hadoop/core-site.xml"));
config.set(TableInputFormat.INPUT_TABLE, "sample");
JavaPairRDD<ImmutableBytesWritable, Result> hbaseRDD = javaSparkContext.newAPIHadoopRDD(hbaseConfig, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);
JavaRDD<StudentBean> sampleRDD = hbaseRDD.map(new Function<Tuple2<ImmutableBytesWritable,Result>, StudentBean  >() {
    private static final long serialVersionUID = -2021713021648730786L;
    public StudentBean  call(Tuple2<ImmutableBytesWritable, Result> tuple) {
        StudentBean  bean = new StudentBean  ();
        Result result = tuple._2;
        bean.setRowKey(rowKey);
        bean.setFirstName(Bytes.toString(result.getValue(Bytes.toBytes("details"), Bytes.toBytes("firstName"))));
        bean.setLastName(Bytes.toString(result.getValue(Bytes.toBytes("details"), Bytes.toBytes("lastName"))));
        bean.setBranch(Bytes.toString(result.getValue(Bytes.toBytes("details"), Bytes.toBytes("branch"))));
        bean.setEmailId(Bytes.toString(result.getValue(Bytes.toBytes("details"), Bytes.toBytes("emailId"))));
        return bean;
    }
});
使用此RDD创建DataFrame对象并将其注册到一些临时表名称,然后您可以执行查询
DataFrame schema = sqlContext.createDataFrame(sampleRDD, StudentBean.class);
schema.registerTempTable("spark_sql_temp_table");
DataFrame schemaRDD = sqlContext.sql("YOUR_QUERY_GOES_HERE");
JavaRDD<StudentBean> result = schemaRDD.toJavaRDD().map(new Function<Row, StudentBean>() {
    private static final long serialVersionUID = -2558736294883522519L;
    public StudentBean call(Row row) throws Exception {
        StudentBean bean = new StudentBean();
        //  Do the mapping stuff here
        return bean;
    }
});
| 归档时间: | 
 | 
| 查看次数: | 3773 次 | 
| 最近记录: |