I have a large table in HBase named UserAction with three column families (song, album, singer). I need to get all the data from the 'song' column family as a JavaRDD. I tried the code below, but it is not efficient. Is there a better solution?
static SparkConf sparkConf = new SparkConf().setAppName("test").setMaster("local[4]");
static JavaSparkContext jsc = new JavaSparkContext(sparkConf);

static void getRatings() {
    Configuration conf = HBaseConfiguration.create();
    conf.set(TableInputFormat.INPUT_TABLE, "UserAction");
    conf.set(TableInputFormat.SCAN_COLUMN_FAMILY, "song");

    JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD = jsc.newAPIHadoopRDD(
            conf,
            TableInputFormat.class,
            org.apache.hadoop.hbase.io.ImmutableBytesWritable.class,
            org.apache.hadoop.hbase.client.Result.class);

    JavaRDD<Rating> count = hBaseRDD
            .map(new Function<Tuple2<ImmutableBytesWritable, Result>, JavaRDD<Rating>>() {
                @Override
                public JavaRDD<Rating> call(Tuple2<ImmutableBytesWritable, Result> t)
                        throws Exception {
                    Result r = t._2;
                    int user = Integer.parseInt(Bytes.toString(r.getRow()));
                    ArrayList<Rating> ra = new ArrayList<>();
                    for (Cell c : r.rawCells()) {
                        int product = Integer.parseInt(Bytes.toString(CellUtil.cloneQualifier(c)));
                        double …
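For reference, here is a minimal sketch of the single-pass alternative I am considering: one flatMap over hBaseRDD that emits Rating objects directly, instead of building an RDD inside map. It assumes Spark 1.x (where FlatMapFunction.call returns an Iterable; in Spark 2.x it returns an Iterator), that Rating is org.apache.spark.mllib.recommendation.Rating, and that each cell value stores the rating score as a string. It also needs org.apache.spark.api.java.function.FlatMapFunction, java.util.List and java.util.ArrayList on top of the imports already used above.

    JavaRDD<Rating> ratings = hBaseRDD.flatMap(
            new FlatMapFunction<Tuple2<ImmutableBytesWritable, Result>, Rating>() {
                @Override
                public Iterable<Rating> call(Tuple2<ImmutableBytesWritable, Result> t)
                        throws Exception {
                    Result r = t._2;
                    // row key is the user id
                    int user = Integer.parseInt(Bytes.toString(r.getRow()));
                    List<Rating> ra = new ArrayList<>();
                    // SCAN_COLUMN_FAMILY is set to "song", so every cell in this row
                    // already belongs to the 'song' family; qualifier = product id,
                    // value = rating score (my assumption about the schema)
                    for (Cell c : r.rawCells()) {
                        int product = Integer.parseInt(Bytes.toString(CellUtil.cloneQualifier(c)));
                        double rating = Double.parseDouble(Bytes.toString(CellUtil.cloneValue(c)));
                        ra.add(new Rating(user, product, rating));
                    }
                    return ra;
                }
            });

If I understand correctly, this stays a single distributed transformation and never creates nested RDDs, but I am not sure it is the best way to do this.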
Thanks. (Sorry for my poor English.)