I want to load a 1 GB CSV file (10 million records) into HBase. I wrote a MapReduce program for it. My code works correctly, but it takes 1 hour to complete, and the last reducer alone takes more than half an hour. Can anyone help me?
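The job follows the usual HFileOutputFormat bulk-load pattern (as the imports in Driver.java below show). For reference, a minimal sketch of how the rest of such a driver's main() is typically wired is given here; it assumes the same imports and args as the Driver below, and CsvToKeyValueMapper is only a placeholder name, since my actual mapper is not shown:

// Sketch of the remainder of a bulk-load driver's main(); names are placeholders.
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "HBase bulk import");
job.setJarByClass(Driver.class);

// Mapper turns each CSV line into (row key, KeyValue) pairs.
job.setInputFormatClass(TextInputFormat.class);
job.setMapperClass(CsvToKeyValueMapper.class);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(KeyValue.class);

// Partitions map output by the target table's region boundaries and
// installs the reducer that writes sorted HFiles.
HTable hTable = new HTable(conf, args[2]);
HFileOutputFormat.configureIncrementalLoad(job, hTable);

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

System.exit(job.waitForCompletion(true) ? 0 : 1);

configureIncrementalLoad sets the number of reducers to the number of regions in the target table, so the table's region layout determines how evenly the sorting work is spread across reducers.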
My code is as follows:
Driver.java
package com.cloudera.examples.hbase.bulkimport;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
/**
 * HBase bulk import example
 * Data preparation MapReduce job driver
 *
 * - args[0]: HDFS input path
 * - args[1]: HDFS output path
 * - args[2]: HBase table name
 */
public class Driver {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
/*
 * NBA Final 2010 game …

I want to split one file into multiple files. My input is:
Report : 1
ABC
DEF
GHI
JKL
End of Report
$
Report : 2
ABC
DEF
GHI
JKL
$
Report : 2
ABC
DEF
GHI
JKL
End of Report
$
Report : 3
ABC
DEF
GHI
JKL
End of Report
$
The output should be:
File 1
Report : 1
ABC
DEF
GHI
JKL
End of Report
$
File 2
Report : 2
ABC
DEF
GHI
JKL
$
Report : 2
ABC
DEF
GHI
JKL
End of Report
$
File 3
Report : 3
ABC …
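One way to do this split is sketched below, under a few assumptions not stated in the question: each record ends at a line containing only "$", the number after "Report :" decides which output file a record belongs to, and output files are named report-<n>.txt (the class name ReportSplitter is made up for the example). The input is streamed once and each record is appended to the writer for its report number:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.HashMap;
import java.util.Map;

public class ReportSplitter {
    public static void main(String[] args) throws IOException {
        // args[0]: input file; records are separated by a line containing only "$"
        Map<String, PrintWriter> writers = new HashMap<String, PrintWriter>();
        BufferedReader reader = new BufferedReader(new FileReader(args[0]));
        StringBuilder record = new StringBuilder();
        String reportNo = null;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith("Report :")) {
                // The report number identifies which output file this record belongs to.
                reportNo = line.substring("Report :".length()).trim();
            }
            record.append(line).append('\n');
            if (line.trim().equals("$")) {
                // End of one record: append it to the file for its report number.
                PrintWriter out = writers.get(reportNo);
                if (out == null) {
                    out = new PrintWriter(new FileWriter("report-" + reportNo + ".txt"));
                    writers.put(reportNo, out);
                }
                out.print(record);
                record.setLength(0);
                reportNo = null;
            }
        }
        reader.close();
        for (PrintWriter out : writers.values()) {
            out.close();
        }
    }
}

Because records are grouped by report number, both "Report : 2" blocks land in the same output file, matching File 2 above.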