How do I read a CSV file from HDFS?

use*_*360 2 csv hadoop hdfs mahout

My data is in a CSV file, and I want to read that file from HDFS.

Can anyone help me with the code?

I am new to Hadoop. Thanks in advance.

Tar*_*riq 5

The classes you need for this are FileSystem, FSDataInputStream, and Path. The client should look something like this:

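    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Load the cluster settings so FileSystem.get() resolves to HDFS, not the local file system.
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        // fs.open() returns an FSDataInputStream; wrap it in a reader to decode text (UTF-8 assumed).
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                fs.open(new Path("/path/to/input/file")), StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine()); // first line of the CSV file
        }
    }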

FSDataInputStream has several read methods. Choose the one that suits your needs.
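To illustrate how much that choice matters for text such as CSV, here is a sketch that runs without a cluster. It relies on the fact that FSDataInputStream extends java.io.DataInputStream, so a plain DataInputStream over an in-memory byte array stands in for the HDFS stream. The class name ReadMethodsDemo, its helper methods, and the sample data are made up for illustration, and UTF-8 encoding is assumed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class ReadMethodsDemo {
    // Hypothetical sample data standing in for the start of a CSV file.
    static final byte[] SAMPLE = "id,name\n1,alice\n".getBytes(StandardCharsets.UTF_8);

    // Stand-in for fs.open(path): FSDataInputStream is itself a DataInputStream.
    static DataInputStream open() {
        return new DataInputStream(new ByteArrayInputStream(SAMPLE));
    }

    // readChar() consumes two bytes and combines them into one UTF-16 code unit,
    // so on single-byte text it does not return the first character.
    static char viaReadChar() throws IOException {
        try (DataInputStream in = open()) {
            return in.readChar();
        }
    }

    // read() returns the next single byte (0-255), or -1 at end of stream.
    static int viaRead() throws IOException {
        try (DataInputStream in = open()) {
            return in.read();
        }
    }

    // Decoding through a reader yields whole lines of text.
    static String viaReadLine() throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(open(), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.printf("readChar: U+%04X%n", (int) viaReadChar());
        System.out.println("read:     " + (char) viaRead());
        System.out.println("readLine: " + viaReadLine());
    }
}
```

For line-oriented data like CSV, the reader-based approach is usually the one you want; the byte-level methods are more useful for binary formats.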

If you are using MapReduce, it is even easier:

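    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Your_Wish is a placeholder: substitute your own output key and value types.
    public static class YourMapper extends Mapper<LongWritable, Text, Your_Wish, Your_Wish> {

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // The framework does the reading for you: with the default TextInputFormat,
            // value holds one line of your CSV file.
            String line = value.toString();
            // Do your processing here...
            context.write(Your_Wish, Your_Wish); // placeholders for the output key and value you emit
        }
    }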
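For the processing step inside map, a common first move is to split the line into fields. Below is a minimal sketch in plain Java, assuming a simple CSV with no quoted fields or embedded commas (files that have them generally need a proper CSV parser). CsvLineSplitter and splitFields are hypothetical names, not part of Hadoop.

```java
import java.util.Arrays;

final class CsvLineSplitter {
    // Naive comma split. The limit of -1 keeps trailing empty fields,
    // which String.split(",") would otherwise silently drop.
    // Assumption: no quoted fields and no commas inside values.
    static String[] splitFields(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String[] fields = splitFields("42,alice,,");
        System.out.println(fields.length);
        System.out.println(Arrays.toString(fields));
    }
}
```

In a mapper you would call something like splitFields(value.toString()) and then build your output key and value from the resulting fields.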