Related troubleshooting solutions (0)

Reading whole text files from a compressed archive in Spark

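I have the following problem: suppose I have a directory on HDFS that contains compressed directories, each of which holds multiple files. I want to create an RDD of objects of type T, i.e.: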

JavaSparkContext context = new JavaSparkContext(conf);
JavaPairRDD<String, String> filesRDD = context.wholeTextFiles(inputDataPath);

JavaRDD<T> processingFiles = filesRDD.map(fileNameContent -> {
    // The name (full path) of the file
    String fileName = fileNameContent._1();
    // The content of the file
    String content = fileNameContent._2();

    // Class T has a constructor taking the content and the name of each
    // processed file (as two strings)
    T t = new T(content, fileName);

    return t;
});

When inputDataPath is a directory containing plain files, this works perfectly, i.e. when it looks like this:

String inputDataPath =  "hdfs://some_path/*/*/"; // because …
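One common workaround is to read each archive as a binary stream with `context.binaryFiles(...)` and unpack it yourself. This is a sketch and is not taken from the question or any answer. The helper below assumes the compressed directories are ZIP archives containing UTF-8 text files. The class name `ZipEntryReader` is hypothetical. It returns (entry name, content) pairs, which is the same shape that `wholeTextFiles` yields.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipEntryReader {

    /**
     * Reads every regular file inside a ZIP stream and returns (entryName, content)
     * pairs. Assumption: the archive is ZIP and every entry is UTF-8 text.
     */
    public static List<Map.Entry<String, String>> readTextEntries(InputStream in) throws IOException {
        List<Map.Entry<String, String>> result = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directory markers carry no content
                }
                // ZipInputStream.read returns -1 at the end of the *current* entry
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                result.add(new SimpleEntry<>(entry.getName(),
                        new String(buffer.toByteArray(), StandardCharsets.UTF_8)));
            }
        }
        return result;
    }
}
```

In Spark you could call this from `context.binaryFiles(inputDataPath).flatMap(...)`. Pass it `pathAndStream._2().open()`, which opens the `PortableDataStream` value, and map each pair to `new T(content, fileName)` as in the question. In Spark 2.x the Java `flatMap` function must return an `Iterator`, whereas Spark 1.x expects an `Iterable`. For `.tar.gz` archives you would need a tar reader (for example Apache Commons Compress) instead of `ZipInputStream`.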

java compression hadoop hdfs apache-spark

Bel*_*gor, 2017-05-23

Score: 10 | Answers: 1 | Views: 9486

How to decompress gzip-compressed data in a byte array?

I have a class with a method that takes an object as a parameter. This method is invoked via RMI.

public class RMIClass implements Serializable {
    public void RMIMethod(MyFile file) {
        // do stuff
    }
}

MyFile has an attribute called "body", which is a byte array.

public final class MyFile implements Serializable {

    private byte[] body = new byte[0];
    //.... 

    public byte[] getBody() {
        return body;
    }
    //....
}

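This attribute contains the gzip-compressed data of a file that was parsed by another application.

I need to decompress this byte array before performing any further operations.

Every example I have seen of decompressing gzip data assumes I want to write it to disk and create a physical file, which I don't.

How do I do this?

Thanks in advance.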
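`GZIPInputStream` wraps any `InputStream`, so the byte array can be wrapped in a `ByteArrayInputStream` and decompressed entirely in memory. The output is collected in a `ByteArrayOutputStream`. Below is a minimal sketch; the class name `GzipBytes` is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

public final class GzipBytes {

    private GzipBytes() {}

    /** Decompresses a gzip-compressed byte array in memory; nothing is written to disk. */
    public static byte[] decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            // Copy decompressed bytes until the gzip stream is exhausted
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

To use it, call `byte[] plain = GzipBytes.decompress(file.getBody());`. If the payload is text, follow with `new String(plain, StandardCharsets.UTF_8)`; that encoding is an assumption about the other application. If the bytes are not in gzip format, the `GZIPInputStream` constructor throws a `ZipException`. On Java 9 and later, `gzip.readAllBytes()` can replace the copy loop.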

java io gzipinputstream

rsh*_*erd, 2013-02-02

Score: 6 | Answers: 2 | Views: 6503