Java: reading strings from a random access file with buffered input

Gre*_*Cat 7 java nio random-access fileinputstream bufferedreader

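I have no previous experience with the Java IO API, and right now I'm really frustrated. I find it hard to believe how strange and complex it is, and how hard it is to do a simple task.

My task: I have two positions (a start byte and an end byte), pos1 and pos2. I need to read the lines between these two bytes (including the start byte, excluding the end byte) and use them as UTF-8 String objects.

For example, in most scripting languages this would be a very simple one-, two-, or three-liner (this is Ruby, but it would be essentially the same in Python, Perl, etc.):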

f = File.open("file.txt")
f.seek(pos1)
while f.pos < pos2
  s = f.readline
  # do something with "s" here
end

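With the Java IO API it quickly turns into a mess ;) In fact, I see two ways to read lines (ending with \n) from regular local files:

  • RandomAccessFile has getFilePointer() and seek(long pos), but its readLine() does not return UTF-8 strings (or even byte arrays). It returns very strange strings with broken encoding, and it has no buffering (which probably means that every read*() call is translated into a single OS read(), which is fairly slow).
  • BufferedReader has a great readLine() method, and it can even do some seeking with skip(long n), but it has no way to report even the number of bytes already read, let alone the current position in the file.

I've tried to use something like: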
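The "broken encoding" in the first option has a specific cause: RandomAccessFile.readLine() turns each byte into one char with the high eight bits set to zero. A minimal sketch of a workaround, assuming the file is UTF-8 and pos1 falls on a line start, is to map those chars back to bytes via ISO-8859-1 and decode them again. The class and method names here (RafLineRange, readLines) are made up for illustration, and this does nothing about the lack of buffering.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RafLineRange {
    /** Reads every line that starts in [pos1, pos2) and decodes it as UTF-8. */
    static List<String> readLines(String path, long pos1, long pos2) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(pos1);
            while (raf.getFilePointer() < pos2) {
                String raw = raf.readLine();   // one char per byte, values 0..255
                if (raw == null) {
                    break;                     // end of file
                }
                // ISO-8859-1 maps chars 0..255 back to the identical byte values.
                byte[] bytes = raw.getBytes(StandardCharsets.ISO_8859_1);
                lines.add(new String(bytes, StandardCharsets.UTF_8));
            }
        }
        return lines;
    }
}
```

As in the Ruby loop, a line that starts before pos2 is read to its end even if it extends past pos2.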

    FileInputStream fis = new FileInputStream(fileName);
    FileChannel fc = fis.getChannel();
    BufferedReader br = new BufferedReader(
            new InputStreamReader(
                    fis,
                    CHARSET_UTF8
            )
    );

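...and then using fc.position() to get the current file reading position and fc.position(newPosition) to set it, but that doesn't seem to work in my case: it looks like it returns the position reached by the buffer pre-filling done by BufferedReader, or something like that. These counters seem to advance in 16K increments.

Do I really have to implement it all myself, i.e. a file reading interface that would:

  • allow me to get/set the position in the file
  • buffer file reading operations
  • allow reading UTF-8 strings (or at least allow operations like "read everything up to the next \n")

Is there a quicker way than implementing it all myself? Am I overlooking something?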
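The read-ahead effect described here can be reproduced with a small experiment. The sketch below (class and method names are hypothetical) reads a single line through a BufferedReader and then asks the underlying channel for its position. Because the reader stack fills its internal buffers ahead of what readLine() returns, the reported position is normally well past the end of the first line; the exact value depends on the JDK's internal buffer sizes and the file length.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadAheadDemo {
    /** Returns the channel position observed right after reading one line. */
    static long positionAfterFirstLine(Path path) throws IOException {
        try (FileInputStream fis = new FileInputStream(path.toFile());
             BufferedReader br = new BufferedReader(
                     new InputStreamReader(fis, StandardCharsets.UTF_8))) {
            FileChannel fc = fis.getChannel();
            br.readLine();          // consumes "first", but the buffers read further
            return fc.position();   // position of the underlying file, not of the reader
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("readahead", ".txt");
        Files.write(tmp, "first\nsecond\nthird\n".getBytes(StandardCharsets.UTF_8));
        // The first line occupies 6 bytes; the channel position is usually larger.
        System.out.println("position after one line: " + positionAfterFirstLine(tmp));
        Files.delete(tmp);
    }
}
```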

Ken*_*oom 6

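import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.input.BoundedInputStream;

FileInputStream file = new FileInputStream(filename);
file.skip(pos1); // skip() returns the number of bytes actually skipped; check it in real code
// Limit reads to the pos2 - pos1 bytes that follow pos1, and decode them as UTF-8
BufferedReader br = new BufferedReader(
   new InputStreamReader(new BoundedInputStream(file, pos2 - pos1), StandardCharsets.UTF_8)
);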

If you don't care about pos2, then you don't need Apache Commons IO.
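For comparison, here is a rough JDK-only sketch that still honours pos2 by counting bytes itself. It is not the approach above: it swaps BoundedInputStream for a manual byte counter on top of a BufferedInputStream, and it follows the question's Ruby loop, so a line that starts before pos2 is read to its end. The names BoundedLineReader and readLines are invented for this example, and it assumes UTF-8 content, \n line endings, and a pos1 that falls on a line start.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BoundedLineReader {
    /** Reads every line that starts in [pos1, pos2), using buffered I/O. */
    static List<String> readLines(String path, long pos1, long pos2) throws IOException {
        List<String> lines = new ArrayList<>();
        try (FileInputStream fis = new FileInputStream(path)) {
            // Position the file before wrapping it, so the buffer starts at pos1.
            fis.getChannel().position(pos1);
            InputStream in = new BufferedInputStream(fis);
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            long pos = pos1;                 // file offset of the next unread byte
            while (pos < pos2) {
                line.reset();
                int b;
                while ((b = in.read()) != -1) {
                    pos++;
                    if (b == '\n') {
                        break;               // end of line; the terminator is dropped
                    }
                    line.write(b);
                }
                if (b == -1 && line.size() == 0) {
                    break;                   // end of file with nothing pending
                }
                // Decode whole lines only, so multi-byte characters are never split.
                lines.add(new String(line.toByteArray(), StandardCharsets.UTF_8));
            }
        }
        return lines;
    }
}
```

Because the position is tracked on raw bytes before decoding, `pos` always matches the real file offset, which is the piece BufferedReader cannot provide.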


scu*_*ube 6

I wrote this code to read UTF-8 using RandomAccessFile:

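//File: CyclicBuffer.java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class CyclicBuffer {
private static final int size = 3;
private FileChannel channel;
private ByteBuffer buffer = ByteBuffer.allocate(size);

public CyclicBuffer(FileChannel channel) {
    this.channel = channel;
    // Start with an empty buffer (limit 0) so the first get() reads from the channel
    // instead of returning the zero bytes of the freshly allocated buffer.
    buffer.limit(0);
}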

private int read() throws IOException {
    return channel.read(buffer);
}

/**
 * Returns the byte read
 *
 * @return byte read -1 - end of file reached
 * @throws IOException
 */
public byte get() throws IOException {
    if (buffer.hasRemaining()) {
        return buffer.get();
    } else {
        buffer.clear();
        int eof = read();
        if (eof == -1) {
            return (byte) eof;
        }
        buffer.flip();
        return buffer.get();
    }
}
}
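//File: UTFRandomFileLineReader.java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;

public class UTFRandomFileLineReader {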
private final Charset charset = Charset.forName("utf-8");
private CyclicBuffer buffer;
private ByteBuffer temp = ByteBuffer.allocate(4096);
private boolean eof = false;

public UTFRandomFileLineReader(FileChannel channel) {
    this.buffer = new CyclicBuffer(channel);
}

public String readLine() throws IOException {
    if (eof) {
        return null;
    }
    byte x = 0;
    temp.clear();

    while ((byte) -1 != (x = (buffer.get())) && x != '\n') {
        if (temp.position() == temp.capacity()) {
            temp = addCapacity(temp);
        }
        temp.put(x);
    }
    if (x == -1) {
        eof = true;
    }
    temp.flip();
    // Return null only at end of file with nothing read; an empty line yields "".
    if (eof && !temp.hasRemaining()) {
        return null;
    }
    return charset.decode(temp).toString();
}

private ByteBuffer addCapacity(ByteBuffer temp) {
    ByteBuffer t = ByteBuffer.allocate(temp.capacity() + 1024);
    temp.flip();
    t.put(temp);
    return t;
}

public static void main(String[] args) throws IOException {
    RandomAccessFile file = new RandomAccessFile("/Users/sachins/utf8.txt",
            "r");
    UTFRandomFileLineReader reader = new UTFRandomFileLineReader(file
            .getChannel());
    int i = 1;
    while (true) {
        String s = reader.readLine();
        if (s == null)
            break;
        System.out.println("\n line  " + i++);
        s = s + "\n";
        for (byte b : s.getBytes(Charset.forName("utf-8"))) {
            System.out.printf("%x", b);
        }
        System.out.printf("\n");

    }
}
}