Gre*_*Cat 7 java nio random-access fileinputstream bufferedreader
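I have never worked with the Java IO API before, and right now I'm really frustrated. I find it hard to believe how strange and complex it is, and how hard it is to do a simple task.
My task: I have 2 positions (starting byte, ending byte), pos1 and pos2. I need to read the lines between these two bytes (including the starting byte, excluding the ending byte) and use them as UTF8 String objects.
For example, in most scripting languages this would be a very simple 1-2-3-liner (shown in Ruby, but it is essentially the same in Python, Perl, etc.):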
f = File.open("file.txt")
f.seek(pos1)            # seek returns 0, so it cannot be chained onto File.open
while f.pos < pos2
  s = f.readline
  # do something with "s" here
end
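With the Java IO API things go wrong very quickly ;) In fact, I see two ways to read lines (ending with \n) from a regular local file:
- RandomAccessFile has getFilePointer() and seek(long pos), but its readLine() does not return UTF8 strings (or even byte arrays), only very strange strings with broken encoding, and it has no buffering (which probably means that every read*() call is translated into a single underlying OS read() => fairly slow).
- BufferedReader has a great readLine() method, and it can even do some seeking with skip(long n), but it has no way to determine even the number of bytes already read, let alone the current position in the file.
I tried using something like this: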
FileInputStream fis = new FileInputStream(fileName);
FileChannel fc = fis.getChannel();
BufferedReader br = new BufferedReader(
    new InputStreamReader(
        fis,
        CHARSET_UTF8
    )
);
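...and then used fc.position() to get the current file read position and fc.position(newPosition) to set it, but that does not seem to work in my case: it looks like it returns the position reached by the buffer pre-filling that BufferedReader does, or something like that. These counters seem to advance in rounded 16K increments.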
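The read-ahead effect described above can be reproduced with a small sketch. It assumes a throwaway temp file and a hypothetical class name, ReadAheadDemo; the exact position reported depends on the buffer sizes of the JDK in use, so the code only checks that the channel has moved past the first line.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadAheadDemo {
    /** Returns the channel position observed right after reading one line. */
    static long positionAfterFirstLine(Path path) throws IOException {
        try (FileInputStream fis = new FileInputStream(path.toFile());
             BufferedReader br = new BufferedReader(
                     new InputStreamReader(fis, StandardCharsets.UTF_8))) {
            FileChannel fc = fis.getChannel();
            br.readLine();           // consumes one line, but fills whole buffers
            return fc.position();    // position of the underlying file, not of the reader
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temp file with many short lines, created only for this demo
        Path tmp = Files.createTempFile("readahead", ".txt");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append("line ").append(i).append('\n');
        }
        Files.write(tmp, sb.toString().getBytes(StandardCharsets.UTF_8));
        // The first line ("line 0\n") is 7 bytes long
        System.out.println("channel position after one readLine(): "
                + positionAfterFirstLine(tmp));
        Files.delete(tmp);
    }
}
```

So the channel position tells you how far the reader has pre-fetched, not where the last returned line ended, which is why it cannot be used as the loop condition here.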
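Do I really have to implement all of this myself, i.e. a file-reading interface that would:
- allow getting and setting the position in the file
- buffer file read operations
- allow reading UTF8 strings (or at least operations like "read everything up to the next \n")
Is there a faster way than implementing it all myself? Am I overlooking something?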
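import org.apache.commons.io.input.BoundedInputStream;
import java.io.*;
import java.nio.charset.StandardCharsets;

FileInputStream file = new FileInputStream(filename);
file.skip(pos1); // note: skip() may skip fewer bytes than requested; check its return value
BufferedReader br = new BufferedReader(
    // BoundedInputStream stops after pos2 - pos1 bytes; decode explicitly as UTF-8
    new InputStreamReader(new BoundedInputStream(file, pos2 - pos1), StandardCharsets.UTF_8)
);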
If you don't care about pos2, then you don't need Apache Commons IO.
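If you do care about pos2 but would rather avoid the extra dependency, one possible JDK-only sketch is to count bytes yourself while reading through a BufferedInputStream. The class name RangeLines and the readLines helper are made up for illustration. The sketch follows the Ruby loop from the question: a line is returned whenever it starts before pos2, even if it ends after it, and only \n is treated as a line terminator.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class RangeLines {
    /** Reads every line that starts at a byte offset in [pos1, pos2). */
    static List<String> readLines(String fileName, long pos1, long pos2) throws IOException {
        List<String> lines = new ArrayList<>();
        try (FileInputStream fis = new FileInputStream(fileName)) {
            fis.getChannel().position(pos1);         // seek before any buffering starts
            InputStream in = new BufferedInputStream(fis);
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            long pos = pos1;                         // our own count of consumed bytes
            while (pos < pos2) {
                line.reset();
                int b;
                while ((b = in.read()) != -1) {
                    pos++;
                    if (b == '\n') break;            // terminator is consumed, not stored
                    line.write(b);
                }
                if (b == -1 && line.size() == 0) break;   // end of file, nothing pending
                // Decode only complete lines, so multi-byte sequences are never split
                lines.add(new String(line.toByteArray(), StandardCharsets.UTF_8));
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a small temp file; the lines start at byte offsets 0, 4 and 7
        Path tmp = Files.createTempFile("range", ".txt");
        Files.write(tmp, "a\u00e9\nbb\nccc\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(tmp.toString(), 4, 8)); // prints [bb, ccc]
        Files.delete(tmp);
    }
}
```

Because the position is tracked in bytes before decoding, it stays exact no matter how much the buffered stream has read ahead.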
I wrote this code to read UTF-8 lines using RandomAccessFile:
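//File: CyclicBuffer.java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class CyclicBuffer {
    private static final int size = 3;
    private FileChannel channel;
    private ByteBuffer buffer = ByteBuffer.allocate(size);

    public CyclicBuffer(FileChannel channel) {
        this.channel = channel;
        // A freshly allocated buffer reports "remaining" bytes that were never
        // read from the file; mark it empty so the first get() hits the channel.
        buffer.limit(0);
    }

    // Fills the buffer from the channel; returns -1 at end of file
    private int read() throws IOException {
        return channel.read(buffer);
    }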
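    /**
     * Returns the next byte, refilling the buffer from the channel when needed.
     * Note that -1 doubles as the end-of-file marker; this is only safe because
     * the byte 0xFF never occurs in valid UTF-8.
     *
     * @return the byte read, or -1 when end of file is reached
     * @throws IOException
     */
    public byte get() throws IOException {
        if (buffer.hasRemaining()) {
            return buffer.get();
        } else {
            buffer.clear();
            int eof = read();
            if (eof == -1) {
                return (byte) eof;
            }
            buffer.flip();
            return buffer.get();
        }
    }
}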
//File: UTFRandomFileLineReader.java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;

public class UTFRandomFileLineReader {
    private final Charset charset = Charset.forName("utf-8");
    private CyclicBuffer buffer;
    private ByteBuffer temp = ByteBuffer.allocate(4096);
    private boolean eof = false;

    public UTFRandomFileLineReader(FileChannel channel) {
        this.buffer = new CyclicBuffer(channel);
    }
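    public String readLine() throws IOException {
        if (eof) {
            return null;
        }
        byte x = 0;
        temp.clear();
        // Collect raw bytes up to the next '\n' (or end of file)
        while ((byte) -1 != (x = (buffer.get())) && x != '\n') {
            if (temp.position() == temp.capacity()) {
                temp = addCapacity(temp);
            }
            temp.put(x);
        }
        if (x == -1) {
            eof = true;
        }
        temp.flip();
        // An empty line in the middle of the file yields "", not null;
        // null is returned only when end of file is hit with nothing read.
        if (temp.hasRemaining() || !eof) {
            return charset.decode(temp).toString();
        } else {
            return null;
        }
    }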
    // Copies the collected bytes into a buffer that is 1024 bytes larger
    private ByteBuffer addCapacity(ByteBuffer temp) {
        ByteBuffer t = ByteBuffer.allocate(temp.capacity() + 1024);
        temp.flip();
        t.put(temp);
        return t;
    }
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the file (and its channel) when done
        try (RandomAccessFile file = new RandomAccessFile("/Users/sachins/utf8.txt", "r")) {
            UTFRandomFileLineReader reader = new UTFRandomFileLineReader(file.getChannel());
            int i = 1;
            while (true) {
                String s = reader.readLine();
                if (s == null)
                    break;
                System.out.println("\n line " + i++);
                s = s + "\n";
                // Dump the UTF-8 bytes of each line in hex
                for (byte b : s.getBytes(Charset.forName("utf-8"))) {
                    System.out.printf("%x", b);
                }
                System.out.printf("\n");
            }
        }
    }
}
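For comparison with the hand-written reader above, here is a shorter sketch that stays with RandomAccessFile.readLine() and repairs the "broken encoding" mentioned in the question. readLine() widens each byte to one char, so encoding the result as ISO-8859-1 gives back the original bytes, which can then be decoded as UTF-8. The class name RafUtf8Lines is hypothetical, and this variant is still unbuffered and also treats \r as a line terminator, so take it as an illustration rather than a drop-in replacement.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RafUtf8Lines {
    /** Reads every line that starts at a byte offset in [pos1, pos2). */
    static List<String> readLines(String fileName, long pos1, long pos2) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            raf.seek(pos1);
            while (raf.getFilePointer() < pos2) {
                String raw = raf.readLine();      // one char per byte, not real text yet
                if (raw == null) {
                    break;                        // end of file
                }
                // Recover the original bytes, then decode them properly
                byte[] bytes = raw.getBytes(StandardCharsets.ISO_8859_1);
                lines.add(new String(bytes, StandardCharsets.UTF_8));
            }
        }
        return lines;
    }
}
```

Since nothing is buffered here, getFilePointer() always reflects exactly what has been consumed, at the cost of one read per byte.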