How to find a name in an unordered list in an 8GB flat file in Java

Question

sun*_*dev 0 java io performance

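OK, so we have this problem. I know I can use an InputStream to read the file as a stream instead of loading the whole file, which would cause memory issues.

See this answer: https://stackoverflow.com/a/14037510/1316967

However, the concern is speed, because in this case I would be reading every line of the entire file. Given that the file contains millions of unordered names and the lookup has to complete within a few seconds, how do I solve this?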

Answer 1

Ger*_*cke 5

Because the list is unordered, there is no alternative to reading the entire file.

If you are lucky, the first name in the file is the one you are looking for: O(1).

If you are unlucky, it is the last one: O(n).

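Apart from that, it does not matter whether you do it the java.io way (Files.newBufferedReader()) or the java.nio way (Files.newByteChannel()); both perform more or less the same. If the input file is line based (as in your case), you could use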

// path is the java.nio.file.Path of the names file; Files.lines requires it as an argument
Files.lines(path).filter(l -> name.equals(l)).findFirst();


It uses a BufferedReader internally.
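As a minimal sketch of the full-scan approach described above: the class name `LinearScan`, the method `contains`, and the UTF-8 charset are assumptions made for illustration, not part of the original answer. The stream is lazy, so the file is never held in memory as a whole, and `anyMatch` stops at the first hit.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class LinearScan {
    /** Returns true if some line of the file equals name; stops at the first match. */
    static boolean contains(Path file, String name) throws IOException {
        // Files.lines reads lazily through a buffered reader.
        // The stream keeps the file open, hence try-with-resources.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.anyMatch(name::equals);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]); // hypothetical: path to the names file
        String name = args[1];          // hypothetical: name to look for
        System.out.println(contains(file, name));
    }
}
```

The worst case is still a full pass over all 8GB, so the running time is bounded by disk throughput rather than by the choice of API.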

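If you really want to speed things up, you have to sort the names in the file (see "How do I sort very large files"). Once the list is ordered, a lookup no longer requires a full scan.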
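Sorting a file that does not fit in memory is usually done with an external merge sort. The following is only a sketch of that general technique, not the method from the linked question; the class name `ExternalSort` and the `maxLines` parameter are hypothetical. It sorts by `String.compareTo`, which is the order a `TreeMap<String, Long>` index would expect.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSort {
    /** One open sorted chunk plus its current, not yet written line. */
    private static final class Run {
        final BufferedReader reader;
        String head;

        Run(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }
    }

    /** Sorts the lines of in into out, buffering at most maxLines lines at a time. */
    static void sort(Path in, Path out, int maxLines) throws IOException {
        List<Path> chunks = new ArrayList<>();
        // Phase 1: cut the input into sorted chunks that each fit in memory.
        try (BufferedReader reader = Files.newBufferedReader(in)) {
            List<String> buffer = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() >= maxLines) {
                    chunks.add(writeChunk(buffer));
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) chunks.add(writeChunk(buffer));
        }
        // Phase 2: k-way merge; the queue always yields the smallest pending line.
        PriorityQueue<Run> queue = new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        try (BufferedWriter writer = Files.newBufferedWriter(out)) {
            for (Path chunk : chunks) {
                Run run = new Run(Files.newBufferedReader(chunk));
                if (run.head != null) queue.add(run); else run.reader.close();
            }
            while (!queue.isEmpty()) {
                Run run = queue.poll();
                writer.write(run.head);
                writer.newLine();
                run.head = run.reader.readLine();
                if (run.head != null) queue.add(run); else run.reader.close();
            }
        } finally {
            for (Run run : queue) run.reader.close();          // only non-empty after a failure
            for (Path chunk : chunks) Files.deleteIfExists(chunk);
        }
    }

    private static Path writeChunk(List<String> buffer) throws IOException {
        Collections.sort(buffer);                              // String.compareTo order
        Path chunk = Files.createTempFile("chunk", ".txt");
        Files.write(chunk, buffer);
        return chunk;
    }
}
```

All chunk files are open at once during the merge, so `maxLines` should be large enough to keep the number of chunks below the operating system's open-file limit.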

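Edit: ordered list using an index

Once you have an ordered list, you can scan it quickly and build an index using a TreeMap, then jump straight to the correct file position (using RandomAccessFile or SeekableByteChannel) and read the name.

For example: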

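import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class NameIndex {
    // Assumes one name per line, sorted in String.compareTo order, single-byte
    // characters only (RandomAccessFile.readLine() maps each byte to one char).

    static TreeMap<String, Long> buildIndex(RandomAccessFile raf, long fileSize, long blockSize) throws IOException {
        TreeMap<String, Long> index = new TreeMap<>();
        for (long pos = 0; pos < fileSize; pos += blockSize) {
            // jump to the next block; a block boundary usually falls mid-line,
            // so read from pos - 1 to the end of that line to reach a line start
            raf.seek(Math.max(0, pos - 1));
            if (pos > 0) raf.readLine();
            long lineStart = raf.getFilePointer();
            String line = raf.readLine();
            if (line != null) index.put(line, lineStart);
        }
        return index;
    }

    // returns the file position of the name, or -1 if it is not in the file
    static long find(RandomAccessFile raf, TreeMap<String, Long> index, long fileSize, String name) throws IOException {
        // get the beginning and end of the block: floorEntry/higherEntry (rather than
        // lowerEntry/ceilingEntry) so that a name that is itself an index key is found
        long offset = Optional.ofNullable(index.floorEntry(name)).map(Map.Entry::getValue).orElse(0L);
        long limit = Optional.ofNullable(index.higherEntry(name)).map(Map.Entry::getValue).orElse(fileSize);

        // move the pointer to the offset position
        raf.seek(offset);
        long cur;
        while ((cur = raf.getFilePointer()) < limit) {
            if (name.equals(raf.readLine())) {
                return cur;
            }
        }
        return -1L;
    }

    public static void main(String[] args) throws IOException {
        long blockSize = 1048576L;
        Path file = Paths.get("yourFile");
        long fileSize = Files.size(file);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            TreeMap<String, Long> index = buildIndex(raf, fileSize, blockSize); // create the index
            System.out.println(find(raf, index, fileSize, "someName"));         // get the position of a name
        }
    }
}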


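The block size is a trade-off between index size, index creation time, and data access time. The larger the blocks, the smaller the index and the shorter the index creation time, but the longer the data access time.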
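To put rough numbers on that trade-off, here is a small sketch; the class name `IndexSizing` and the reading of "8GB" as 8 GiB are assumptions. With one index entry per block, a 1 MiB block size gives 8192 entries for an 8 GiB file, and each lookup then scans roughly one block.

```java
public class IndexSizing {
    /** Number of index entries: one per block, rounding up for a final partial block. */
    static long entries(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long fileSize = 8L * 1024 * 1024 * 1024; // assumption: "8GB" taken as 8 GiB
        long[] blockSizes = {64L * 1024, 1024L * 1024, 16L * 1024 * 1024};
        for (long blockSize : blockSizes) {
            // Smaller blocks mean more entries to build and hold, but fewer bytes to scan per lookup.
            System.out.println("block size " + blockSize + " bytes: "
                    + entries(fileSize, blockSize) + " index entries, about "
                    + blockSize + " bytes scanned per lookup");
        }
    }
}
```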
