Abu*_*mad 4 java file-io text-files
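I have two files:
1- 1,400,000 lines (records), 14 MB
2- 16,000,000 lines, 170 MB
I want to find out whether each record (line) in file 1 is also present in file 2.
I developed a Java application that does the following: it reads file 1 line by line and passes each line to a method that loops over file 2.
Here is my code: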
    // Scans AllIDs.txt from the start for a single ID (called once per line of DIDs.txt).
    public boolean hasIDin(String bioid) throws Exception {
        long bid = Long.parseLong(bioid);
        // try-with-resources closes the reader on every call, including early returns
        try (BufferedReader br = new BufferedReader(new FileReader("C://AllIDs.txt"))) {
            String thisLine;
            while ((thisLine = br.readLine()) != null) {
                if (Long.parseLong(thisLine) == bid)
                    return true;
            }
        }
        return false;
    }

    // Writes every ID from DIDs.txt that is not found in AllIDs.txt to MBD.txt.
    public void getMBD() throws Exception {
        try (BufferedReader br = new BufferedReader(new FileReader("C://DIDs.txt"));
             PrintWriter pr = new PrintWriter(new FileOutputStream("C://MBD.txt"))) {
            String thisLine;
            int count = 1;
            while ((thisLine = br.readLine()) != null) {
                String bioid = thisLine;
                System.out.println(count);
                if (!hasIDin(bioid))
                    pr.println(bioid);
                count++;
            }
        }
    }
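When I run it, it looks like it will take more than 1,944.44 hours to finish, because each line takes about 5 seconds to process. That is roughly three months!
Are there any ideas for getting this done in much less time?
Thanks in advance.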
Why don't you;
Here is a tuned implementation which prints the following and uses < 64 MB.
Generating 1400000 ids to /tmp/DID.txt
Generating 16000000 ids to /tmp/AllIDs.txt
Reading ids in /tmp/DID.txt
Reading ids in /tmp/AllIDs.txt
Took 8794 ms to find 294330 valid ids
Code
    // Trove 3.x package name; older Trove 2.x releases use gnu.trove.TLongHashSet instead.
    import gnu.trove.set.hash.TLongHashSet;
    import java.io.*;
    import java.util.Random;

    public static void main(String... args) throws IOException {
        generateFile("/tmp/DID.txt", 1400000);
        generateFile("/tmp/AllIDs.txt", 16000000);
        long start = System.currentTimeMillis();
        TLongHashSet did = readLongs("/tmp/DID.txt");
        TLongHashSet validIDS = readLongsUnion("/tmp/AllIDs.txt", did);
        long time = System.currentTimeMillis() - start;
        System.out.println("Took "+ time+" ms to find "+ validIDS.size()+" valid ids");
    }

    // Loads every id in the file (one per line) into a primitive long set.
    private static TLongHashSet readLongs(String filename) throws IOException {
        System.out.println("Reading ids in "+filename);
        TLongHashSet ids = new TLongHashSet();
        try (BufferedReader br = new BufferedReader(new FileReader(filename), 128*1024)) {
            for(String line; (line = br.readLine())!=null;)
                ids.add(Long.parseLong(line));
        }
        return ids;
    }

    // Despite the name, this returns the intersection: ids in the file that are also in validSet.
    private static TLongHashSet readLongsUnion(String filename, TLongHashSet validSet) throws IOException {
        System.out.println("Reading ids in "+filename);
        TLongHashSet ids = new TLongHashSet();
        try (BufferedReader br = new BufferedReader(new FileReader(filename), 128*1024)) {
            for(String line; (line = br.readLine())!=null;) {
                long val = Long.parseLong(line);
                if (validSet.contains(val))
                    ids.add(val);
            }
        }
        return ids;
    }

    // Writes `number` random ids in the range [0, 2^26), one per line, as test data.
    private static void generateFile(String filename, int number) throws IOException {
        System.out.println("Generating "+number+" ids to "+filename);
        try (PrintWriter pw = new PrintWriter(new BufferedWriter(new FileWriter(filename), 128*1024))) {
            Random rand = new Random();
            for(int i=0;i<number;i++)
                pw.println(rand.nextInt(1<<26));
        }
    }
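If adding Trove as a dependency is not an option, the same idea can be sketched with only the JDK. The version below is a rough illustration, not the code from the answer above: it swaps `TLongHashSet` for `java.util.HashSet<Long>`, which will likely use noticeably more memory because every id is boxed. It assumes one decimal id per line, and the names `MissingIds`, `readIds` and `writeMissing` are made up for this example. Like the code in the question, it writes the ids from the small file that do not appear in the large one. Output order and duplicate ids are not preserved.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MissingIds {

    /** Loads every id (assumed: one decimal number per line) into a set. */
    public static Set<Long> readIds(Path file) throws IOException {
        Set<Long> ids = new HashSet<>();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            for (String line; (line = br.readLine()) != null; ) {
                line = line.trim();
                if (!line.isEmpty())
                    ids.add(Long.parseLong(line));
            }
        }
        return ids;
    }

    /**
     * Writes the ids of `candidates` that never occur in `allIds` to `out`
     * and returns how many were written. Each file is read exactly once.
     */
    public static int writeMissing(Path candidates, Path allIds, Path out) throws IOException {
        // Keep only the smaller file in memory.
        Set<Long> remaining = readIds(candidates);

        // Stream the large file and drop every id that shows up in it.
        try (BufferedReader br = Files.newBufferedReader(allIds, StandardCharsets.UTF_8)) {
            for (String line; (line = br.readLine()) != null; ) {
                line = line.trim();
                if (!line.isEmpty())
                    remaining.remove(Long.parseLong(line));
            }
        }

        // Whatever is left was not found in the large file.
        try (PrintWriter pw = new PrintWriter(Files.newBufferedWriter(out, StandardCharsets.UTF_8))) {
            for (long id : remaining)
                pw.println(id);
        }
        return remaining.size();
    }

    public static void main(String[] args) throws IOException {
        // Tiny demo with temporary files standing in for DIDs.txt and AllIDs.txt.
        Path dir = Files.createTempDirectory("ids");
        Path dids = dir.resolve("DIDs.txt");
        Path all = dir.resolve("AllIDs.txt");
        Path out = dir.resolve("MBD.txt");
        Files.write(dids, Arrays.asList("1", "2", "3", "4"));
        Files.write(all, Arrays.asList("2", "4", "6", "8"));

        int missing = writeMissing(dids, all, out);
        System.out.println(missing + " ids missing, written to " + out);
    }
}
```

The point of the design is that each set lookup takes roughly constant time, so the total work is one pass over each file instead of one full scan of the large file per line of the small file.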