Sop*_*hie · 6 · Tags: language-agnostic, sorting, algorithm, file, lines
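What is a good algorithm for sorting text files that are larger than available memory (many tens of gigabytes) and contain variable-length records? All the algorithms I've seen assume either 1) the data fits in memory, or 2) the records are fixed-length. But imagine a large CSV file that I want to sort by the "BirthDate" field (the 4th field):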
Id,UserId,Name,BirthDate
1,psmith,"Peter Smith","1984/01/01"
2,dmehta,"Divya Mehta","1985/11/23"
3,scohen,"Saul Cohen","1984/08/19"
...
99999999,swright,"Shaun Wright","1986/04/12"
100000000,amarkov,"Anya Markov","1984/10/31"
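A common technique for data like this, which the question does not itself name, is an external merge sort. In the first pass, read the file one line at a time, collect lines until a memory budget is reached, sort that chunk by the key, and write it to a temporary "run" file. In the second pass, do a k-way merge of all the runs. Because records are handled as whole lines and the memory budget is counted in characters rather than record count, variable-length records are not a problem. The sketch below rests on a few assumptions. Each record fits on one physical line, so there are no embedded newlines inside quoted fields. The first line is a header. Dates are zero-padded `YYYY/MM/DD`, which means string order matches date order. The names `external_sort`, `birth_date_key`, `_write_run` and the `max_chunk_chars` parameter are hypothetical.

```python
import csv
import heapq
import os
import tempfile


def birth_date_key(line: str) -> str:
    """Return the 4th CSV field (BirthDate) of one record.

    Assumes zero-padded YYYY/MM/DD dates, so plain string
    comparison gives chronological order.
    """
    return next(csv.reader([line]))[3]


def _write_run(lines: list[str], tmp_dir: str) -> str:
    """Sort one in-memory chunk by key and spill it to a temp 'run' file."""
    lines.sort(key=birth_date_key)
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
    with os.fdopen(fd, "w") as run:
        run.writelines(lines)
    return path


def external_sort(src: str, dst: str, max_chunk_chars: int = 100_000_000) -> None:
    """Sort a CSV file larger than memory by BirthDate (external merge sort)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Pass 1: split the input into sorted runs that each fit in memory.
        run_paths: list[str] = []
        with open(src) as f:
            header = f.readline()
            chunk: list[str] = []
            size = 0
            for line in f:
                if not line.endswith("\n"):
                    line += "\n"  # the last record may lack a newline
                chunk.append(line)
                size += len(line)  # budget by characters, not record count
                if size >= max_chunk_chars:
                    run_paths.append(_write_run(chunk, tmp_dir))
                    chunk, size = [], 0
            if chunk:
                run_paths.append(_write_run(chunk, tmp_dir))

        # Pass 2: k-way merge of the sorted runs; heapq.merge holds
        # roughly one line per run in memory (plus file buffers).
        runs = [open(p) for p in run_paths]
        try:
            with open(dst, "w") as out:
                out.write(header)
                out.writelines(heapq.merge(*runs, key=birth_date_key))
        finally:
            for run in runs:
                run.close()
```

With the sample rows above and a very small `max_chunk_chars`, the output still comes out in BirthDate order, because the merge pass restores global order across runs. If the number of runs exceeds the operating system's open-file limit, the runs would have to be merged in several passes rather than all at once.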
I know that:
Thanks! ♥