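I have an interesting problem. I want to intersect two sets of Longs in Java, each with 1B members, which is about 8 GB of raw 64-bit values per set. That won't fit in memory on the server I need to run this on.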
I'm wondering what interesting approaches there are to this problem.
What I've come up with so far is reading subsets of each original set from disk that are small enough to fit into memory, then intersecting each subset, and writing those to disk temporarily. Finally, I could go through and intersect these subsets. I get a feeling that this may turn into a map reduce job.
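Arbitrary memory-sized chunks only give the right answer if every chunk of A is compared with every chunk of B. A common way to avoid that (a technique swapped in here, not something the question specifies) is to partition both sets with the same hash function. A value shared by A and B then always lands in the same bucket index, so only matching bucket pairs need to be intersected in memory. This is a minimal sketch assuming the inputs arrive as `PrimitiveIterator.OfLong` streams; the class `PartitionedIntersect`, its methods, and the bucket count are hypothetical, and the bucket count must be large enough that one bucket of A fits in memory.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

class PartitionedIntersect {
    // Write every value to one of `buckets` temp files chosen by hash. Both inputs use the
    // same function, so equal values from A and B always share a bucket index.
    static Path[] partition(PrimitiveIterator.OfLong values, int buckets, String prefix) throws IOException {
        Path[] files = new Path[buckets];
        DataOutputStream[] outs = new DataOutputStream[buckets];
        for (int i = 0; i < buckets; i++) {
            files[i] = Files.createTempFile(prefix + i + "-", ".bin");
            outs[i] = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(files[i])));
        }
        while (values.hasNext()) {
            long v = values.nextLong();
            outs[Math.floorMod(Long.hashCode(v), buckets)].writeLong(v);
        }
        for (DataOutputStream out : outs) out.close();
        return files;
    }

    // Read one bucket file back into memory (assumes a single bucket fits in the heap).
    static long[] readBucket(Path file) throws IOException {
        int n = (int) (Files.size(file) / Long.BYTES);
        long[] values = new long[n];
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(Files.newInputStream(file)))) {
            for (int i = 0; i < n; i++) values[i] = in.readLong();
        }
        return values;
    }

    // Intersect A and B bucket by bucket: bucket i of A is only compared with bucket i of B.
    static List<Long> intersect(PrimitiveIterator.OfLong a, PrimitiveIterator.OfLong b, int buckets) throws IOException {
        Path[] pa = partition(a, buckets, "a");
        Path[] pb = partition(b, buckets, "b");
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            Set<Long> inA = new HashSet<>();
            for (long v : readBucket(pa[i])) inA.add(v);
            for (long v : readBucket(pb[i])) {
                if (inA.remove(v)) result.add(v); // remove() so each common value is reported once
            }
            Files.delete(pa[i]);
            Files.delete(pb[i]);
        }
        return result;
    }
}
```

A `HashSet<Long>` is used per bucket for brevity; sorting each bucket's `long[]` and merging would use far less memory per value. Results come out grouped by bucket, not in sorted order.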
Maybe you'll have some better ideas :) I doubt I'm the first person to have come up with this problem.
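1. Sort sets A and B separately.
2. Read the first element of A and the first element of B.
3. If they are equal, add the value to the result set and read the next element from both sets.
4. If the element from one set is larger, read the next element from the other set (the one holding the smaller value).
5. As long as you haven't reached the end of either set, go back to step 3.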
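The steps above can be sketched in Java as a merge over two ascending streams. This assumes both sets are already sorted and exposed as `PrimitiveIterator.OfLong` (for example, a reader over a sorted file); the class and method names are hypothetical. Matches are handed to a `LongConsumer` so that, apart from the two current values, nothing is buffered.

```java
import java.util.PrimitiveIterator;
import java.util.function.LongConsumer;

class SortedIntersect {
    // Merge-style intersection of two ascending, duplicate-free streams.
    // Only the current value from each stream is held in memory.
    static void intersect(PrimitiveIterator.OfLong a, PrimitiveIterator.OfLong b, LongConsumer out) {
        if (!a.hasNext() || !b.hasNext()) return;
        long x = a.nextLong();
        long y = b.nextLong();
        while (true) {
            if (x == y) {
                out.accept(x);                          // step 3: match, advance both
                if (!a.hasNext() || !b.hasNext()) return;
                x = a.nextLong();
                y = b.nextLong();
            } else if (x < y) {
                if (!a.hasNext()) return;               // step 4: advance the smaller side
                x = a.nextLong();
            } else {
                if (!b.hasNext()) return;
                y = b.nextLong();
            }
        }
    }
}
```

Because the inputs are sets, advancing both streams on a match is enough; with duplicate values you would have to decide whether a repeated match is reported once or per pair.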
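The advantage of this approach is that you never hold more than two longs in memory (apart from the sorting step), and the sorting itself can be done efficiently on disk with an external merge sort.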
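A minimal sketch of the on-disk sort, assuming values are stored as raw big-endian longs in temp files and a hypothetical `runSize` (how many longs fit in memory at once): sort fixed-size runs in memory and write each to disk, then do a k-way merge with a `PriorityQueue` that holds one head value per run. The class and method names are illustrative, not from the answer.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

class ExternalSort {
    // Phase 1: cut the input into runs of at most runSize longs, sort each run in memory,
    // and write it to its own temp file.
    static List<Path> writeRuns(PrimitiveIterator.OfLong in, int runSize) throws IOException {
        List<Path> runs = new ArrayList<>();
        long[] buf = new long[runSize];
        while (in.hasNext()) {
            int n = 0;
            while (n < runSize && in.hasNext()) buf[n++] = in.nextLong();
            Arrays.sort(buf, 0, n);
            Path run = Files.createTempFile("run", ".bin");
            try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(run)))) {
                for (int i = 0; i < n; i++) out.writeLong(buf[i]);
            }
            runs.add(run);
        }
        return runs;
    }

    // Phase 2: k-way merge. The heap holds {value, runIndex} pairs, one head value per run.
    static Path mergeRuns(List<Path> runs) throws IOException {
        Path result = Files.createTempFile("sorted", ".bin");
        List<DataInputStream> ins = new ArrayList<>();
        long[] remaining = new long[runs.size()];
        PriorityQueue<long[]> heap = new PriorityQueue<>(Comparator.comparingLong((long[] e) -> e[0]));
        try (DataOutputStream out = new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(result)))) {
            for (int i = 0; i < runs.size(); i++) {
                DataInputStream in = new DataInputStream(new BufferedInputStream(Files.newInputStream(runs.get(i))));
                ins.add(in);
                remaining[i] = Files.size(runs.get(i)) / Long.BYTES;
                if (remaining[i] > 0) {
                    heap.add(new long[] {in.readLong(), i});
                    remaining[i]--;
                }
            }
            while (!heap.isEmpty()) {
                long[] top = heap.poll();
                out.writeLong(top[0]);
                int i = (int) top[1];
                if (remaining[i] > 0) {               // refill from the run we just consumed
                    heap.add(new long[] {ins.get(i).readLong(), i});
                    remaining[i]--;
                }
            }
        } finally {
            for (DataInputStream in : ins) in.close();
            for (Path run : runs) Files.deleteIfExists(run);
        }
        return result;
    }

    static Path sort(PrimitiveIterator.OfLong in, int runSize) throws IOException {
        return mergeRuns(writeRuns(in, runSize));
    }
}
```

During the merge, memory use is one head value plus one buffered stream per run, so the number of runs (input size divided by `runSize`) is what bounds it.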