Mee*_*ary 14 java sorting linked-list arraylist
我想维持一个List<Integer>大小<= 10 ^ 6 的有序.每次添加一个新元素时,我都会调用Collections.sort()方法对列表中的新元素进行排序.据我所知,ArrayList表现要好于LinkedList.但是由于我会sort()经常调用方法,所以我已经明白,linkedList在对列表进行排序时会表现得更好并且将是更好的选择ArrayList,因为没有像ArrayList(array作为底层数据结构使用)的情况那样移动元素.任何更有效的建议.
ass*_*ias 17
您可以Collections#binarySearch在排序列表上使用以查找正确的插入点.ArrayList可能比LinkedList表现更好,特别是对于大小的大小,但这很容易测试.
我运行了各种方法的微基准测试:在每次插入后使用排序或使用binarySearch插入正确的位置,使用ArrayList(AL)和LinkedList(LL).我还添加了Commons TreeList和番石榴的TreeMultiset.
结论
TreeMultiset,但严格来说它不是一个列表 - 下一个最好的选择是使用ArrayList+ binarySearch表现最佳的代码,供参考:
@Benchmark public ArrayList<Integer> binarySearchAL() {
ArrayList<Integer> list = new ArrayList<> ();
Random r = new Random();
for (int i = 0; i < n; i++) {
int num = r.nextInt();
int index = Collections.binarySearch(list, num);
if (index >= 0) list.add(index, num);
else list.add(-index - 1, num);
current = list.get(0); //O(1), to make sure the sort is not optimised away
}
return list;
}
Run Code Online (Sandbox Code Playgroud)
bitbucket上的完整代码.
完整的结果
"Benchmark"列包含被测方法的名称(baseLine只填充列表而不对其进行排序,其他方法具有显式名称:AL = ArrayList,LL = LinkedList,TL = Commons TreeList,treeMultiSet = guava),(n )是列表的大小,Score是以毫秒为单位的时间.
Benchmark (n) Mode Samples Score Error Units
c.a.p.SO28164665.baseLine 100 avgt 10 0.002 ± 0.000 ms/op
c.a.p.SO28164665.baseLine 1000 avgt 10 0.017 ± 0.001 ms/op
c.a.p.SO28164665.baseLine 5000 avgt 10 0.086 ± 0.002 ms/op
c.a.p.SO28164665.baseLine 10000 avgt 10 0.175 ± 0.007 ms/op
c.a.p.SO28164665.binarySearchAL 100 avgt 10 0.014 ± 0.001 ms/op
c.a.p.SO28164665.binarySearchAL 1000 avgt 10 0.226 ± 0.006 ms/op
c.a.p.SO28164665.binarySearchAL 5000 avgt 10 2.413 ± 0.125 ms/op
c.a.p.SO28164665.binarySearchAL 10000 avgt 10 8.478 ± 0.523 ms/op
c.a.p.SO28164665.binarySearchLL 100 avgt 10 0.031 ± 0.000 ms/op
c.a.p.SO28164665.binarySearchLL 1000 avgt 10 3.876 ± 0.100 ms/op
c.a.p.SO28164665.binarySearchLL 5000 avgt 10 263.717 ± 6.852 ms/op
c.a.p.SO28164665.binarySearchLL 10000 avgt 10 843.436 ± 33.265 ms/op
c.a.p.SO28164665.sortAL 100 avgt 10 0.051 ± 0.002 ms/op
c.a.p.SO28164665.sortAL 1000 avgt 10 3.381 ± 0.189 ms/op
c.a.p.SO28164665.sortAL 5000 avgt 10 118.882 ± 22.030 ms/op
c.a.p.SO28164665.sortAL 10000 avgt 10 511.668 ± 171.453 ms/op
c.a.p.SO28164665.sortLL 100 avgt 10 0.082 ± 0.002 ms/op
c.a.p.SO28164665.sortLL 1000 avgt 10 13.045 ± 0.460 ms/op
c.a.p.SO28164665.sortLL 5000 avgt 10 642.593 ± 188.044 ms/op
c.a.p.SO28164665.sortLL 10000 avgt 10 1182.698 ± 159.468 ms/op
c.a.p.SO28164665.binarySearchTL 100 avgt 10 0.056 ± 0.002 ms/op
c.a.p.SO28164665.binarySearchTL 1000 avgt 10 1.083 ± 0.052 ms/op
c.a.p.SO28164665.binarySearchTL 5000 avgt 10 8.246 ± 0.329 ms/op
c.a.p.SO28164665.binarySearchTL 10000 avgt 10 735.192 ± 56.071 ms/op
c.a.p.SO28164665.treeMultiSet 100 avgt 10 0.021 ± 0.001 ms/op
c.a.p.SO28164665.treeMultiSet 1000 avgt 10 0.288 ± 0.008 ms/op
c.a.p.SO28164665.treeMultiSet 5000 avgt 10 1.809 ± 0.061 ms/op
c.a.p.SO28164665.treeMultiSet 10000 avgt 10 4.283 ± 0.214 ms/op
Run Code Online (Sandbox Code Playgroud)
对于100k物品:
c.a.p.SO28164665.binarySearchAL 100000 avgt 6 890.585 ± 68.730 ms/op
c.a.p.SO28164665.treeMultiSet 100000 avgt 6 105.273 ± 9.309 ms/op
Run Code Online (Sandbox Code Playgroud)
由于java没有内置的multiset,这是适合您情况的完美数据结构,我建议使用guava库中的TreeMultiset.
Multisets允许重复元素,树multiset还将添加保持集合排序的好处.