Wil*_*zyr 29 java collections performance list
我有很大的List命名项(> = 1,000,000项)和一些由<cond>表示的条件,它选择要删除的项目,<cond>对于我列表中的许多(可能是一半)项目都是正确的.
我的目标是有效地删除<cond>选择的项目并保留所有其他项目,可以修改源列表,可以创建新列表 - 应该考虑性能来选择最佳方法.
这是我的测试代码:
System.out.println("preparing items");
List<Integer> items = new ArrayList<Integer>(); // Integer is for demo
for (int i = 0; i < 1000000; i++) {
items.add(i * 3); // just for demo
}
System.out.println("deleting items");
long startMillis = System.currentTimeMillis();
items = removeMany(items);
long endMillis = System.currentTimeMillis();
System.out.println("after remove: items.size=" + items.size() +
" and it took " + (endMillis - startMillis) + " milli(s)");
Run Code Online (Sandbox Code Playgroud)
和天真的实施:
public static <T> List<T> removeMany(List<T> items) {
int i = 0;
Iterator<T> iter = items.iterator();
while (iter.hasNext()) {
T item = iter.next();
// <cond> goes here
if (/*<cond>: */i % 2 == 0) {
iter.remove();
}
i++;
}
return items;
}
Run Code Online (Sandbox Code Playgroud)
如您所见,我使用项目索引模2 == 0作为删除条件(<cond>) - 仅用于演示目的.
removeMany可以提供哪些更好的版本以及为什么这个更好的版本实际上更好?
Wil*_*zyr 38
好的,现在是提出方法测试结果的时候了.这里我测试了哪些方法(每个方法的名称在我的源代码中也是类名):
NaiveRemoveManyPerformer- ArrayList使用迭代器和删除 - 在我的问题中给出的第一个和天真的实现.BetterNaiveRemoveManyPerformer- ArrayList使用向后迭代并从头到尾移除.LinkedRemoveManyPerformer- 天真的迭代器和删除但工作LinkedList.Disadventage:仅适用于LinkedList.CreateNewRemoveManyPerformer- ArrayList作为副本(仅添加保留元素),迭代器用于遍历输入ArrayList.SmartCreateNewRemoveManyPerformer- 更好CreateNewRemoveManyPerformer- 结果的初始大小(容量)ArrayList设置为最终列表大小.缺点:启动时必须知道列表的最终大小.FasterSmartCreateNewRemoveManyPerformer- 甚至更好(?)SmartCreateNewRemoveManyPerformer- 使用item index(items.get(idx))而不是迭代器.MagicRemoveManyPerformer- ArrayList从列表末尾的项目开始到位(没有列表副本)并压缩孔(已删除的项目).Disadventage:此方法更改列表中项目的顺序.ForwardInPlaceRemoveManyPerformer- 适当的工作ArrayList- 移动保留项目以填充孔,最后返回subList(无最终删除或清除).GuavaArrayListRemoveManyPerformer- 谷歌番石榴Iterables.removeIf用于ArrayList- 几乎相同ForwardInPlaceRemoveManyPerformer但最终删除列表末尾的项目.完整的源代码在本答案的最后给出.
测试使用不同列表大小(从10,000个项目到10,000,000个项目)执行的测试和不同的删除因子(指定必须从列表中删除多少项目).
正如我在其他答案的评论中发布的那样 - 我认为将项目复制ArrayList到第二个ArrayList将比迭代更快LinkedList,只是删除项目.Sun的Java文档说,ArrayList与LinkedList实现相比,常数因子很低,但令人惊讶的是,在我的问题中并非如此.
在实践中LinkedList,简单的迭代和删除在大多数情况下具有最佳性能(此方法在其中实现LinkedRemoveManyPerformer).通常只有MagicRemoveManyPerformer性能可比LinkedRemoveManyPerformer,其他方法明显较慢.Google Guava GuavaArrayListRemoveManyPerformer比手工制作的类似代码慢(因为我的代码不会删除列表末尾不必要的项目).
从1,000,000个源项目中删除500,000个项目的示例结果:
NaiveRemoveManyPerformer:测试没有进行 - 我不是那么耐心,但是表现比...差BetterNaiveRemoveManyPerformer.BetterNaiveRemoveManyPerformer:226080毫(s)LinkedRemoveManyPerformer:69毫(s)CreateNewRemoveManyPerformer:246毫(s)SmartCreateNewRemoveManyPerformer:112毫(s)FasterSmartCreateNewRemoveManyPerformer:202毫(s)MagicRemoveManyPerformer:74毫(s)ForwardInPlaceRemoveManyPerformer:69毫(s)GuavaArrayListRemoveManyPerformer:118毫(s)从1,000,000个源项目中删除1个项目的示例结果(第一个项目已删除):
从1,000,000个源项目中删除333,334个项目的示例结果:
从1,000,000个源项目中删除1,000,000(全部)项目的示例结果(所有项目都被删除但是逐个处理,如果您事先知道要删除所有项目,则应该简单地清除列表):
我的最终结论是:使用混合方法 - 如果处理LinkedList - 简单的迭代和删除是最好的,如果处理ArrayList - 它取决于项目顺序是否重要 - 然后使用ForwardInPlaceRemoveManyPerformer,如果项目顺序可能被更改 - 最佳选择是MagicRemoveManyPerformer.如果删除因子是先验已知的(您知道将删除多少项目与保留),那么可以选择更多条件来选择在特定情况下表现更好的方法.但已知的删除因素不是常见的情况......谷歌番石榴Iterables.removeIf是这样一个混合解决方案,但假设略有不同(原始列表必须更改,新的不能创建,项目顺序总是重要) - 这些是最常见的假设,所以removeIf最好大多数现实生活中的选择.
还要注意所有好方法(天真不好!)都足够好 - 其中任何一个方法在实际应用中做得很好,但必须避免使用天真的方法.
最后 - 我的测试源代码.
package WildWezyrListRemovalTesting;
import com.google.common.base.Predicate;
import com.google.common.collect.Iterables;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
public class RemoveManyFromList {
public static abstract class BaseRemoveManyPerformer {
protected String performerName() {
return getClass().getSimpleName();
}
protected void info(String msg) {
System.out.println(performerName() + ": " + msg);
}
protected void populateList(List<Integer> items, int itemCnt) {
for (int i = 0; i < itemCnt; i++) {
items.add(i);
}
}
protected boolean mustRemoveItem(Integer itemVal, int itemIdx, int removeFactor) {
if (removeFactor == 0) {
return false;
}
return itemIdx % removeFactor == 0;
}
protected abstract List<Integer> removeItems(List<Integer> items, int removeFactor);
protected abstract List<Integer> createInitialList();
public void testMe(int itemCnt, int removeFactor) {
List<Integer> items = createInitialList();
populateList(items, itemCnt);
long startMillis = System.currentTimeMillis();
items = removeItems(items, removeFactor);
long endMillis = System.currentTimeMillis();
int chksum = 0;
for (Integer item : items) {
chksum += item;
}
info("removing took " + (endMillis - startMillis)
+ " milli(s), itemCnt=" + itemCnt
+ ", removed items: " + (itemCnt - items.size())
+ ", remaining items: " + items.size()
+ ", checksum: " + chksum);
}
}
private List<BaseRemoveManyPerformer> rmps =
new ArrayList<BaseRemoveManyPerformer>();
public void addPerformer(BaseRemoveManyPerformer rmp) {
rmps.add(rmp);
}
private Runtime runtime = Runtime.getRuntime();
private void runGc() {
for (int i = 0; i < 5; i++) {
runtime.gc();
}
}
public void testAll(int itemCnt, int removeFactor) {
runGc();
for (BaseRemoveManyPerformer rmp : rmps) {
rmp.testMe(itemCnt, removeFactor);
}
runGc();
System.out.println("\n--------------------------\n");
}
public static class NaiveRemoveManyPerformer
extends BaseRemoveManyPerformer {
@Override
public List<Integer> removeItems(List<Integer> items, int removeFactor) {
if (items.size() > 300000 && items instanceof ArrayList) {
info("this removeItems is too slow, returning without processing");
return items;
}
int i = 0;
Iterator<Integer> iter = items.iterator();
while (iter.hasNext()) {
Integer item = iter.next();
if (mustRemoveItem(item, i, removeFactor)) {
iter.remove();
}
i++;
}
return items;
}
@Override
public List<Integer> createInitialList() {
return new ArrayList<Integer>();
}
}
public static class BetterNaiveRemoveManyPerformer
extends NaiveRemoveManyPerformer {
@Override
public List<Integer> removeItems(List<Integer> items, int removeFactor) {
// if (items.size() > 300000 && items instanceof ArrayList) {
// info("this removeItems is too slow, returning without processing");
// return items;
// }
for (int i = items.size(); --i >= 0;) {
Integer item = items.get(i);
if (mustRemoveItem(item, i, removeFactor)) {
items.remove(i);
}
}
return items;
}
}
public static class LinkedRemoveManyPerformer
extends NaiveRemoveManyPerformer {
@Override
public List<Integer> createInitialList() {
return new LinkedList<Integer>();
}
}
public static class CreateNewRemoveManyPerformer
extends NaiveRemoveManyPerformer {
@Override
public List<Integer> removeItems(List<Integer> items, int removeFactor) {
List<Integer> res = createResultList(items, removeFactor);
int i = 0;
for (Integer item : items) {
if (mustRemoveItem(item, i, removeFactor)) {
// no-op
} else {
res.add(item);
}
i++;
}
return res;
}
protected List<Integer> createResultList(List<Integer> items, int removeFactor) {
return new ArrayList<Integer>();
}
}
public static class SmartCreateNewRemoveManyPerformer
extends CreateNewRemoveManyPerformer {
@Override
protected List<Integer> createResultList(List<Integer> items, int removeFactor) {
int newCapacity = removeFactor == 0 ? items.size()
: (int) (items.size() * (removeFactor - 1L) / removeFactor + 1);
//System.out.println("newCapacity=" + newCapacity);
return new ArrayList<Integer>(newCapacity);
}
}
public static class FasterSmartCreateNewRemoveManyPerformer
extends SmartCreateNewRemoveManyPerformer {
@Override
public List<Integer> removeItems(List<Integer> items, int removeFactor) {
List<Integer> res = createResultList(items, removeFactor);
for (int i = 0; i < items.size(); i++) {
Integer item = items.get(i);
if (mustRemoveItem(item, i, removeFactor)) {
// no-op
} else {
res.add(item);
}
}
return res;
}
}
public static class ForwardInPlaceRemoveManyPerformer
extends NaiveRemoveManyPerformer {
@Override
public List<Integer> removeItems(List<Integer> items, int removeFactor) {
int j = 0; // destination idx
for (int i = 0; i < items.size(); i++) {
Integer item = items.get(i);
if (mustRemoveItem(item, i, removeFactor)) {
// no-op
} else {
if (j < i) {
items.set(j, item);
}
j++;
}
}
return items.subList(0, j);
}
}
public static class MagicRemoveManyPerformer
extends NaiveRemoveManyPerformer {
@Override
public List<Integer> removeItems(List<Integer> items, int removeFactor) {
for (int i = 0; i < items.size(); i++) {
if (mustRemoveItem(items.get(i), i, removeFactor)) {
Integer retainedItem = removeSomeFromEnd(items, removeFactor, i);
if (retainedItem == null) {
items.remove(i);
break;
}
items.set(i, retainedItem);
}
}
return items;
}
private Integer removeSomeFromEnd(List<Integer> items, int removeFactor, int lowerBound) {
for (int i = items.size(); --i > lowerBound;) {
Integer item = items.get(i);
items.remove(i);
if (!mustRemoveItem(item, i, removeFactor)) {
return item;
}
}
return null;
}
}
public static class GuavaArrayListRemoveManyPerformer
extends BaseRemoveManyPerformer {
@Override
protected List<Integer> removeItems(List<Integer> items, final int removeFactor) {
Iterables.removeIf(items, new Predicate<Integer>() {
public boolean apply(Integer input) {
return mustRemoveItem(input, input, removeFactor);
}
});
return items;
}
@Override
protected List<Integer> createInitialList() {
return new ArrayList<Integer>();
}
}
public void testForOneItemCnt(int itemCnt) {
testAll(itemCnt, 0);
testAll(itemCnt, itemCnt);
testAll(itemCnt, itemCnt - 1);
testAll(itemCnt, 3);
testAll(itemCnt, 2);
testAll(itemCnt, 1);
}
public static void main(String[] args) {
RemoveManyFromList t = new RemoveManyFromList();
t.addPerformer(new NaiveRemoveManyPerformer());
t.addPerformer(new BetterNaiveRemoveManyPerformer());
t.addPerformer(new LinkedRemoveManyPerformer());
t.addPerformer(new CreateNewRemoveManyPerformer());
t.addPerformer(new SmartCreateNewRemoveManyPerformer());
t.addPerformer(new FasterSmartCreateNewRemoveManyPerformer());
t.addPerformer(new MagicRemoveManyPerformer());
t.addPerformer(new ForwardInPlaceRemoveManyPerformer());
t.addPerformer(new GuavaArrayListRemoveManyPerformer());
t.testForOneItemCnt(1000);
t.testForOneItemCnt(10000);
t.testForOneItemCnt(100000);
t.testForOneItemCnt(200000);
t.testForOneItemCnt(300000);
t.testForOneItemCnt(500000);
t.testForOneItemCnt(1000000);
t.testForOneItemCnt(10000000);
}
}
Run Code Online (Sandbox Code Playgroud)
Kev*_*ion 11
正如其他人所说,你的第一个倾向应该是建立第二个清单.
但是,如果您还想尝试就地编辑列表,那么有效的方法是使用Iterables.removeIf()Guava.如果它的参数是一个列表,它会将保留的元素合并到前面,然后简单地切掉末尾 - 比逐个删除()内部元素要快得多.
从a中删除大量元素ArrayList是一种O(n^2)操作.我建议只使用LinkedList更优化的插入和删除(但不是随机访问).LinkedList有一点内存开销.
如果你确实需要保留ArrayList,那么最好创建一个新列表.
更新:与创建新列表相比:
重用相同的列表,主要成本来自删除节点和更新LinkedList中的相应指针.这是任何节点的常量操作.
构建新列表时,主要成本来自创建列表和初始化数组条目.两者都是廉价的操作.您可能还要承担调整新列表后端阵列大小的成本; 假设最终数组大于传入数组的一半.
因此,如果您只删除一个元素,那么LinkedList方法可能更快.如果要删除除1之外的所有节点,则新列表方法可能更快.
引入内存管理和GC时会出现更多复杂情况.我想把它们留下来.
最佳选择是自己实施替代方案,并在运行典型负载时对结果进行基准测试.
我会创建一个新List的项目,因为从列表中间删除项目是非常昂贵的.
public static List<T> removeMany(List<T> items) {
List<T> tempList = new ArrayList<T>(items.size()/2); //if about half the elements are going to be removed
Iterator<T> iter = items.iterator();
while (item : items) {
// <cond> goes here
if (/*<cond>: */i % 2 != 0) {
tempList.add(item);
}
}
return tempList;
}
Run Code Online (Sandbox Code Playgroud)
编辑:我没有测试过这个,所以可能会有很小的语法错误.
第二次编辑:LinkedList当您不需要随机访问但快速添加时,使用a 会更好.
但...
常数因子ArrayList小于LinkedList(Ref)的常数因子.既然你可以合理地猜测将删除多少元素(在你的问题中你说"大约一半" ArrayList),只要你不必重新添加一个元素到一个结尾就是O(1)分配它.所以,如果你能做出合理的猜测,我希望它ArrayList比LinkedList大多数情况下要快一些.(这适用于我发布的代码.在你天真的实现中,我认为LinkedList会更快).