Jam*_*sev 3 java collections performance
给定一个KeyValuePairs列表,其中每对都有一个getValue()方法,获取List(或Set)唯一值的最快方法是什么?
以下所有产生可接受的结果.u1似乎是最快的超过预期的列表大小(约1000-2000 KVP)
我们能做得更好(更快)吗?
private static Set<String> u1(List<_KVPair> pairs) {
Set<String> undefined = new HashSet<String>();
for (_KVPair pair : pairs) {
undefined.add(pair.getValue());
}
if (undefined.size() == 1) {
return new HashSet<String>();
}
return undefined;
}
private static List<String> u2(List<_KVPair> pairs) {
List<String> undefined = new ArrayList<String>();
for (_KVPair pair : pairs) {
if (!undefined.contains(pair.getValue())) {
undefined.add(pair.getValue());
}
}
return undefined;
}
private static List<String> u3(List<_KVPair> pairs) {
List<String> undefined = new LinkedList<String>();
Iterator<_KVPair> it = pairs.iterator();
while (it.hasNext()) {
String value = it.next().getValue();
if (!undefined.contains(value)) {
undefined.add(value);
}
}
return undefined;
}
Run Code Online (Sandbox Code Playgroud)
大约3600对,'u3'获胜.大约1500对,'u1'获胜
第一个选项应该更快.您可以通过在使用之前调整集合大小来使其更快.通常,如果您期望少量重复:
Set<String> undefined = new HashSet<String>(pairs.size(), 1);
Run Code Online (Sandbox Code Playgroud)
请注意,我使用1作为加载因子来防止任何大小调整.
出于好奇,我进行了测试(下面的代码) - 结果是(编译后):
测试1(注意:热身需要几分钟)
原始列表的大小= 3,000,没有重复:
set:8
arraylist:668
linkedlist:1166
测试2
原始列表的大小= 30,000 - 所有字符串相同:
set:25
arraylist:11
linkelist:13
那种有道理:
List#contains运行速度会相当快,因为重复发现的速度会更快,并且分配大量集合的成本+散列算法也会受到影响public class TestPerf {
private static int NUM_RUN;
private static Random r = new Random(System.currentTimeMillis());
private static boolean random = false; //toggle to false for no duplicates in original list
public static void main(String[] args) {
List<String> list = new ArrayList<>();
for (int i = 0; i < 30_000; i++) {
list.add(getRandomString());
}
//warm up
for (int i = 0; i < 10_000; i++) {
method1(list);
method2(list);
method3(list);
}
NUM_RUN = 100;
long sum = 0;
long start = System.nanoTime();
for (int i = 0; i < NUM_RUN; i++) {
sum += method1(list);
}
long end = System.nanoTime();
System.out.println("set: " + (end - start) / 1000000);
sum = 0;
start = System.nanoTime();
for (int i = 0; i < NUM_RUN; i++) {
sum += method2(list);
}
end = System.nanoTime();
System.out.println("arraylist: " + (end - start) / 1000000);
sum = 0;
start = System.nanoTime();
for (int i = 0; i < NUM_RUN; i++) {
sum += method3(list);
}
end = System.nanoTime();
System.out.println("linkelist: " + (end - start) / 1000000);
System.out.println(sum);
}
private static int method1(final List<String> list) {
Set<String> set = new HashSet<>(list.size(), 1);
for (String s : list) {
set.add(s);
}
return set.size();
}
private static int method2(final List<String> list) {
List<String> undefined = new ArrayList<>();
for (String s : list) {
if (!undefined.contains(s)) {
undefined.add(s);
}
}
return undefined.size();
}
private static int method3(final List<String> list) {
List<String> undefined = new LinkedList<>();
Iterator<String> it = list.iterator();
while (it.hasNext()) {
String value = it.next();
if (!undefined.contains(value)) {
undefined.add(value);
}
}
return undefined.size();
}
private static String getRandomString() {
if (!random) {
return "skdjhflkjrglajhsdkhkjqwhkdjahkshd";
}
int size = r.nextInt(100);
StringBuilder sb = new StringBuilder();
for (int i = 0; i < size; i++) {
char c = (char) ('a' + r.nextInt(27));
sb.append(c);
}
System.out.println(sb);
return sb.toString();
}
}
Run Code Online (Sandbox Code Playgroud)