在列表中查找唯一值的快速方法

Jam*_*sev 3 java collections performance

给定一个KeyValuePairs列表,其中每对都有一个getValue()方法,获取List(或Set)唯一值的最快方法是什么?

以下所有产生可接受的结果.u1似乎是最快的超过预期的列表大小(约1000-2000 KVP)

我们能做得更好(更快)吗?

private static Set<String> u1(List<_KVPair> pairs) {
    Set<String> undefined = new HashSet<String>();

    for (_KVPair pair : pairs) {
        undefined.add(pair.getValue());
    }

    if (undefined.size() == 1) {
        return new HashSet<String>();
    }
    return undefined;
}

private static List<String> u2(List<_KVPair> pairs) {

    List<String> undefined = new ArrayList<String>();
    for (_KVPair pair : pairs) {
        if (!undefined.contains(pair.getValue())) {
            undefined.add(pair.getValue());
        }
    }

    return undefined;
}

private static List<String> u3(List<_KVPair> pairs) {

    List<String> undefined = new LinkedList<String>();

    Iterator<_KVPair> it = pairs.iterator();
    while (it.hasNext()) {
        String value = it.next().getValue();
        if (!undefined.contains(value)) {
            undefined.add(value);
        }
    }
    return undefined;
}
Run Code Online (Sandbox Code Playgroud)

大约3600对,'u3'获胜.大约1500对,'u1'获胜

ass*_*ias 7

第一个选项应该更快.您可以通过在使用之前调整集合大小来使其更快.通常,如果您期望少量重复:

Set<String> undefined = new HashSet<String>(pairs.size(), 1);
Run Code Online (Sandbox Code Playgroud)

请注意,我使用1作为加载因子来防止任何大小调整.

出于好奇,我进行了测试(下面的代码) - 结果是(编译后):

测试1(注意:热身需要几分钟)

原始列表的大小= 3,000,没有重复:
set:8
arraylist:668
linkedlist:1166

测试2

原始列表的大小= 30,000 - 所有字符串相同:
set:25
arraylist:11
linkelist:13

那种有道理:

  • 当有很多重复项时,List#contains运行速度会相当快,因为​​重复发现的速度会更快,并且分配大量集合的成本+散列算法也会受到影响
  • 当没有或很少有重复时,该集合大幅度地获胜.
public class TestPerf {

    private static int NUM_RUN;
    private static Random r = new Random(System.currentTimeMillis());
    private static boolean random = false; //toggle to false for no duplicates in original list


    public static void main(String[] args) {

        List<String> list = new ArrayList<>();

        for (int i = 0; i < 30_000; i++) {
            list.add(getRandomString());
        }

        //warm up
        for (int i = 0; i < 10_000; i++) {
            method1(list);
            method2(list);
            method3(list);
        }

        NUM_RUN = 100;
        long sum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < NUM_RUN; i++) {
            sum += method1(list);
        }
        long end = System.nanoTime();
        System.out.println("set: " + (end - start) / 1000000);

        sum = 0;
        start = System.nanoTime();
        for (int i = 0; i < NUM_RUN; i++) {
            sum += method2(list);
        }
        end = System.nanoTime();
        System.out.println("arraylist: " + (end - start) / 1000000);

        sum = 0;
        start = System.nanoTime();
        for (int i = 0; i < NUM_RUN; i++) {
            sum += method3(list);
        }
        end = System.nanoTime();
        System.out.println("linkelist: " + (end - start) / 1000000);

        System.out.println(sum);
    }

    private static int method1(final List<String> list) {
        Set<String> set = new HashSet<>(list.size(), 1);
        for (String s : list) {
            set.add(s);
        }
        return set.size();
    }

    private static int method2(final List<String> list) {
        List<String> undefined = new ArrayList<>();
        for (String s : list) {
            if (!undefined.contains(s)) {
                undefined.add(s);
            }
        }
        return undefined.size();
    }

    private static int method3(final List<String> list) {
        List<String> undefined = new LinkedList<>();

        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            String value = it.next();
            if (!undefined.contains(value)) {
                undefined.add(value);
            }
        }
        return undefined.size();
    }

    private static String getRandomString() {
        if (!random) {
            return "skdjhflkjrglajhsdkhkjqwhkdjahkshd";
        }
        int size = r.nextInt(100);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < size; i++) {
            char c = (char) ('a' + r.nextInt(27));
            sb.append(c);
        }
        System.out.println(sb);
        return sb.toString();
    }
}
Run Code Online (Sandbox Code Playgroud)

  • @Jam听起来你可能很难对你的代码进行基准测试.您需要运行方法来测试数十次或数百次,以避免测量任何"预热"成本.请参阅http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java (2认同)