Is the Java enhanced for loop faster than the traditional one?

Question

Col*_*lby 2 java performance loops for-loop

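My understanding is that enhanced for loops should be slower, because they have to use an Iterator. But my code gives mixed results. (Yes, I know the loop logic accounts for most of the time spent in the loop.)

For low iteration counts (100-1000), the enhanced for loop seems to be much faster, both with and without JIT. Conversely, for a high iteration count (100000000), the traditional loop is much faster. What is going on here?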

import java.util.ArrayList;
import java.util.List;

public class NewMain {

    public static void main(String[] args) {

        System.out.println("Warming up");

        int warmup = 1000000;
        for (int i = 0; i < warmup; i++) {
            runForLoop();
        }
        for (int i = 0; i < warmup; i++) {
            runEnhancedFor();
        }

        System.out.println("Running");
        int iterations = 100000000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            runForLoop();
        }
        System.out.println((System.nanoTime() - start) / iterations + "nS");

        start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            runEnhancedFor();
        }
        System.out.println((System.nanoTime() - start) / iterations + "nS");
    }

    public static final List<Integer> array = new ArrayList<>(100);

    public static int l;

    public static void runForLoop() {
        for (int i = 0; i < array.size(); i++) {
            l += array.get(i);
        }
    }

    public static void runEnhancedFor() {
        for (int i : array) {
            l += i;
        }
    }
}

Answer 1

Ale*_*lev 32

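The benchmark is faulty. A non-exhaustive list of what is wrong:

No proper warmup: single-shot measurements are almost always wrong;
Mixing several code paths in a single method: we probably start compiling the method with execution data that is available only for the first loop in the method;
Sources are predictable: should the loop compile, we can actually predict the result;
Results are dead-code eliminated: should the loop compile, we can throw the loop away.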
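
A minimal plain-Java sketch of how three of the points above are commonly addressed: one code path per method, inputs that are not compile-time constants, and results that are actually consumed. The class and method names (`PitfallSketch`, `makeInput`, `sumIndexed`, `sumEnhanced`) are made up for illustration. The sketch deliberately does no timing, and it is not a substitute for a real harness such as JMH.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class PitfallSketch {

    // Input values come from a Random instead of constants,
    // so the compiler cannot precompute the sum.
    static List<Integer> makeInput(int size, long seed) {
        Random random = new Random(seed);
        List<Integer> list = new ArrayList<>(size);
        for (int c = 0; c < size; c++) {
            list.add(random.nextInt(1000));
        }
        return list;
    }

    // One code path per method, so each loop gets its own profile.
    static long sumIndexed(List<Integer> list) {
        long s = 0;
        for (int i = 0; i < list.size(); i++) {
            s += list.get(i);
        }
        return s; // returned, so the loop is not dead code
    }

    static long sumEnhanced(List<Integer> list) {
        long s = 0;
        for (int v : list) {
            s += v;
        }
        return s; // returned, so the loop is not dead code
    }

    public static void main(String[] args) {
        List<Integer> input = makeInput(100, System.nanoTime());
        long sink = 0;
        // Many calls rather than a single shot.
        for (int round = 0; round < 20_000; round++) {
            sink += sumIndexed(input);
            sink += sumEnhanced(input);
        }
        // Printing the accumulated value keeps the results observable.
        System.out.println(sink);
    }
}
```

Returning the sum and printing `sink` is the hand-rolled counterpart of returning a value from a benchmark method or handing it to a black hole.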

Take the time to listen to these talks and go through these samples.

This is how you do it arguably right with JMH:

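import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

// Uses current JMH names; the answer as posted used the older
// @GenerateMicroBenchmark and BlackHole spellings.
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 3, time = 1)
@Fork(3)
@State(Scope.Thread)
public class EnhancedFor {

    private static final int SIZE = 100;

    private List<Integer> list;

    @Setup
    public void setup() {
        list = new ArrayList<Integer>(SIZE);
    }

    // Returning the sum keeps the loop from being dead-code eliminated.
    @Benchmark
    public int enhanced() {
        int s = 0;
        for (int i : list) {
            s += i;
        }
        return s;
    }

    @Benchmark
    public int indexed() {
        int s = 0;
        for (int i = 0; i < list.size(); i++) {
            s += list.get(i);
        }
        return s;
    }

    // The *_indi variants consume every element individually.
    @Benchmark
    public void enhanced_indi(Blackhole bh) {
        for (int i : list) {
            bh.consume(i);
        }
    }

    @Benchmark
    public void indexed_indi(Blackhole bh) {
        for (int i = 0; i < list.size(); i++) {
            bh.consume(list.get(i));
        }
    }

}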

...which yields something like this:

Benchmark                         Mode   Samples      Mean   Mean error    Units
o.s.EnhancedFor.enhanced          avgt         9     8.162        0.057    ns/op
o.s.EnhancedFor.enhanced_indi     avgt         9     7.600        0.067    ns/op
o.s.EnhancedFor.indexed           avgt         9     2.226        0.091    ns/op
o.s.EnhancedFor.indexed_indi      avgt         9     2.116        0.064    ns/op

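Now the difference between the enhanced and indexed loops is minuscule, and it could naively be explained by the different code paths taken to access the backing storage. However, the explanation is actually much simpler: the OP forgot to populate the list, which means the loop bodies are never executed, and the benchmark is really measuring the cost of size() versus iterator()!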
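
The point is easy to check outside a benchmark. The sketch below (the class and method names are hypothetical) counts how often each loop body runs for a list built the way the question builds it. The constructor argument of `ArrayList` sets the initial capacity, not the size.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyListCheck {

    // Counts how many times the indexed loop body executes.
    static int indexedBodyRuns(List<Integer> list) {
        int runs = 0;
        for (int i = 0; i < list.size(); i++) {
            runs++;
        }
        return runs;
    }

    // Counts how many times the enhanced loop body executes.
    static int enhancedBodyRuns(List<Integer> list) {
        int runs = 0;
        for (int ignored : list) {
            runs++;
        }
        return runs;
    }

    public static void main(String[] args) {
        // Same construction as in the question: capacity 100, no elements added.
        List<Integer> list = new ArrayList<>(100);
        System.out.println(list.size());            // prints 0
        System.out.println(indexedBodyRuns(list));  // prints 0
        System.out.println(enhancedBodyRuns(list)); // prints 0
    }
}
```

With zero body executions, all that remains per call is one `size()` check for the indexed loop and one `iterator()` plus `hasNext()` for the enhanced loop.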

Fixing that:

@Setup
public void setup() {
    list = new ArrayList<Integer>(SIZE);
    for (int c = 0; c < SIZE; c++) {
        list.add(c);
    }
}

...then yields:

Benchmark                         Mode   Samples       Mean   Mean error    Units
o.s.EnhancedFor.enhanced          avgt         9    171.154       25.892    ns/op
o.s.EnhancedFor.enhanced_indi     avgt         9    384.192        6.856    ns/op
o.s.EnhancedFor.indexed           avgt         9    148.679        1.357    ns/op
o.s.EnhancedFor.indexed_indi      avgt         9    465.684        0.860    ns/op

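Note that the differences are really minuscule even on the nanosecond scale (each figure covers a full pass over the 100-element list), and a non-trivial loop body will consume the difference, if there is any. The differences here can be explained by how lucky we are in inlining get() and the Iterator methods, and by the optimizations we can enjoy after those inlinings.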
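
To make the two code paths concrete, here is a sketch of roughly what the compiler turns an enhanced for loop over a `List<Integer>` into, next to the indexed form. The class and method names are hypothetical. The expansion follows the translation the Java Language Specification gives for the enhanced for statement over an `Iterable`, and it says nothing about what the JIT does with either loop afterwards.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class DesugaredLoops {

    // Hand-written equivalent of: for (int v : list) { s += v; }
    static int sumViaIterator(List<Integer> list) {
        int s = 0;
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            int v = it.next(); // unboxing: Integer to int
            s += v;
        }
        return s;
    }

    // The indexed form calls size() and get(i) on every iteration.
    static int sumViaIndex(List<Integer> list) {
        int s = 0;
        for (int i = 0; i < list.size(); i++) {
            s += list.get(i);
        }
        return s;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(100);
        for (int c = 0; c < 100; c++) {
            list.add(c);
        }
        System.out.println(sumViaIterator(list)); // prints 4950
        System.out.println(sumViaIndex(list));    // prints 4950
    }
}
```

The calls that have to inline well are therefore `hasNext()` and `next()` on one side, and `size()` and `get()` on the other.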

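Note the *_indi tests, which negate loop unrolling optimizations. Even though indexed enjoys better performance when it is successfully unrolled, the opposite holds when unrolling is broken!

With headlines like that, the difference between indexed and enhanced is of nothing more than academic interest. Figuring out the exact generated code with -XX:+PrintAssembly for all the cases is left as an exercise for the reader :)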
