Why is iterating through a List<String> slower than splitting a StringBuilder's string and iterating over the result?

Question

Igo*_*nze 2 java string stringbuilder loops list

I'm wondering why a for-each loop over a List<String> is slower than a for-each loop over the split result of a StringBuilder.

Here is my code:

package nl.testing.startingpoint;

import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.ArrayList;
import java.util.List;

public class Main {

    public static void main(String args[]) {
        NumberFormat formatter = new DecimalFormat("#0.00000");

        List<String> a = new ArrayList<String>();
        StringBuffer b = new StringBuffer();        

        for (int i = 0;i <= 10000; i++)
        {
            a.add("String:" + i);
            b.append("String:" + i + " ");
        }

        long startTime = System.currentTimeMillis();
        for (String aInA : a) 
        {
            System.out.println(aInA);
        }
        long endTime   = System.currentTimeMillis();

        long startTimeB = System.currentTimeMillis();
        for (String part : b.toString().split(" ")) {

            System.out.println(part);
        }
        long endTimeB   = System.currentTimeMillis();

        System.out.println("Execution time from StringBuilder is " + formatter.format((endTimeB - startTimeB) / 1000d) + " seconds");
        System.out.println("Execution time List is " + formatter.format((endTime - startTime) / 1000d) + " seconds");

    }
}

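The result is:

Execution time from StringBuilder is 0,03300 seconds
Execution time List is 0,06000 seconds

I expected the StringBuilder version to be slower because of the b.toString().split(" ").

Can anyone explain this to me?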

Answer 1

T.J*_*der 5

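(This is a completely revised answer. See ¹ for why. Thanks to Buhb for making me take a second look! Note that he/she has also posted an answer.)

Be careful with your results: microbenchmarks in Java are very tricky, and your benchmark code is doing I/O, among other things. See this question and its answers: How do I write a correct micro-benchmark in Java?

In fact, as far as I can tell, your results misled you (and initially me). Although the enhanced for loop over a String array is much faster than over an ArrayList<String> (more on that below), the .toString().split(" ") overhead still appears to dominate, making that version slower than the ArrayList version. Markedly slower.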
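To make the I/O point concrete, here is a rough sketch of the question's two loops with the System.out.println calls taken out of the timed region and the loop results actually used. The class and method names (NoIoTiming, buildList, buildBuffer, loopList, splitAndLoop) are hypothetical, chosen only for this illustration. It is still a naive harness (no warmup, a single run, JIT effects unaccounted for), so its numbers should not be trusted either.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration; a naive timing sketch, not a reliable benchmark.
class NoIoTiming {

    // Builds the same kind of list as the question's code.
    static List<String> buildList(int n) {
        List<String> a = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            a.add("String:" + i);
        }
        return a;
    }

    // Builds the same kind of space-separated buffer as the question's code.
    static StringBuffer buildBuffer(int n) {
        StringBuffer b = new StringBuffer();
        for (int i = 0; i < n; i++) {
            b.append("String:").append(i).append(' ');
        }
        return b;
    }

    // Loops over the list; returning the count keeps the loop from being dead code.
    static int loopList(List<String> list) {
        int count = 0;
        for (String s : list) {
            count += s == null ? 0 : 1;
        }
        return count;
    }

    // Converts, splits, and loops; the split is deliberately part of the measured work.
    static int splitAndLoop(StringBuffer sb) {
        int count = 0;
        for (String part : sb.toString().split(" ")) {
            count += part == null ? 0 : 1;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> a = buildList(10001);
        StringBuffer b = buildBuffer(10001);

        long t0 = System.nanoTime();
        int listCount = loopList(a);
        long t1 = System.nanoTime();
        int splitCount = splitAndLoop(b);
        long t2 = System.nanoTime();

        // Printing happens only after both measurements are complete.
        System.out.println("list:  " + listCount + " elements in " + (t1 - t0) + " ns");
        System.out.println("split: " + splitCount + " elements in " + (t2 - t1) + " ns");
    }
}
```

Even with the I/O removed, whichever loop runs first pays more of the JVM's warmup cost, which is one reason a purpose-built harness is preferable.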

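Let's use a carefully designed and tested microbenchmarking tool to find out which is faster: JMH.

I'm using Linux, so here's how I set it up ($ just indicates the command prompt; what you type is what follows it):

1. First, I installed Maven, since I don't normally have it installed: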

$ sudo apt-get install maven

2. Then I used Maven to create a sample benchmark project:

$ mvn archetype:generate \
          -DinteractiveMode=false \
          -DarchetypeGroupId=org.openjdk.jmh \
          -DarchetypeArtifactId=jmh-java-benchmark-archetype \
          -DgroupId=org.sample \
          -DartifactId=test \
          -Dversion=1.0

This creates the benchmark project in the test subdirectory, so:

$ cd test

3. In the generated project, I deleted the default src/main/java/org/sample/MyBenchmark.java and created three files in that folder for benchmarking:

Common.java: Really boring:

package org.sample;

public class Common {
    public static final int LENGTH = 10001;
}

Originally I expected to need more in there...

TestList.java:

package org.sample;

import java.util.List;
import java.util.ArrayList;
import java.text.NumberFormat;
import java.text.DecimalFormat;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Scope;

public class TestList {

    // This state class lets us set up our list once and reuse it for tests in this test thread
    @State(Scope.Thread)
    public static class TestState {
        public final List<String> list;

        public TestState() {
            // Your code for creating the list
            NumberFormat formatter = new DecimalFormat("#0.00000");
            List<String> a = new ArrayList<String>();
            for (int i = 0; i < Common.LENGTH; ++i)
            {
                a.add("String:" + i);
            }
            this.list = a;
        }
    }

    // This is the test method JMH will run for us
    @Benchmark
    public void test(TestState state) {
        // Grab the list
        final List<String> strings = state.list;

        // Loop through it -- note that I'm doing work within the loop, but not I/O since
        // we don't want to measure I/O, we want to measure loop performance
        int l = 0;
        for (String s : strings) {
            l += s == null ? 0 : 1;
        }

        // I always do things like this to ensure that the test is doing what I expected
        // it to do, and so that I actually use the result of the work from the loop
        if (l != Common.LENGTH) {
            throw new RuntimeException("Test error");
        }
    }
}

TestStringSplit.java:

package org.sample;

import java.text.NumberFormat;
import java.text.DecimalFormat;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Scope;

@State(Scope.Thread)
public class TestStringSplit {

    // This state class lets us set up our StringBuffer once and reuse it for tests in this test thread
    @State(Scope.Thread)
    public static class TestState {
        public final StringBuffer sb;

        public TestState() {
            NumberFormat formatter = new DecimalFormat("#0.00000");

            StringBuffer b = new StringBuffer();        

            for (int i = 0; i < Common.LENGTH; ++i)
            {
                b.append("String:" + i + " ");
            }

            this.sb = b;
        }
    }

    // This is the test method JMH will run for us
    @Benchmark
    public void test(TestState state) {
        // Grab the StringBuffer, convert to string, split it into an array
        final String[] strings = state.sb.toString().split(" ");

        // Loop through it -- note that I'm doing work within the loop, but not I/O since
        // we don't want to measure I/O, we want to measure loop performance
        int l = 0;
        for (String s : strings) {
            l += s == null ? 0 : 1;
        }

        // I always do things like this to ensure that the test is doing what I expected
        // it to do, and so that I actually use the result of the work from the loop
        if (l != Common.LENGTH) {
            throw new RuntimeException("Test error");
        }
    }
}

Now that we have the tests, we build the project:

$ mvn clean install

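And we're ready to test! Close all programs you don't need running, then run this command. It will take a while, and you'll want to leave your machine alone while it runs. Go grab a cup o' Java.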

$ java -jar target/benchmarks.jar -f 4 -wi 10 -i 10

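(Note: -f 4 means "only do four forks, not ten"; -wi 10 means "only do 10 warmup iterations, not 20"; and -i 10 means "only do 10 test iterations, not 20". If you want to be really rigorous, leave them off and go to lunch rather than taking a coffee break.)

Here is the result I got with JDK 1.8.0_74 on a 64-bit Intel machine: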

Benchmark              Mode  Cnt      Score      Error  Units
TestList.test         thrpt   40  65641.040 ± 3811.665  ops/s
TestStringSplit.test  thrpt   40   4909.565 ±   33.822  ops/s

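The version that loops through the list does more than 65k operations/second, whereas the version that splits and then loops through the array does fewer than 5,000 operations/second.

So your original expectation, that the List version would be faster because of the cost of doing .toString().split(" "), was correct. Doing that and looping over the result is markedly slower than using the List.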
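To see where that cost comes from, here is a simplified stand-in for the work a split on a single space has to do. It is not the JDK's implementation; ManualSplit and splitOnSpace are hypothetical names, and the sketch assumes tokens are separated by single spaces with at most one trailing space, as in the question's input. The point is that every call has to visit every character and allocate one new String per token plus the result array, all of which the List version never pays for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper written only to make the cost visible; not the JDK's code.
class ManualSplit {

    // Splits on single spaces and ignores a trailing separator, which is all
    // the question's input ("String:0 String:1 ... ") needs.
    static String[] splitOnSpace(String text) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {      // visits every character
            if (text.charAt(i) == ' ') {
                parts.add(text.substring(start, i));   // one new String per token
                start = i + 1;
            }
        }
        if (start < text.length()) {
            parts.add(text.substring(start));          // last token when there is no trailing space
        }
        return parts.toArray(new String[0]);           // one more copy into the result array
    }

    public static void main(String[] args) {
        String text = "String:0 String:1 String:2 ";
        System.out.println(Arrays.toString(splitOnSpace(text)));
        // Compare against the real String.split for this input
        System.out.println(Arrays.equals(splitOnSpace(text), text.split(" ")));
    }
}
```

On top of this, the toString() call in the benchmarked code first copies the whole buffer into a new String on every invocation.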

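About the enhanced for on String[] vs. List<String>: it is markedly faster to loop through a String[] than through a List<String>, so that .toString().split(" ") must have cost us a lot. To test just the looping part, I used JMH with the TestList class from earlier and this TestArray class: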

package org.sample;

import java.util.List;
import java.util.ArrayList;
import java.text.NumberFormat;
import java.text.DecimalFormat;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Scope;

public class TestArray {

    // This state class lets us set up our array once and reuse it for tests in this test thread
    @State(Scope.Thread)
    public static class TestState {
        public final String[] array;

        public TestState() {
            // Create an array with strings like the ones in the list
            NumberFormat formatter = new DecimalFormat("#0.00000");
            String[] a = new String[Common.LENGTH];
            for (int i = 0; i < Common.LENGTH; ++i)
            {
                a[i] = "String:" + i;
            }
            this.array = a;
        }
    }

    // This is the test method JMH will run for us
    @Benchmark
    public void test(TestState state) {
        // Grab the array
        final String[] strings = state.array;

        // Loop through it -- note that I'm doing work within the loop, but not I/O since
        // we don't want to measure I/O, we want to measure loop performance
        int l = 0;
        for (String s : strings) {
            l += s == null ? 0 : 1;
        }

        // I always do things like this to ensure that the test is doing what I expected
        // it to do, and so that I actually use the result of the work from the loop
        if (l != Common.LENGTH) {
            throw new RuntimeException("Test error");
        }
    }
}

I ran it just like the earlier tests (four forks, 10 warmups, and 10 iterations); here are the results:

Benchmark        Mode  Cnt       Score      Error  Units
TestArray.test  thrpt   40  568328.087 ±  580.946  ops/s
TestList.test   thrpt   40   62069.305 ± 3793.680  ops/s

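Looping through the array does almost an order of magnitude more operations/second than looping through the list.

This doesn't surprise me, because the enhanced for loop can work directly on the array, but in the List case it has to use the Iterator returned by the List and make method calls on it: two calls per loop (Iterator#hasNext and Iterator#next) for 10,001 loops = 20,002 calls. Method calls are cheap, but they aren't free, and even if the JIT inlines them, the code of those calls still has to run. ArrayList's ListIterator has to do some work before it can return the next array entry, whereas when the enhanced for loop knows it is dealing with an array, it can work on it directly.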
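The call count can be checked with a small sketch. CountingIterable is a hypothetical wrapper invented for this illustration; it counts how often the enhanced for loop calls hasNext and next on the iterator it gets. Strictly speaking, hasNext is called one extra time at the end, when it returns false and terminates the loop, so three elements cost 3 next calls and 4 hasNext calls.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical wrapper that counts the iterator calls made by an enhanced for loop.
class CountingIterable implements Iterable<String> {
    private final List<String> delegate;
    int hasNextCalls;
    int nextCalls;

    CountingIterable(List<String> delegate) {
        this.delegate = delegate;
    }

    @Override
    public Iterator<String> iterator() {
        final Iterator<String> it = delegate.iterator();
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                hasNextCalls++;          // counted on the enclosing CountingIterable
                return it.hasNext();
            }

            @Override
            public String next() {
                nextCalls++;
                return it.next();
            }
        };
    }

    public static void main(String[] args) {
        CountingIterable strings = new CountingIterable(Arrays.asList("a", "b", "c"));
        int seen = 0;
        for (String s : strings) {       // compiles to iterator(), hasNext(), next() calls
            seen += s == null ? 0 : 1;
        }
        System.out.println("elements: " + seen);
        System.out.println("hasNext calls: " + strings.hasNextCalls);
        System.out.println("next calls: " + strings.nextCalls);
    }
}
```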

The test classes above handle the benchmarking, but to see why the array version is faster, let's look at this simpler program:

import java.util.List;
import java.util.ArrayList;

public class Example {
    public static final void main(String[] args) throws Exception {
        String[] array = new String[10];
        List<String> list = new ArrayList<String>(array.length);
        for (int n = 0; n < array.length; ++n) {
            array[n] = "foo" + System.currentTimeMillis();
            list.add(array[n]);
        }

        useArray(array);
        useList(list);

        System.out.println("Done");
    }

    public static void useArray(String[] array) {
        System.out.println("Using array:");
        for (String s : array) {
            System.out.println(s);
        }
    }

    public static void useList(List<String> list) {
        System.out.println("Using list:");
        for (String s : list) {
            System.out.println(s);
        }
    }
}

Using javap -c Example after compiling, we can look at the bytecode of the two useXYZ functions; I've set the loop part of each one apart from the rest of the function with blank lines:

useArray:

  public static void useArray(java.lang.String[]);
    Code:
       0: getstatic     #15                 // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #18                 // String Using array:
       5: invokevirtual #17                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: aload_0
       9: astore_1
      10: aload_1
      11: arraylength
      12: istore_2
      13: iconst_0
      14: istore_3

      15: iload_3
      16: iload_2
      17: if_icmpge     39
      20: aload_1
      21: iload_3
      22: aaload
      23: astore        4
      25: getstatic     #15                 // Field java/lang/System.out:Ljava/io/PrintStream;
      28: aload         4
      30: invokevirtual #17                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      33: iinc          3, 1
      36: goto          15

      39: return

useList:

  public static void useList(java.util.List);
    Code:
       0: getstatic     #15                 // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #19                 // String Using list:
       5: invokevirtual #17                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: aload_0
       9: invokeinterface #20,  1           // InterfaceMethod java/util/List.iterator:()Ljava/util/Iterator;
      14: astore_1

      15: aload_1
      16: invokeinterface #21,  1           // InterfaceMethod java/util/Iterator.hasNext:()Z
      21: ifeq          44
      24: aload_1
      25: invokeinterface #22,  1           // InterfaceMethod java/util/Iterator.next:()Ljava/lang/Object;
      30: checkcast     #2                  // class java/lang/String
      33: astore_2
      34: getstatic     #15                 // Field java/lang/System.out:Ljava/io/PrintStream;
      37: aload_2
      38: invokevirtual #17                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      41: goto          15

      44: return

So we can see that useArray operates directly on the array, and we can see useList's two calls to Iterator methods.
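Written back out as source, the two loops in that bytecode correspond roughly to the following hand-expanded forms. This is a sketch of the shape the compiler produces, not its literal output; DesugaredLoops, countArray, and countList are hypothetical names, and the methods count elements instead of printing them so the result can be checked.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical class showing roughly what the enhanced for loop expands to.
class DesugaredLoops {

    // Array case: a cached reference, a cached length, an int index, and direct element loads.
    static int countArray(String[] array) {
        String[] copy = array;
        int length = copy.length;
        int count = 0;
        for (int i = 0; i < length; i++) {
            String s = copy[i];                  // the aaload in the bytecode
            count += s == null ? 0 : 1;
        }
        return count;
    }

    // List case: an Iterator plus two interface method calls per element.
    static int countList(List<String> list) {
        int count = 0;
        for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
            String s = it.next();                // invokeinterface plus checkcast in the bytecode
            count += s == null ? 0 : 1;
        }
        return count;
    }

    public static void main(String[] args) {
        String[] array = {"foo", "bar", "baz"};
        System.out.println(countArray(array));
        System.out.println(countList(Arrays.asList(array)));
    }
}
```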

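Of course, most of the time this doesn't matter. Don't worry about these things unless you've identified the code you're optimizing as a bottleneck.

¹ This answer has been completely revised from its original version, because in the original I assumed the assertion that the split-then-loop-through-the-array version was faster was true. I completely failed to check that assertion and just jumped into analyzing why the enhanced for loop is faster on arrays than on lists. My bad. Many thanks to Buhb for making me take a closer look.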