I have seen and tried different implementations of how to sum something in a stream. Here is my code:
List<Person> persons = new ArrayList<Person>();
for (int i = 0; i < 10000000; i++) {
    persons.add(new Person("random", 26));
}

Long start = System.currentTimeMillis();
int test = persons.stream().collect(Collectors.summingInt(p -> p.getAge()));
Long end = System.currentTimeMillis();
System.out.println("Sum of ages = " + test + " and it took : " + (end - start) + " ms with collectors");

Long start3 = System.currentTimeMillis();
int test3 = persons.parallelStream().collect(Collectors.summingInt(p -> p.getAge()));
Long end3 = System.currentTimeMillis();
System.out.println("Sum of ages = " + test3 + " and it took : " + (end3 - start3) + " ms with collectors and parallel stream");

Long start2 = System.currentTimeMillis();
int test2 = persons.stream().mapToInt(p -> p.getAge()).sum();
Long end2 = System.currentTimeMillis();
System.out.println("Sum of ages = " + test2 + " and it took : " + (end2 - start2) + " ms with map and sum");

Long start4 = System.currentTimeMillis();
int test4 = persons.parallelStream().mapToInt(p -> p.getAge()).sum();
Long end4 = System.currentTimeMillis();
System.out.println("Sum of ages = " + test4 + " and it took : " + (end4 - start4) + " ms with map and sum and parallel stream");
This gives me the following result:
Sum of ages = 220000000 and it took : 110 ms with collectors
Sum of ages = 220000000 and it took : 272 ms with collectors and parallel stream
Sum of ages = 220000000 and it took : 137 ms with map and sum
Sum of ages = 220000000 and it took : 134 ms with map and sum and parallel stream
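I tried this a few times and got different results each time (most of the time the last solution is the best), so I am wondering:
1) What is the correct way to do it?
2) Why? (What is the difference from the other solutions?)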
bla*_*dri 11
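Before we get to the actual answer, there are a few things you should know:
Your test results can vary quite a lot, depending on many factors (for example, the computer you run them on). Here are the results of a run on my 8-core machine: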
Sum of ages = 260000000 and it took : 94 ms with collectors
Sum of ages = 260000000 and it took : 61 ms with collectors and parallel stream
Sum of ages = 260000000 and it took : 70 ms with map and sum
Sum of ages = 260000000 and it took : 94 ms with map and sum and parallel stream
And then in a later run:
Sum of ages = 260000000 and it took : 68 ms with collectors
Sum of ages = 260000000 and it took : 67 ms with collectors and parallel stream
Sum of ages = 260000000 and it took : 66 ms with map and sum
Sum of ages = 260000000 and it took : 109 ms with map and sum and parallel stream
Microbenchmarking is not an easy topic. There are ways to do it properly (I will get to one later), but simply wrapping the code in System.currentTimeMillis() calls will not give reliable results in most cases.
Just because Java 8 makes parallel operations easy does not mean they should be used everywhere. Parallel operations make sense in some cases and not in others.
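To illustrate the timing point above, here is a minimal sketch of a hand-rolled measurement that at least runs some untimed warm-up calls and averages over several timed calls, using System.nanoTime(). The class name WarmupTimingSketch, the helper averageNanos, and the iteration counts are all made up for illustration. This only reduces some of the noise; it is not a substitute for a real benchmark harness.

```java
import java.util.Arrays;
import java.util.function.IntSupplier;
import java.util.stream.IntStream;

public class WarmupTimingSketch {

    // Results are written to a volatile field so the JIT cannot discard the work.
    static volatile int sink;

    /**
     * Runs the task `warmups` times without timing it, then returns the
     * average wall-clock nanoseconds per call over `runs` timed calls.
     */
    static long averageNanos(IntSupplier task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            sink = task.getAsInt();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink = task.getAsInt();
        }
        return (System.nanoTime() - start) / runs;
    }

    public static void main(String[] args) {
        // Assumption: a plain int[] of ages stands in for the list of Person objects.
        int[] ages = new int[1_000_000];
        Arrays.fill(ages, 26);

        long sequential = averageNanos(() -> IntStream.of(ages).sum(), 20, 20);
        long parallel = averageNanos(() -> IntStream.of(ages).parallel().sum(), 20, 20);

        System.out.println("sequential: " + sequential + " ns per call");
        System.out.println("parallel:   " + parallel + " ns per call");
    }
}
```

Even with warm-up, numbers from a loop like this will still move around between runs and machines, which is why a dedicated harness is the safer choice.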
OK, now let's look at the various approaches you are using.
Sequential collectors: the summingInt
collector you use has the following implementation:
public static <T> Collector<T, ?, Integer> summingInt(ToIntFunction<? super T> mapper) {
    return new CollectorImpl<>(
            () -> new int[1],
            (a, t) -> { a[0] += mapper.applyAsInt(t); },
            (a, b) -> { a[0] += b[0]; return a; },
            a -> a[0], Collections.emptySet());
}
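So, first a new array with one element is created. Then, for every Person element in your stream, the collect function uses the Person#getAge() function to retrieve the age as an Integer (not an int!) and adds that age to the previous ones (in the one-element array). Finally, once the whole stream has been processed, it extracts the value from that array and returns it. So there is a lot of auto-boxing and unboxing going on here.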
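The three steps above (create the one-element array, add each age into it, extract the value at the end) can be written out by hand with Collector.of. This is only a sketch of the same idea, not the JDK source. The names SummingIntSketch, summingAges and sumAges are made up, and the nested Person class is a hypothetical stand-in whose getAge() is assumed to return an int.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

public class SummingIntSketch {

    // Hypothetical stand-in for the Person class used in the question.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        int getAge() {
            return age;
        }
    }

    /** A hand-written collector that follows the same steps as summingInt. */
    static Collector<Person, int[], Integer> summingAges() {
        return Collector.of(
                () -> new int[1],                                       // supplier: one-element container
                (acc, p) -> acc[0] += p.getAge(),                       // accumulator: add each age
                (left, right) -> { left[0] += right[0]; return left; }, // combiner: merge partial sums
                acc -> acc[0]);                                         // finisher: extract the total
    }

    /** Convenience helper: builds persons from the given ages and sums them. */
    static int sumAges(int... ages) {
        List<Person> persons = new ArrayList<>();
        for (int age : ages) {
            persons.add(new Person("random", age));
        }
        return persons.stream().collect(summingAges());
    }

    public static void main(String[] args) {
        System.out.println("Sum of ages = " + sumAges(26, 30, 22));
    }
}
```

The combiner is only exercised when the stream runs in parallel; in a sequential stream a single container receives every element.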
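Parallel collectors: this uses the ReferencePipeline#forEach(Consumer) function to accumulate the ages it gets from the mapping function. Again, there is a lot of auto-boxing and unboxing.
Sequential map and sum: here you map your Stream<Person> to an IntStream. One consequence is that auto-boxing and unboxing are no longer needed; in some cases this can save a lot of time. The resulting stream is then summed using the following implementation: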
@Override
public final int sum() {
    return reduce(0, Integer::sum);
}
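The reduce function here calls ReduceOps#ReduceOp#evaluateSequential(PipelineHelper<T> helper, Spliterator<P_IN> spliterator). In essence, this applies the Integer::sum function across all the numbers: it starts with 0 and the first number, then combines that result with the second number, and so on.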
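Written as a plain loop, that sequential reduction looks roughly like the following. This is a sketch of the folding order only, not the actual stream internals; the class and method names are made up for illustration.

```java
import java.util.stream.IntStream;

public class SequentialReduceSketch {

    /** Left fold: start with 0, then combine the running result with each value in turn. */
    static int foldSum(int[] values) {
        int result = 0; // the identity passed to reduce
        for (int value : values) {
            result = Integer.sum(result, value);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] ages = {26, 30, 22};
        System.out.println("loop:   " + foldSum(ages));
        System.out.println("stream: " + IntStream.of(ages).sum());
    }
}
```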
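Parallel map and sum: this uses the same sum() function, but in this case reduce calls ReduceOps#ReduceOp#evaluateParallel(PipelineHelper<T> helper, Spliterator<P_IN> spliterator) instead of the sequential option. This basically uses a divide-and-conquer approach to add up the values. The big advantage of divide and conquer is, of course, that it can easily be done in parallel. However, it does require splitting and re-joining the stream many times, which costs time. So how fast it is varies quite a lot, depending on the complexity of the actual work it has to do on the elements. For simple addition it is probably not worth it in most cases; as you can see from my results, it was always the slower approach.
Now, to get a real idea of the time needed, let's do a proper micro benchmark. I will use JMH with the following benchmark code: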
package com.stackoverflow.user2352924;

import org.openjdk.jmh.annotations.*;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MINUTES)
@Warmup(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 10, timeUnit = TimeUnit.SECONDS)
@State(Scope.Benchmark)
@Fork(1)
@Threads(2)
public class MicroBenchmark {

    private static List<Person> persons = new ArrayList<>();

    private int test;

    static {
        for (int i = 0; i < 10000000; i++) {
            persons.add(new Person("random", 26));
        }
    }

    @Benchmark
    public void sequentialCollectors() {
        test = 0;
        test += persons.stream().collect(Collectors.summingInt(p -> p.getAge()));
    }

    @Benchmark
    public void parallelCollectors() {
        test = 0;
        test += persons.parallelStream().collect(Collectors.summingInt(p -> p.getAge()));
    }

    @Benchmark
    public void sequentialMapSum() {
        test = 0;
        test += persons.stream().mapToInt(p -> p.getAge()).sum();
    }

    @Benchmark
    public void parallelMapSum() {
        test = 0;
        test += persons.parallelStream().mapToInt(p -> p.getAge()).sum();
    }
}
The pom.xml for this Maven project looks as follows:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.stackoverflow.user2352924</groupId>
<artifactId>StackOverflow</artifactId>
<version>1.0</version>
<packaging>jar</packaging>
<name>Auto-generated JMH benchmark</name>
<prerequisites>
<maven>3.0</maven>
</prerequisites>
<dependencies>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-core</artifactId>
<version>${jmh.version}</version>
</dependency>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-generator-annprocess</artifactId>
<version>${jmh.version}</version>
<scope>provided</scope>
</dependency>
</dependencies>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<jmh.version>0.9.5</jmh.version>
<javac.target>1.8</javac.target>
<uberjar.name>benchmarks</uberjar.name>
</properties>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<compilerVersion>${javac.target}</compilerVersion>
<source>${javac.target}</source>
<target>${javac.target}</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.2</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<finalName>microbenchmarks</finalName>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>org.openjdk.jmh.Main</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
<pluginManagement>
<plugins>
<plugin>
<artifactId>maven-clean-plugin</artifactId>
<version>2.5</version>
</plugin>
<plugin>
<artifactId>maven-deploy-plugin</artifactId>
<version>2.8.1</version>
</plugin>
<plugin>
<artifactId>maven-install-plugin</artifactId>
<version>2.5.1</version>
</plugin>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<version>2.4</version>
</plugin>
<plugin>
<artifactId>maven-javadoc-plugin</artifactId>
<version>2.9.1</version>
</plugin>
<plugin>
<artifactId>maven-resources-plugin</artifactId>
<version>2.6</version>
</plugin>
<plugin>
<artifactId>maven-site-plugin</artifactId>
<version>3.3</version>
</plugin>
<plugin>
<artifactId>maven-source-plugin</artifactId>
<version>2.2.1</version>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.17</version>
</plugin>
</plugins>
</pluginManagement>
</build>
</project>
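Make sure Maven is also running with Java 8, otherwise you will get ugly errors.
I won't go into detail here about how to use JMH (there are other places that do that), but these are the results I got: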
# Run complete. Total time: 00:08:48
Benchmark                                  Mode   Samples     Score  Score error  Units
c.s.u.MicroBenchmark.parallelCollectors    thrpt       10  3658,949      775,115  ops/min
c.s.u.MicroBenchmark.parallelMapSum        thrpt       10  2616,905      221,109  ops/min
c.s.u.MicroBenchmark.sequentialCollectors  thrpt       10  5502,160      439,024  ops/min
c.s.u.MicroBenchmark.sequentialMapSum      thrpt       10  6120,162      609,232  ops/min
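So, on the system I ran these tests on, the sequential map and sum was clearly the fastest, managing over 6100 operations per minute, whereas the parallel map and sum (which uses the divide-and-conquer approach) managed only just over 2600. In fact, both sequential approaches are considerably faster than the parallel ones.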
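Now, in a situation that can be parallelized more easily, for example one where the Person#getAge() function is much more complex than a plain getter, the parallel approaches may well be the better solution. In the end, it all depends on how efficiently the case under test can be run in parallel.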
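To picture where the parallel overhead comes from, the divide-and-conquer idea described earlier can be sketched with a hand-written fork/join task. This is an illustration under assumptions, not how the stream library is actually implemented. The names DivideAndConquerSumSketch, SumTask and parallelSum are made up, and the threshold is an arbitrary value chosen for the example.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class DivideAndConquerSumSketch {

    // Arbitrary cut-off: ranges at or below this size are summed directly.
    static final int THRESHOLD = 10_000;

    static final class SumTask extends RecursiveTask<Integer> {
        private final int[] values;
        private final int from;
        private final int to;

        SumTask(int[] values, int from, int to) {
            this.values = values;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Integer compute() {
            if (to - from <= THRESHOLD) {
                // Small enough: plain sequential loop.
                int sum = 0;
                for (int i = from; i < to; i++) {
                    sum += values[i];
                }
                return sum;
            }
            // Split the range in two, run the left half asynchronously,
            // compute the right half in this thread, then join the results.
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(values, from, mid);
            SumTask right = new SumTask(values, mid, to);
            left.fork();
            return right.compute() + left.join();
        }
    }

    static int parallelSum(int[] values) {
        return ForkJoinPool.commonPool().invoke(new SumTask(values, 0, values.length));
    }

    public static void main(String[] args) {
        int[] ages = new int[10_000_000];
        Arrays.fill(ages, 26);
        System.out.println("Sum of ages = " + parallelSum(ages));
    }
}
```

Every split creates task objects and every join needs coordination between threads. When the work per element is a single addition, that bookkeeping can easily outweigh the gain; when the work per element is expensive, it tends to matter much less.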
One more thing to remember: if in doubt, do a proper micro benchmark. ;-)