Is there any reason not to use Java 8's parallelSort?

Cal*_*rey 52 java sorting parallel-processing

I was reading a question about the difference between Java's Arrays.sort and Arrays.parallelSort, which is a few years old at this point. What surprised me is that only one answer mentioned any drawback of using parallelSort; namely, that it is slower if you are already using a lot of your CPU.

Assuming you are not in some kind of specialized single-threaded environment, should you always choose parallelSort? Is there any reason not to? Note that one of the answers to the question above points out that with fewer than 4096 elements, parallelSort simply calls sort anyway.

roo*_*eee 55

There are a few drawbacks to using Arrays.parallelSort:

  • It uses ForkJoinPool.commonPool() and will contend with other features that use it by default (e.g. parallel() on a stream)
  • The thread pool that Arrays.parallelSort uses is not configurable; you can only tune it globally, by raising the number of common-pool threads (see the sketch after this list)
  • It performs worse on small data sets (arrays very often contain only a few elements; the JDK itself acknowledges this, e.g. most ArrayList instances stay empty for their entire lifetime, and not instantiating backing arrays that will never be filled saves a lot of memory and CPU time)
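To illustrate the first two points, here is a minimal sketch (mine, not from the original answer): the only documented knob that affects Arrays.parallelSort is the global java.util.concurrent.ForkJoinPool.common.parallelism system property, and turning it also changes every other user of the common pool, parallel streams included. It only takes effect if set before the common pool is first used, so passing it as a JVM flag is the more reliable form.

import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

public class CommonPoolKnob {

    public static void main(String[] args) {
        // Global and undirected: this also changes the parallelism of
        // parallel streams and anything else on ForkJoinPool.commonPool().
        // Equivalent JVM flag: -Djava.util.concurrent.ForkJoinPool.common.parallelism=2
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "2");

        int[] data = ThreadLocalRandom.current().ints(1_000_000).toArray();

        // There is no parallelSort overload that accepts a custom executor or pool;
        // it always ends up on the common pool.
        Arrays.parallelSort(data);

        System.out.println(data[0] + " .. " + data[data.length - 1]);
    }
}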

There is also an interesting scenario: say you implement some card game that needs sorting. It is embarrassingly easy to run multiple game executions in parallel, rather than parallelizing the sorting mechanism of a single run, which probably accounts for only a small fraction of the whole game loop. If the sort already occupies the common pool, you lose that easy route to parallelization (e.g. when running the game in the context of a genetic algorithm).
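A sketch of that trade-off (the "game" below is hypothetical and only stands in for a loop whose sort is a small fraction of the work): parallelize across whole game runs and keep the inner sort sequential, so the coarse-grained, independent work is what lands on the common pool.

import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class GameSimulations {

    // Hypothetical game step: deal a hand, sort it, derive a score from it.
    static int playOneGame() {
        int[] hand = ThreadLocalRandom.current().ints(52, 0, 52).toArray();
        Arrays.sort(hand);            // sequential sort inside each game run
        return hand[hand.length - 1]; // stand-in for the real game result
    }

    public static void main(String[] args) {
        // Parallelism is applied to independent game runs, not inside the sort.
        int totalScore = IntStream.range(0, 10_000)
                .parallel()
                .map(i -> playOneGame())
                .sum();

        System.out.println("total score: " + totalScore);
    }
}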

But yes, if you happen to have large arrays and sorting is a significant part of your application's runtime, use Arrays.parallelSort.

Edit: even though Arrays.parallelSort switches to a plain sort when the given array has fewer than 4096 elements, it is still about expressing intent: you want a parallel sort if one is possible, which carries a slightly different meaning than calling sort. And to be nitpicky: it does perform worse on small arrays, because it has to do the extra check for whether the array has fewer than 4096 elements, plus some further checks on the number of common-pool threads (whose overhead is of course negligible) :)
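A paraphrased sketch of the shape of those checks, based only on what the edit above describes; this is not the JDK source, and the real threshold lives in the internal MIN_ARRAY_SORT_GRAN constant of java.util.Arrays:

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;

public final class ParallelSortFallbackSketch {

    // Stand-in for the JDK's internal granularity threshold (quoted above as 4096).
    private static final int GRANULARITY = 4096;

    // Paraphrased shape of the fallback described above - not the real implementation.
    static void sortWithFallback(int[] a) {
        if (a.length <= GRANULARITY
                || ForkJoinPool.getCommonPoolParallelism() == 1) {
            Arrays.sort(a);          // small array or no parallelism: plain sort
        } else {
            Arrays.parallelSort(a);  // stand-in for the internal fork/join sorter
        }
    }

    private ParallelSortFallbackSketch() {
    }
}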

  • You can use a custom ForkJoinPool for stream sorting, as shown in this question: /sf/ask/1481417591/ (see the sketch below) (2 upvotes)
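Along the lines of that comment, the usual workaround looks roughly like this (my sketch, not the linked answer): submit the parallel operation from inside a dedicated ForkJoinPool so its work stays off the common pool. Note that this relies on undocumented behaviour of the fork/join framework, not on any public guarantee.

import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;

public class CustomPoolSortedStream {

    public static void main(String[] args) throws Exception {
        List<Integer> data = ThreadLocalRandom.current()
                .ints(1_000_000)
                .boxed()
                .collect(Collectors.toList());

        // Tasks forked by a parallel stream started inside pool.submit(...) are
        // executed in that pool rather than in ForkJoinPool.commonPool().
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            List<Integer> sorted = pool.submit(
                    () -> data.parallelStream().sorted().collect(Collectors.toList())
            ).get();
            System.out.println("smallest: " + sorted.get(0));
        } finally {
            pool.shutdown();
        }
    }
}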

Eug*_*ene 6

This is not much different from the question of when to use stream() vs parallelStream() - it depends on how much data you have. When sorting 10 elements in parallel, most of the time will of course be consumed by the threading framework under the hood (which is unspecified by the documentation), not by the sorting itself.

But you also have to wonder why such methods were introduced at all, IMO. Hardware is moving (or has already moved?) towards many CPUs rather than higher GHz, so doing things in parallel is simply the normal course for any language that wants to still be alive in the next 20 years.

As for how much data you need before parallelSort actually out-performs sort - knowing that we need at least MIN_ARRAY_SORT_GRAN + 1 elements to gain any potential benefit - writing a proper test to prove that, for this particular set-up and run, you need at least X elements is not that complicated. You also have to take into account that some arrays might already be sorted (explained further below), while others might be completely un-sorted (5, 4, 3, 2, 1 for example), which carries its own penalty for the latter case.

Taking some random data and making a test:

import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@Warmup(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
public class ParallelSort {

    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
            .include(ParallelSort.class.getName())
            .build();

        new Runner(opt).run();
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(1)
    public int[] parallel(ParallelSortExecutionPlan plan) {
        Arrays.parallelSort(plan.ints());
        return plan.ints(); // returning the array keeps the JIT from eliminating the work
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(1)
    public int[] nonParallel(ParallelSortExecutionPlan plan) {
        Arrays.sort(plan.ints());
        return plan.ints();
    }
}


import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class ParallelSortExecutionPlan {

    @Param(value = {"10", "100", "1000", "10000", "100000", "1000000"})
    private int howMany;

    private int[] ints;

    // Level.Invocation re-creates an un-sorted array before every benchmark call.
    @Setup(Level.Invocation)
    public void setUp() {
        ints = new int[howMany];
        for (int i = 0; i < howMany; ++i) {
            ints[i] = ThreadLocalRandom.current().nextInt();
        }
    }

    int[] ints() {
        return ints;
    }
}

Just notice that the second class uses @Setup(Level.Invocation) (if you know a bit of JMH, you'll know this is a very sharp tool), but I use it here because I want an un-sorted array for each invocation of the method. Otherwise, if Level.Trial had been used, for example, only the first call would see an un-sorted array; every other call of the @Benchmark method would receive an already-sorted one. For the fun of it, you could change that single line to @Setup(Level.Trial) and look at the results; they will make very little sense.

Running this reveals:

Benchmark                 (howMany)  Mode  Cnt         Score   Error  Units

ParallelSort.nonParallel         10  avgt    2       128.847          ns/op
ParallelSort.parallel            10  avgt    2       116.656          ns/op

ParallelSort.nonParallel        100  avgt    2      1956.746          ns/op
ParallelSort.parallel           100  avgt    2      1963.335          ns/op

ParallelSort.nonParallel       1000  avgt    2     32162.611          ns/op
ParallelSort.parallel          1000  avgt    2     31716.915          ns/op

ParallelSort.nonParallel      10000  avgt    2    423531.663          ns/op
ParallelSort.parallel         10000  avgt    2    201802.609          ns/op

ParallelSort.nonParallel     100000  avgt    2   6503511.987          ns/op
ParallelSort.parallel        100000  avgt    2   1363169.661          ns/op

ParallelSort.nonParallel    1000000  avgt    2  69058738.586          ns/op
ParallelSort.parallel       1000000  avgt    2  13469112.930          ns/op

Pretty much the output I expected.


duf*_*ymo 4

No. For small enough arrays I would say no; the overhead of setting up the threads will not produce a noticeable speedup.

The key is "small enough". The answer will not be the same for every problem.

Dogma should never be applied, except in the case of this one dogmatic rule. Rather like how the only thing we should not tolerate is intolerance. There is a Popper paradox in there somewhere.