What is the better way to add elements from a Stream to an existing List?

Question

ahe*_*lix 2 java collections java-8 java-stream

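I have to write some code that adds the contents of a Java 8 Stream to a List multiple times, and I'm having trouble figuring out the best way to do it. Based on what I've read on SO (mainly this question: How to add elements of a Java8 stream into an existing List) and elsewhere, I've narrowed it down to the following options: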

import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class Accumulator<S, T> {


    private final Function<S, T> transformation;
    private final List<T> internalList = new ArrayList<T>();

    public Accumulator(Function<S, T> transformation) {
        this.transformation = transformation;
    }

    public void option1(List<S> newBatch) {
        internalList.addAll(newBatch.stream().map(transformation).collect(Collectors.toList()));
    }

    public void option2(List<S> newBatch) {
        newBatch.stream().map(transformation).forEach(internalList::add);
    }
}

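The idea is that these methods will be called multiple times on the same instance of Accumulator. The choice is between building an intermediate list and calling Collection.addAll() once outside the stream, or calling Collection.add() from within the stream for each element.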

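I tend to prefer option 2, which is more in the spirit of functional programming and avoids creating an intermediate list. However, when n is large there may be a benefit to calling addAll() once instead of calling add() n times.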

Is one of the two options significantly better than the other?

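EDIT: JB Nizet has a very cool answer that delays the transformation until all batches have been added. In my case, the transformation has to be performed straight away.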

PS: in my example code, I've used transformation as a placeholder for whatever operations need to be performed on the stream.

Answer 1

JB *_*zet 6

The best solution would be a third one that avoids the internal list entirely. Just let the stream create the final list for you:

Assuming you have a List<List<S>> containing your N batches, to each of which the same transformation must be applied, you would do

List<T> result = 
    batches.stream()
           .flatMap(batch -> batch.stream())
           .map(transformation)
           .collect(Collectors.toList());
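
To make the snippet above concrete, here is a minimal runnable sketch. The class name FlatMapBatchesExample, the helper transformAll, and the sample data are assumptions made for illustration; they are not part of the answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class FlatMapBatchesExample {

    // Flattens all batches into one stream, applies the transformation,
    // and lets the stream build the final list.
    static <S, T> List<T> transformAll(List<List<S>> batches, Function<S, T> transformation) {
        return batches.stream()
                      .flatMap(List::stream)
                      .map(transformation)
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample data: three batches of strings.
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("a", "bb"),
                Arrays.asList("ccc"),
                Arrays.asList("dddd", "eeeee"));

        List<Integer> lengths = transformAll(batches, String::length);
        System.out.println(lengths); // prints [1, 2, 3, 4, 5]
    }
}
```

Note that this only works when every batch is available before the transformation runs.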

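@rana_stack Do you think that `Collectors.toList` creates a presized list? It's far from doing that. Have a look at the [code](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/java/util/stream/Collectors.java#Collectors.toList%28%29) or at Holger's answer, which describes what actually happens. (3 upvotes)
Nice answer, but it requires that you can defer processing of the early batches until all batches are ready. That may not always be the case. (2 upvotes)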

Answer 2

Hol*_*ger 5

First of all, your second variant should be:

public void option2(List<S> newBatch) {
  newBatch.stream().map(transformation).forEachOrdered(internalList::add);
}

to be correct, since forEach does not guarantee that elements are processed in encounter order.
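
The ordering guarantee can be illustrated with a small sketch. It uses a parallel stream, which is an assumption made only to make the difference visible (the streams in the question are sequential). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

class ForEachOrderedExample {

    // Appends the squares of 0..n-1 to a plain ArrayList from a parallel stream.
    // forEachOrdered runs the action one element at a time, in encounter order,
    // so the non-thread-safe ArrayList is filled safely and in order.
    static List<Integer> squares(int n) {
        List<Integer> target = new ArrayList<>();
        IntStream.range(0, n)
                 .parallel()
                 .map(i -> i * i)
                 .boxed()
                 .forEachOrdered(target::add);
        return target;
    }

    public static void main(String[] args) {
        System.out.println(squares(5)); // prints [0, 1, 4, 9, 16]
    }
}
```

Swapping in forEach here would be unsafe: it may invoke the action concurrently and in any order on a parallel stream.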

Besides that, the addAll in

public void option1(List<S> newBatch) {
  internalList.addAll(newBatch.stream().map(transformation).collect(Collectors.toList()));
}

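is moot, as the Collector API does not allow the Stream to give the collector a hint about the expected size, and it requires the Stream to evaluate the accumulator function for every element, which is just ArrayList::add in the current implementation.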

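So before this approach could get any benefit from addAll, it has already filled an ArrayList by repeatedly calling add on it, including potential capacity-increase operations. So you can stay with option2 without regret.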
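
The per-element accumulator call can be observed with a hand-written collector. This is only a sketch: countingToList is a hypothetical stand-in written for this illustration, not the JDK's implementation of Collectors.toList().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.IntStream;

class CountingCollectorExample {

    // A list collector that counts how often its accumulator is invoked.
    static <T> Collector<T, ?, List<T>> countingToList(AtomicInteger calls) {
        return Collector.<T, List<T>>of(
                ArrayList::new,          // supplier: receives no size hint
                (list, element) -> {     // accumulator: invoked once per element
                    calls.incrementAndGet();
                    list.add(element);
                },
                (left, right) -> {       // combiner: only needed for parallel streams
                    left.addAll(right);
                    return left;
                });
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        List<Integer> result = IntStream.range(0, 1000).boxed().collect(countingToList(calls));
        System.out.println(result.size() + " elements, " + calls.get() + " accumulator calls");
        // prints 1000 elements, 1000 accumulator calls
    }
}
```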

An alternative is to use a stream builder for the temporary collection:

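import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Accumulator<S, T> {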
    private final Function<S, T> transformation;
    private final Stream.Builder<T> internal = Stream.builder();

    public Accumulator(Function<S, T> transformation) {
        this.transformation = transformation;
    }

    public void addBatch(List<S> newBatch) {
        newBatch.stream().map(transformation).forEachOrdered(internal);
    }

    public List<T> finish() {
        return internal.build().collect(Collectors.toList());
    }
}
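
A possible way to use such a class is sketched below. The class is repeated under the hypothetical name BuilderAccumulator so that the example compiles on its own. The sample data is made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Same shape as the Accumulator above, renamed so the sketch is self-contained.
class BuilderAccumulator<S, T> {
    private final Function<S, T> transformation;
    private final Stream.Builder<T> internal = Stream.builder();

    BuilderAccumulator(Function<S, T> transformation) {
        this.transformation = transformation;
    }

    void addBatch(List<S> newBatch) {
        // Stream.Builder is a Consumer, so it can be passed to forEachOrdered directly.
        newBatch.stream().map(transformation).forEachOrdered(internal);
    }

    List<T> finish() {
        // build() ends the building phase; the builder accepts no elements afterwards.
        return internal.build().collect(Collectors.toList());
    }
}

class BuilderAccumulatorDemo {
    public static void main(String[] args) {
        BuilderAccumulator<String, Integer> acc = new BuilderAccumulator<>(String::length);
        acc.addBatch(Arrays.asList("a", "bb")); // transformed immediately
        acc.addBatch(Arrays.asList("ccc"));
        System.out.println(acc.finish());       // prints [1, 2, 3]
    }
}
```

Because a Stream.Builder can only be built once, finish() is a one-shot operation in this sketch: calling addBatch or finish again afterwards throws an IllegalStateException.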

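The stream builder uses a spined buffer, which does not need to copy its contents when increasing capacity. However, the solution still suffers from the fact that the final collection step fills an ArrayList without an appropriate initial capacity (in the current implementation).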

With the current implementation, it is far more efficient to implement the finishing step as

public List<T> finish() {
    return Arrays.asList(internal.build().toArray(…));
}

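but this requires either an IntFunction<T[]> provided by the caller (as we can't create one for a generic array type), or an unchecked operation (pretending that an Object[] is a T[], which would be OK here, but is still a nasty unchecked operation).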
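
The first of these two options, a caller-supplied array factory, might look like the following sketch. The class name ArrayBackedAccumulator and the constructor parameter arrayFactory are assumptions introduced for illustration; the answer itself leaves the toArray argument open.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.IntFunction;
import java.util.stream.Stream;

class ArrayBackedAccumulator<S, T> {
    private final Function<S, T> transformation;
    // Hypothetical: supplied by the caller, e.g. Integer[]::new.
    private final IntFunction<T[]> arrayFactory;
    private final Stream.Builder<T> internal = Stream.builder();

    ArrayBackedAccumulator(Function<S, T> transformation, IntFunction<T[]> arrayFactory) {
        this.transformation = transformation;
        this.arrayFactory = arrayFactory;
    }

    void addBatch(List<S> newBatch) {
        newBatch.stream().map(transformation).forEachOrdered(internal);
    }

    List<T> finish() {
        // toArray creates the T[] through the caller's factory;
        // Arrays.asList wraps that array in a fixed-size list without copying it.
        return Arrays.asList(internal.build().toArray(arrayFactory));
    }
}

class ArrayBackedAccumulatorDemo {
    public static void main(String[] args) {
        ArrayBackedAccumulator<String, Integer> acc =
                new ArrayBackedAccumulator<>(String::length, Integer[]::new);
        acc.addBatch(Arrays.asList("a", "bb"));
        acc.addBatch(Arrays.asList("ccc"));
        System.out.println(acc.finish()); // prints [1, 2, 3]
    }
}
```

One trade-off of this variant: the list returned by Arrays.asList has a fixed size, so callers cannot add or remove elements.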

Archived: 9 years, 1 month ago
Views: 9,281
Last recorded: 9 years, 1 month ago