在findFirst()之前使用sorted()流不再是懒惰的

Muh*_*edy 11 java java-8 java-stream

我有一个元素列表,我需要找到满足条件的第一个元素然后使用Java 8流退出.

我认为下面的代码很遗憾地评估了所有可用的元素,这不是我需要的,我需要逐个评估项目,并break在找到第一个匹配时停止():

我在这里排序元素,然后将元素映射到它的url属性,然后尝试过滤,如果url不是null或空,然后找到first匹配!

Arrays.stream(dataArray)
.sorted(Comparator.comparing(d -> d.getPriority()))
.peek(o -> System.out.println("SORT: " + o))
.map(d -> d.getOriginalURL(shortUrl))
.peek(o -> System.out.println("MAP: " + o))
.filter(u -> u != null && !u.isEmpty())
.peek(o -> System.out.println("FILTER: " + o))
.findFirst().orElse("");
Run Code Online (Sandbox Code Playgroud)

但输出显示,即使第一个项与ifcondition(filter)操作匹配,所有项也都会被评估.

Data[] data = new Data[] { new ParseData(), new InMemoryData() };
System.out.println(">>> " + getOriginalURL(data, ""));
Run Code Online (Sandbox Code Playgroud)

OUTPUT:

SORT: mhewedy.usingspark.data.InMemoryData@7adf9f5f
MAP: InMemory URL
FILTER: InMemory URL
SORT: mhewedy.usingspark.data.ParseData@85ede7b
MAP: Parse.com URL         <<< THIS SHOULD NOT HAPPEN
FILTER: Parse.com URL      <<< AND THIS TOO
>>> InMemory URL
Run Code Online (Sandbox Code Playgroud)

如输出所示,当过滤器与第一个元素匹配时,流不会停止,而是继续评估第二个元素!

我想这样做:

Arrays.sort(dataArray, Comparator.comparing(d -> d.getPriority())); // sort

for (Data data : dataArray) {
    String url = data.getOriginalURL(shortUrl);           // map
    if (url != null && !url.isEmpty()) {                  // filter
        System.out.println("url :" + url);            
        return url;                                   // find first
    }
}
Run Code Online (Sandbox Code Playgroud)

Stu*_*rks 12

这是一个较小的例子来说明问题:

Stream.of("a", "ab", "abc", "abcd")
    // .sorted() // uncomment and what follows becomes eager
    .filter(s -> s.contains("b"))
    .peek(s -> System.out.println("PEEK: " + s))
    .findFirst()
    .orElse("X");
Run Code Online (Sandbox Code Playgroud)

正如预期的那样,输出是:

PEEK: ab
Run Code Online (Sandbox Code Playgroud)

如果该sorted行未取消注释,则输出为:

PEEK: ab
PEEK: abc
PEEK: abcd
Run Code Online (Sandbox Code Playgroud)

(正如预期的那样,在这两种情况下,整个管道的最终结果都是"ab".)

确实,sorted在生成第一个输出元素之前必须消耗所有输入.从那个意义上来说,它是渴望的.但是,它确实会影响元素向下游发送的方式.

如果没有排序,findFirst操作会从上游"拉"元素,直到找到元素,然后停止.通过排序,sorted()操作急切地收集所有元素,对它们进行排序,并且由于它们在那里完成它们,它会将它们"推"到流中.当然,findFirst忽略除第一个元素之外的所有元素.但这意味着干预操作(例如过滤器)可能会做不必要的工作.

最终结果是正确的,但行为是出乎意料的.这可能被视为一个错误.如果合适,我会调查并提交错误.

  • 是的,性能错误.提起[JDK-8042355](https://bugs.openjdk.java.net/browse/JDK-8042355). (3认同)
  • @MuhammadHewedy很顺便.谢谢你提出这个问题. (2认同)
  • @Eran 好吧,我查看了源代码([SortedOps.java](http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/jdk8-b132/src/share/classes/java/util /stream/SortedOps.java),第 301-308 行) 以查看它在排序后无条件地将所有元素推送到下游。这似乎是错误的,因为 findFirst 应该是短路的。然后我问实施它的人,他说“是的,这是一个性能错误。” :-) (2认同)

Era*_*ran 6

The sorted operation forces traversal of all the items in the stream.

Stateful operations, such as distinct and sorted, may incorporate state from previously seen elements when processing new elements.

Stateful operations may need to process the entire input before producing a result. For example, one cannot produce any results from sorting a stream until one has seen all elements of the stream.

(Source)

I'm not sure, though, why the operations following the sorted are also executed for all the elements in the stream.

If you perform the sort separately, and then use the stream for the rest of the processing, the processing will stop when the first match is found, as expected.

Arrays.sort(dataArray, Comparator.comparing(d -> d.getPriority())); // sort

Arrays.stream(dataArray)
.peek(o -> System.out.println("SORT: " + o))
.map(d -> d.getOriginalURL(shortUrl))
.peek(o -> System.out.println("MAP: " + o))
.filter(u -> u != null && !u.isEmpty())
.peek(o -> System.out.println("FILTER: " + o))
.findFirst().orElse("");
Run Code Online (Sandbox Code Playgroud)