Splitting a java.util.stream.Stream

Question


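I have a text file containing URLs and emails, and I need to extract all of them from the file. Each URL and email may occur more than once, but the result should not contain duplicates. I can extract all URLs with the following code: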

Files.lines(filePath)
    .map(urlPattern::matcher)
    .filter(Matcher::find)
    .map(Matcher::group)
    .distinct();


I can extract all emails with the following code:

Files.lines(filePath)
    .map(emailPattern::matcher)
    .filter(Matcher::find)
    .map(Matcher::group)
    .distinct();


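Can I extract all URLs and emails while reading the stream returned by Files.lines(filePath) only once? Something like splitting the stream of lines into a stream of URLs and a stream of emails.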

Answer 1

Tag*_*eev 10

You can use the partitioningBy collector, though it is still not a very elegant solution.

Map<Boolean, List<String>> map = Files.lines(filePath)
        .filter(str -> urlPattern.matcher(str).matches() ||
                       emailPattern.matcher(str).matches())
        .distinct()
        .collect(Collectors.partitioningBy(str -> urlPattern.matcher(str).matches()));
List<String> urls = map.get(true);
List<String> emails = map.get(false);
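
To make the partitioningBy idea easy to try without a file, here is a minimal self-contained sketch. The class name PartitionDemo, the split helper, and both regular expressions are assumptions made for illustration (the original post does not show its patterns), and the patterns are deliberately simplistic. Like the snippet above, it uses matches(), so it assumes each line consists of exactly one URL or one email.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PartitionDemo {
    // Hypothetical, simplified patterns; real-world URL/email matching is stricter.
    static final Pattern URL = Pattern.compile("https?://\\S+");
    static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    // Keeps only lines that are entirely a URL or an email, drops duplicates,
    // then partitions them: key true holds URLs, key false holds emails.
    static Map<Boolean, List<String>> split(Stream<String> lines) {
        return lines
                .filter(s -> URL.matcher(s).matches() || EMAIL.matcher(s).matches())
                .distinct()
                .collect(Collectors.partitioningBy(s -> URL.matcher(s).matches()));
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "http://a.example", "bob@example.com", "http://a.example", "noise");
        Map<Boolean, List<String>> result = split(sample.stream());
        System.out.println("urls:   " + result.get(true));
        System.out.println("emails: " + result.get(false));
    }
}
```

When reading from a real file, wrapping Files.lines(filePath) in a try-with-resources statement ensures the underlying file handle is closed once the stream has been consumed.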


If you don't want to apply the regular expressions twice, you can use an intermediate pair object (for example, SimpleEntry):

public static String classify(String str) {
    return urlPattern.matcher(str).matches() ? "url" : 
        emailPattern.matcher(str).matches() ? "email" : null;
}

Map<String, Set<String>> map = Files.lines(filePath)
        .map(str -> new AbstractMap.SimpleEntry<>(classify(str), str))
        .filter(e -> e.getKey() != null)
        .collect(Collectors.groupingBy(e -> e.getKey(),
            Collectors.mapping(e -> e.getValue(), Collectors.toSet())));
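
Note that classify relies on matches(), which only succeeds when the whole line is a URL or an email, whereas the question's code uses find(). If lines may contain surrounding text or several items, one possible variant (not part of the original answer) is to scan each line with a single combined pattern and emit one entry per hit. The sketch below assumes a hypothetical class ClassifyDemo, a hypothetical extract helper, and simplified patterns joined with named groups; it still reads the input only once.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ClassifyDemo {
    // Hypothetical combined pattern: the named group that matched tells us the kind.
    // Both alternatives are simplified and only meant for illustration.
    static final Pattern TOKEN = Pattern.compile(
            "(?<url>https?://\\S+)|(?<email>[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+)");

    // Returns one ("url" | "email", matched text) entry per match found in the line.
    static List<Map.Entry<String, String>> extract(String line) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            String kind = m.group("url") != null ? "url" : "email";
            out.add(new AbstractMap.SimpleEntry<>(kind, m.group()));
        }
        return out;
    }

    // Single pass over the lines; the sets remove duplicates within each kind.
    static Map<String, Set<String>> split(Stream<String> lines) {
        return lines
                .flatMap(line -> extract(line).stream())
                .collect(Collectors.groupingBy(e -> e.getKey(),
                        Collectors.mapping(e -> e.getValue(), Collectors.toSet())));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> result = split(Stream.of(
                "see http://a.example and mail bob@example.com",
                "bob@example.com",
                "http://b.example"));
        System.out.println(result);
    }
}
```

A kind that never occurs in the input has no key in the resulting map, so callers may prefer getOrDefault("url", Collections.emptySet()) over a plain get.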


With my free StreamEx library, the last step is shorter:

Map<String, Set<String>> map = StreamEx.of(Files.lines(filePath))
        .mapToEntry(str -> classify(str), Function.identity())
        .nonNullKeys()
        .grouping(Collectors.toSet());

