Stream of Strings isn't sorted?

Question

Don*_*Don 4 java sorting java-8 java-stream

I want to find the set of all words in a file. This set should be sorted. Upper and lower case don't matter. This is my approach:

public static Set<String> setOfWords(String fileName) throws IOException {

    Set<String> wordSet;
    Stream<String> stream = java.nio.file.Files.lines(java.nio.file.Paths.get(fileName));

    wordSet = stream
                .map(line -> line.split("[ .,;?!.:()]"))
                .flatMap(Arrays::stream)
                .sorted()
                .map(String::toLowerCase)
                .collect(Collectors.toSet());
    stream.close();
    return wordSet;
}

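Test file:

This is a file with five lines. It has two sentences, and the word file is contained in multiple lines of this file. This file can be used for testing?

When I print the set, I get the following output: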

Set of words: 
a
be
in
sentences
testing
this
for
multiple
is
it
used
two
the
can
with
contained
file
and
of
has
lines
five
word

Can anyone tell me why the set isn't sorted in its natural order (lexicographic for Strings)?

Thanks in advance

Answer 1

Sle*_*idi 7

You can use a sorted collection such as a TreeSet, with String.CASE_INSENSITIVE_ORDER as the Comparator:

Set<String> set = stream
            .map(line -> line.split("[ .,;?!.:()]"))
            .flatMap(Arrays::stream)
            .collect(Collectors.toCollection(()-> new TreeSet<>(String.CASE_INSENSITIVE_ORDER)));

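As a rough, self-contained sketch of the TreeSet approach: the class name `CaseInsensitiveWords`, the method `sortedWords`, and the sample lines are made up for illustration, and the `filter` step is an addition that drops the empty tokens `split` can produce.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CaseInsensitiveWords {

    // Hypothetical helper: collects words into a TreeSet ordered by
    // String.CASE_INSENSITIVE_ORDER, so "This" and "this" count as the same element.
    static Set<String> sortedWords(Stream<String> lines) {
        return lines
                .map(line -> line.split("[ .,;?!.:()]"))
                .flatMap(Arrays::stream)
                .filter(word -> !word.isEmpty()) // assumption: empty tokens are unwanted
                .collect(Collectors.toCollection(
                        () -> new TreeSet<>(String.CASE_INSENSITIVE_ORDER)));
    }

    public static void main(String[] args) {
        Set<String> words = sortedWords(Stream.of("This is a file.", "this File is small"));
        System.out.println(words); // prints [a, file, is, small, This]
    }
}
```

Note that the set keeps whichever spelling it saw first ("This" here); if you want everything in lower case, you would still need `map(String::toLowerCase)`.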

Alternatively, you can sort the elements with a case-insensitive comparator and collect them into a collection that maintains insertion order.

List<String> list = stream
            .map(line -> line.split("[ .,;?!.:()]"))
            .flatMap(Arrays::stream)
            .sorted(String::compareToIgnoreCase)
            .distinct()
            .collect(Collectors.toList());

In the second example, you can add `distinct()` to remove duplicates from the list. (2 upvotes)
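One caveat worth checking for yourself: `distinct()` compares elements with `equals()`, which is case-sensitive, so after a case-insensitive sort the variants "This" and "this" both survive. The sketch below (class and method names are invented for the demo) contrasts that with lower-casing first.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DistinctCaseDemo {

    // Sorts ignoring case, then removes duplicates using equals() (case-sensitive).
    static List<String> sortedThenDistinct(Stream<String> words) {
        return words
                .sorted(String::compareToIgnoreCase)
                .distinct()
                .collect(Collectors.toList());
    }

    // Lower-cases first, so case variants become equal and distinct() can drop them.
    static List<String> lowerCasedDistinct(Stream<String> words) {
        return words
                .map(String::toLowerCase)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sortedThenDistinct(Stream.of("file", "This", "this", "a")));
        // prints [a, file, This, this]
        System.out.println(lowerCasedDistinct(Stream.of("file", "This", "this", "a")));
        // prints [a, file, this]
    }
}
```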

Answer 2

Era*_*ran 5

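Since the sort is case-sensitive, you should map to lower case before sorting.

Besides that, you should collect the output into an ordered collection, such as a List or some SortedSet implementation (though if you use a SortedSet there is no need to call sorted(), since the set sorts its elements anyway).

A List output: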

List<String> wordSet = stream
            .map(line -> line.split("[ .,;?!.:()]"))
            .flatMap(Arrays::stream)
            .map(String::toLowerCase)
            .sorted()
            .collect(Collectors.toList());

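To illustrate why the target collection matters: as far as I know, `Collectors.toSet()` makes no guarantee about the type or iteration order of the set it returns (in OpenJDK it is currently backed by a `HashSet`), so whatever order `sorted()` established is lost on collection. A small sketch, with made-up class and method names, comparing three collectors:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OrderedCollectDemo {

    // No ordering guarantee: the sorted() step has no lasting effect here.
    static Set<String> intoUnorderedSet(Stream<String> words) {
        return words.map(String::toLowerCase).sorted()
                .collect(Collectors.toSet());
    }

    // LinkedHashSet keeps insertion order, so the sorted order survives.
    static Set<String> intoLinkedHashSet(Stream<String> words) {
        return words.map(String::toLowerCase).sorted()
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // TreeSet sorts on its own, so sorted() is unnecessary.
    static Set<String> intoTreeSet(Stream<String> words) {
        return words.map(String::toLowerCase)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(intoUnorderedSet(Stream.of("word", "This", "file", "a")));  // order unspecified
        System.out.println(intoLinkedHashSet(Stream.of("word", "This", "file", "a"))); // [a, file, this, word]
        System.out.println(intoTreeSet(Stream.of("word", "This", "file", "a")));       // [a, file, this, word]
    }
}
```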

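EDIT:

As Hank commented, if you want to eliminate duplicates in the output Collection, a List won't do, so you'll have to collect the elements into a SortedSet implementation.

A SortedSet output: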

Set<String> wordSet = stream
            .map(line -> line.split("[ .,;?!.:()]"))
            .flatMap(Arrays::stream)
            .map(String::toLowerCase)
            .collect(Collectors.toCollection(TreeSet::new));

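Putting the SortedSet variant back into the shape of the question's method might look roughly like this. It is only a sketch: the class name `WordSet` and the temp-file demo in `main` are invented, the `filter` for empty tokens is an addition, and try-with-resources replaces the explicit `stream.close()` so the file is closed even if the pipeline throws.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WordSet {

    // Returns all words of the file, lower-cased, without duplicates, in natural order.
    static SortedSet<String> setOfWords(String fileName) throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get(fileName))) {
            return lines
                    .map(line -> line.split("[ .,;?!.:()]"))
                    .flatMap(Arrays::stream)
                    .filter(word -> !word.isEmpty()) // assumption: drop empty tokens
                    .map(String::toLowerCase)
                    .collect(Collectors.toCollection(TreeSet::new));
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample input written to a temporary file.
        Path tmp = Files.createTempFile("words", ".txt");
        Files.write(tmp, Arrays.asList("This is a file.", "It has two lines, this File"));
        System.out.println(setOfWords(tmp.toString()));
        // prints [a, file, has, is, it, lines, this, two]
        Files.delete(tmp);
    }
}
```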

Thank you, I have now solved it this way and it works perfectly: `SortedSet<String> wordSet; Stream<String> stream = java.nio.file.Files.lines(java.nio.file.Paths.get(fileName)); wordSet = stream.map(line -> line.split("[ .,;?!.:()]")).flatMap(Arrays::stream).map(String::toLowerCase).collect(Collectors.toCollection(TreeSet::new));` (3 upvotes)
