Unexpected Collections.sort behavior makes a test fail

hot*_*oup 3 java sorting collections junit

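Please note: I mention JUnit here and provide an SSCCE that uses it, but this is really a Java Collections question that anyone with Java experience can answer, regardless of their JUnit experience.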


Java 8 here. I'm trying to sort a list of strings, but I'm getting some unexpected behavior from Collections.sort(myList), and I'm wondering what's going on.

Here is my complete unit test:

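import static org.junit.Assert.assertTrue;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.mockito.junit.MockitoJUnitRunner;

@RunWith(MockitoJUnitRunner.class)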
public class SorterTest {

    @Test
    public void should_sort_correctly_including_capitalization_rules() {

        // given
        String[] actualNames = new String[] {
            "DCME",
            "CCME",
            "ACME",
            "BCME",
            "AGME",
            "AACME",
            "aCME",
            "Acme",
            "AaCME",
            "aACME",
        };
        List<String> actual = Arrays.asList(actualNames);

        // the order I would *expect* them to sort into...
        String[] expectedNames = new String[] {
                "aACME",
                "aCME",
                "AaCME",
                "AACME",
                "Acme",
                "ACME",
                "AGME",
                "BCME",
                "CCME",
                "DCME"
        };
        List<String> expected = Arrays.asList(expectedNames);

        // when
        Collections.sort(actual);

        // then
        assertTrue(actual.equals(expected));

    }

}

The JUnit assertTrue here fails at runtime because the actual list gets sorted as:

0 = "AACME"
1 = "ACME"
2 = "AGME"
3 = "AaCME"
4 = "Acme"
5 = "BCME"
6 = "CCME"
7 = "DCME"
8 = "aACME"
9 = "aCME"

That ^^^ is debugger output; the numbers are each element's index in the list.

So, for some reason, Collections.sort considers the string "BCME" lexicographically "lower" than "aCME" (it appears earlier in the sorted list), which seems nuts to me. :-)
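For what it's worth, this is String's documented natural ordering rather than a bug: String.compareTo compares the two strings char by char (UTF-16 code units), and in ASCII every uppercase letter (65 to 90) has a smaller code than every lowercase letter (97 to 122). A small sketch that prints the numbers involved (NaturalOrderDemo is just an illustrative class name):

```java
public class NaturalOrderDemo {
    public static void main(String[] args) {
        // The char values that String.compareTo actually compares
        System.out.println((int) 'B'); // 66
        System.out.println((int) 'a'); // 97

        // The strings differ first at index 0, so compareTo returns 'B' - 'a'
        System.out.println("BCME".compareTo("aCME")); // -31

        // Any string starting with an uppercase ASCII letter sorts before
        // any string starting with a lowercase one under natural ordering
        System.out.println("ZZZ".compareTo("aaa") < 0); // true
    }
}
```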

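I should mention that I'll only be dealing with ASCII characters (in UTF-8) here: my application performs pre-validation to ensure that every character in each string/name is in [a-zA-Z].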

Either way, the collation rules I want my Java code to apply are:

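  • When I say "lower" I mean "appears earlier in the sorted list", and when I say "higher" I mean "appears later in the sorted list"
    • So I would say "3 is lower than 43", because in a sorted list of integers 3 appears before 43, and so on.
  • Lowercase letters are lower than uppercase letters, so "a" should appear before "A"
    • Therefore the overall letter order is aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
  • Shorter words come before longer words, provided they are an identical (including case) prefix of the longer word
    • "but" is lower than (comes before) "butterfly"
    • "butterfly" is lower than "But" (b < B)
    • "butterfly" is lower than "bUt" (b and b are the same, but u < U)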
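As an alternative to a collator (this is not what the answer below uses), the three rules above can be encoded directly in a hand-written Comparator<String>: map each letter to its position in aAbB...zZ, compare position by position, and let the shorter string win when one is a prefix of the other. This is a minimal sketch that assumes the input already passed the [a-zA-Z] pre-validation; the class name LowerFirstComparator and the rank helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LowerFirstComparator implements Comparator<String> {

    // Position of a letter in the order aAbBcC...zZ:
    // 'a' -> 0, 'A' -> 1, 'b' -> 2, 'B' -> 3, ..., 'Z' -> 51.
    // Assumes the input has been validated to contain only [a-zA-Z].
    static int rank(char c) {
        if (c >= 'a' && c <= 'z') {
            return 2 * (c - 'a');
        }
        if (c >= 'A' && c <= 'Z') {
            return 2 * (c - 'A') + 1;
        }
        throw new IllegalArgumentException("Unexpected character: " + c);
    }

    @Override
    public int compare(String left, String right) {
        int n = Math.min(left.length(), right.length());
        for (int i = 0; i < n; i++) {
            int byLetter = Integer.compare(rank(left.charAt(i)), rank(right.charAt(i)));
            if (byLetter != 0) {
                return byLetter; // the first differing letter decides
            }
        }
        // One string is a prefix of the other: the shorter one comes first
        return Integer.compare(left.length(), right.length());
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList(
                "DCME", "CCME", "ACME", "BCME", "AGME",
                "AACME", "aCME", "Acme", "AaCME", "aACME"));
        names.sort(new LowerFirstComparator());
        // prints [aACME, aCME, AaCME, AACME, Acme, ACME, AGME, BCME, CCME, DCME]
        System.out.println(names);
    }
}
```

The trade-off versus a RuleBasedCollator is that this version rejects anything outside [a-zA-Z] instead of silently placing it somewhere.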

Given these collation rules, the list in my unit test should sort as:

Sort Order   Reason it comes after the previous entry
================================================================
aACME        
aCME         1st letter is 'a' but 2nd letter is 'C' and A < C
AaCME        1st letter is 'A' and a < A
AACME        1st letter is 'A' and 2nd letter is 'A' and a < A
Acme         1st letter is 'A' but 2nd letter is 'c' and A < c
ACME         1st letter is 'A' but 2nd letter is 'C' and c < C
AGME         1st letter is 'A' but 2nd letter is 'G' and C < G
BCME         1st letter is 'B' and aA < bB
CCME         1st letter is 'C' and bB < cC
DCME         1st letter is 'D' and cC < dD

How can I change the code above so that the unit test passes and the list is sorted the way I need?

Ale*_*nko 5

Java has RuleBasedCollator, which lets you define a custom character collation (sort order).

Here lowercase letters should come before uppercase letters, so the rules could look like this:

// requires: import java.text.ParseException; import java.text.RuleBasedCollator;
static RuleBasedCollator lowerFirst() {
    try {
        return new RuleBasedCollator(
            "< a < A < b < B < c < C < d < D < e < E < f < F < g < G < h < H < i < I < j < J < "
            + "k < K < l < L < m < M < n < N < o < O < p < P < q < Q < r < R < s < S < t < T < "
            + "u < U < v < V < w < W < x < X < y < Y < z < Z"
        );
    } catch (ParseException parsex) {
        throw new IllegalArgumentException("Failed to create lowerFirst collator", parsex);
    }
}
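Typing all 52 letters into the rule string by hand is easy to get wrong, since a single skipped letter is hard to spot. A hedged sketch that generates the same "< a < A < b < B ... < z < Z" rules in a loop instead; LowerFirstRules and lowerFirstRules() are hypothetical names:

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

public class LowerFirstRules {

    // Builds "< a < A < b < B < ... < z < Z" so that no letter can be skipped.
    static String lowerFirstRules() {
        StringBuilder rules = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            rules.append("< ").append(c).append(' ')
                 .append("< ").append(Character.toUpperCase(c)).append(' ');
        }
        return rules.toString().trim();
    }

    static RuleBasedCollator lowerFirst() {
        try {
            return new RuleBasedCollator(lowerFirstRules());
        } catch (ParseException parsex) {
            throw new IllegalArgumentException("Failed to create lowerFirst collator", parsex);
        }
    }

    public static void main(String[] args) {
        System.out.println(lowerFirstRules());
        // v sorts before V, and V before w, as in the aAbB...zZ order
        System.out.println(lowerFirst().compare("vat", "Vat") < 0); // true
        System.out.println(lowerFirst().compare("Vat", "wat") < 0); // true
    }
}
```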

Test:

String[] names = new String[] {
    "DCME",  "CCME", "ACME", "BCME",  "AGME",
    "AACME", "aCME", "Acme", "AaCME", "aACME",
};

String[] expected = new String[] {
    "aACME", "aCME", "AaCME", "AACME", "Acme",
    "ACME", "AGME", "BCME", "CCME", "DCME"
};
        
Arrays.sort(names, lowerFirst());

System.out.println("sorted:   " + Arrays.toString(names));
System.out.println("expected: " + Arrays.toString(expected));

Output:

sorted:   [aACME, aCME, AaCME, AACME, Acme, ACME, AGME, BCME, CCME, DCME]
expected: [aACME, aCME, AaCME, AACME, Acme, ACME, AGME, BCME, CCME, DCME]
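To make the original JUnit test pass, the only line that should need to change is the "// when" step: pass the collator to the two-argument Collections.sort(List, Comparator). Collator implements Comparator<Object>, which satisfies the required Comparator<? super String>. The sketch below drops JUnit so it runs standalone; SorterFix is a hypothetical class name, and lowerFirst() is the answer's method.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SorterFix {

    // Same lower-before-upper rules as the answer's lowerFirst()
    static RuleBasedCollator lowerFirst() {
        try {
            return new RuleBasedCollator(
                "< a < A < b < B < c < C < d < D < e < E < f < F < g < G < h < H < i < I < j < J < "
                + "k < K < l < L < m < M < n < N < o < O < p < P < q < Q < r < R < s < S < t < T < "
                + "u < U < v < V < w < W < x < X < y < Y < z < Z");
        } catch (ParseException parsex) {
            throw new IllegalArgumentException("Failed to create lowerFirst collator", parsex);
        }
    }

    public static void main(String[] args) {
        List<String> actual = Arrays.asList(
            "DCME", "CCME", "ACME", "BCME", "AGME",
            "AACME", "aCME", "Acme", "AaCME", "aACME");
        List<String> expected = Arrays.asList(
            "aACME", "aCME", "AaCME", "AACME", "Acme",
            "ACME", "AGME", "BCME", "CCME", "DCME");

        // when: sort with the custom collator instead of natural String order
        Collections.sort(actual, lowerFirst());

        System.out.println(actual.equals(expected)); // true
    }
}
```

Inside SorterTest that becomes Collections.sort(actual, lowerFirst()); and using assertEquals(expected, actual) instead of assertTrue(actual.equals(expected)) would give a more readable failure message.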