Confusing output from String.split

san*_*tel 51 java regex string class core

I don't understand the output of this code:

public class StringDemo{              
    public static void main(String args[]) {
        String blank = "";                    
        String comma = ",";                   
        System.out.println("Output1: "+blank.split(",").length);  
        System.out.println("Output2: "+comma.split(",").length);  
    }
}

It produces the following output:

Output1: 1 
Output2: 0

Mar*_*rno 54

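Documentation:

For: System.out.println("Output1: "+blank.split(",").length);

The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string. The substrings in the array are in the order in which they occur in this string. If the expression does not match any part of the input then the resulting array has just one element, namely this string.

It simply returns the entire string, which is why the length is 1.


For the second case, String.split discards the ",", so the result is empty.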

String.split silently discards trailing separators

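See Guava's StringsExplained page.

  • The Javadoc of the one-argument split method says: "This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. **Trailing empty strings are therefore not included in the resulting array.**" That is the correct explanation of the second result: the two trailing empty strings are excluded. (13 upvotes)
  • Yes, in theory everything is in the doc. But I always wonder where they find people who write text that you can read 10 times and still need to write a test program to find out what the method actually does... (6 upvotes)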

Psh*_*emo 34

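Everything happens according to plan, but let's go through it step by step (I hope you have some time).

According to the documentation (and source code) of the split(String regex) method:

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero.

So when you invoke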

split(String regex)

you are actually getting the result of the split(String regex, int limit) method, invoked like this:

split(regex, 0)

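So here limit is set to 0.

You need to know a few things about this parameter:

  • If limit is positive, the length of the result array is limited to the positive number you specify, so "axaxaxaxa".split("x",2) returns the array ["a", "axaxaxa"], not ["a","a","a","a","a"].
  • If limit is 0, the length of the result array is not limited. But it also means that any trailing empty strings are removed. For example: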

    "fooXbarX".split("X")
    

    will at first produce an array that looks like this:

    ["foo", "bar", ""]
    

    (because splitting "barX" on "X" yields "bar" and ""), but since split removes all trailing empty strings, it returns

    ["foo", "bar"]
    
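  • A negative limit behaves like a limit of 0 in that it does not limit the length of the result array. The only difference is that it does not remove empty strings from the end of the result array. In other words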

    "fooXbarX".split("X",-1)
    

    returns ["foo", "bar", ""]
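The three limit behaviors listed above can be checked with a short program. This is only a minimal sketch: the class name SplitLimitDemo and the helper quoted are made up for illustration, and the only JDK behavior exercised is String.split(String, int).

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class SplitLimitDemo {
    // Hypothetical helper: renders an array with quotes so empty strings stay visible.
    static String quoted(String[] parts) {
        return Arrays.stream(parts)
                .map(s -> "\"" + s + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        // Positive limit: at most 2 elements, the remainder stays in the last one.
        System.out.println(quoted("axaxaxaxa".split("x", 2)));  // ["a", "axaxaxa"]
        // Zero limit: no length cap, trailing empty strings are removed.
        System.out.println(quoted("fooXbarX".split("X", 0)));   // ["foo", "bar"]
        // Negative limit: no length cap, trailing empty strings are kept.
        System.out.println(quoted("fooXbarX".split("X", -1)));  // ["foo", "bar", ""]
    }
}
```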


Now let's look at this case,

",".split(",").length

which (as explained earlier) is the same as

",".split(",", 0).length

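This means we are using the version of split that does not limit the length of the result array but does remove all trailing empty strings, "". You need to understand that when we split one thing, we always get two things.

In other words, if we split "abc" on b, we get "a" and "c".
The tricky part is to understand that if we split "abc" on c, we get "ab" and "" (an empty string).

Using this logic, if we split "," on ",", we get "" and "" (two empty strings).

You can check this by using split with a negative limit: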

for (String s: ",".split(",", -1)){
    System.out.println("\""+s+"\"");
}

This prints

""
""

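So, as we can see, the result array is at first ["", ""].

But since by default we are using limit set to 0, all trailing empty strings are removed. In this case the result array contains only trailing empty strings, so all of them are removed, leaving the empty array [] with length 0.


To answer the case of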
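The cleanup step described above can be imitated by hand. The following is a rough sketch, not the JDK's code: the class TrailingEmptyDemo and the helper dropTrailingEmpty are hypothetical names, and the helper only models the case where the delimiter matched at least once.

```java
import java.util.Arrays;

class TrailingEmptyDemo {
    // Hypothetical helper: drops empty strings from the end of the array,
    // imitating what split does when limit is 0 and a match was found.
    static String[] dropTrailingEmpty(String[] parts) {
        int size = parts.length;
        while (size > 0 && parts[size - 1].isEmpty()) {
            size--;
        }
        return Arrays.copyOf(parts, size);
    }

    public static void main(String[] args) {
        String[] raw = ",".split(",", -1);          // keeps the trailing empty strings
        String[] cleaned = dropTrailingEmpty(raw);  // manual cleanup
        System.out.println(raw.length);             // 2
        System.out.println(cleaned.length);         // 0
        // The manual cleanup agrees with the default split for this input.
        System.out.println(Arrays.equals(cleaned, ",".split(",")));  // true
    }
}
```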

"".split(",").length

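you need to understand that removing trailing empty strings only makes sense if those trailing empty strings were produced by splitting (and are most likely not needed).
So if there was no place at which we could split, no empty strings could have been created, and there is no point in running this "cleaning" process.

This is mentioned in the documentation of the split(String regex, int limit) method, where you can read: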

If the expression does not match any part of the input then the resulting array has just one element, namely this string.

You can also see this behaviour in the source code of this method (from Java 8):

public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
       (1) one-char String and this character is not one of the
           RegEx's meta characters ".$|()[{^?*+\\", or
       (2) two-char String and the first char is the backslash and
           the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
          ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        int off = 0;
        int next = 0;
        boolean limited = limit > 0;
        ArrayList<String> list = new ArrayList<>();
        while ((next = indexOf(ch, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(substring(off, next));
                off = next + 1;
            } else {    // last one
                //assert (list.size() == limit - 1);
                list.add(substring(off, value.length));
                off = value.length;
                break;
            }
        }
        // If no match was found, return this
        if (off == 0)
            return new String[]{this};

        // Add remaining segment
        if (!limited || list.size() < limit)
            list.add(substring(off, value.length));

        // Construct result
        int resultSize = list.size();
        if (limit == 0) {
            while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                resultSize--;
            }
        }
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}

where you can find

if (off == 0)
    return new String[]{this};

fragment which means

  • if (off == 0) - if off (position from which method should start searching for next possible match for regex passed as split argument) is still 0 after iterating over entire string, we didn't find any match, so the string was not split
  • return new String[]{this}; - in that case let's just return an array with original string (represented by this).

Since "," couldn't be found in "" even once, "".split(",") must return an array with one element (empty string on which you invoked split). This means that the length of this array is 1.
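To see both outcomes side by side, here is a stripped-down sketch of the same control flow. It is an illustration under simplifying assumptions, not the JDK implementation: it handles only a single-character delimiter and a limit of 0, and the class name SimpleSplit is made up.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleSplit {
    // Simplified sketch: single-character delimiter, behaves like limit == 0.
    static String[] split(String input, char delimiter) {
        List<String> list = new ArrayList<>();
        int off = 0;
        int next;
        while ((next = input.indexOf(delimiter, off)) != -1) {
            list.add(input.substring(off, next));
            off = next + 1;
        }
        // No match was found: return the original string and skip the cleanup.
        if (off == 0) {
            return new String[] { input };
        }
        // Add the remaining segment after the last delimiter.
        list.add(input.substring(off));
        // Remove trailing empty strings, as limit == 0 does.
        int size = list.size();
        while (size > 0 && list.get(size - 1).isEmpty()) {
            size--;
        }
        return list.subList(0, size).toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(split("", ',').length);   // 1: no match, early return
        System.out.println(split(",", ',').length);  // 0: ["", ""] cleaned to []
    }
}
```

The early return is what keeps the single empty string in the first case: the cleanup loop is never reached when nothing was split.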

BTW, Java 8 introduced another mechanism: it removes leading empty strings (if they were created during the splitting process) when we split on a zero-length regex match (like "" or a look-around such as (?<!x)). More info at: Why in Java 8 split sometimes removes empty strings at start of result array?
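A quick way to observe this, assuming the code runs on Java 8 or newer (older versions behave differently for the first call). The class name LeadingEmptyDemo and the helper show are hypothetical.

```java
import java.util.Arrays;

class LeadingEmptyDemo {
    // Hypothetical helper: splits with the default limit and formats the result.
    static String show(String input, String regex) {
        return Arrays.toString(input.split(regex));
    }

    public static void main(String[] args) {
        // Zero-length match at the start: no leading empty string on Java 8+.
        System.out.println(show("abc", ""));  // [a, b, c]
        // Positive-width match at the start: the leading empty string is kept.
        System.out.println(show(",a", ","));  // [, a]
    }
}
```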

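  • @Bob When you hover over a line number on grepcode, you will see `<>`. Clicking it opens a box in which you can specify the range of lines you want to get as HTML code. (2 upvotes)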

Nav*_*kar 7

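From the Java 1.7 documentation

Splits this string around matches of the given regular expression.

The split() method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.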

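In case 1, blank.split(",") does not match any part of the input, so the resulting array has just one element, namely this String.

It returns the entire String, so the length is 1.

In case 2, comma.split(",") returns an empty array.

split() expects a regular expression as its argument and returns the array obtained by splitting around matches of that regex.

So the length is 0.

For example (from the documentation):

The string "boo:and:foo" yields the following results with these expressions: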

Regex     Result
  :     { "boo", "and", "foo" }
  o     { "b", "", ":and:f" }

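Parameters: regex - the delimiting regular expression

Returns: the array of strings computed by splitting this string around matches of the given regular expression

Throws: PatternSyntaxException - if the regular expression's syntax is invalid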
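The documented "boo:and:foo" results can be reproduced directly. This is a small sketch; the class name BooAndFooDemo and the helper describe are made up for illustration.

```java
import java.util.Arrays;

class BooAndFooDemo {
    // Hypothetical helper: splits the documentation's sample string on the given regex.
    static String describe(String regex) {
        return Arrays.toString("boo:and:foo".split(regex));
    }

    public static void main(String[] args) {
        System.out.println(describe(":"));  // [boo, and, foo]
        // The two trailing empty strings produced by the final "oo" are removed.
        System.out.println(describe("o"));  // [b, , :and:f]
    }
}
```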


Ruc*_*era 1

String blank = "";                    
String comma = ",";                   
System.out.println("Output1: "+blank.split(",").length);  // case 1
System.out.println("Output2: "+comma.split(",").length);  // case 2

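Case 1 - Here blank.split(",") returns "" because there is no , in blank, so you get the same string back and the length is 1.

Case 2 - Here comma.split(",") returns an empty array, because the two empty strings on either side of the , are trailing empty strings and are removed, so the length is 0.

Again, in comma.split(","), split() expects a regex as its argument and returns the result array obtained by splitting around matches of that regex.

The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string.

Otherwise:

If the expression does not match any part of the input then the resulting array has just one element, namely this string.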