Splitting a string with multiple delimiters using String methods

Asked by Adi*_*mar (score 13) · tags: java, tokenize

I want to split a string into tokens.

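I ripped off another Stack Overflow question, Equivalent to StringTokenizer with multiple character delimiters, but I'd like to know whether this can be done using only String methods (.equals(), .startsWith(), etc.). I don't want to use RegEx, the StringTokenizer class, Pattern, Matcher, or anything else other than String.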

For example, this is how I want to call the method:

String[] delimiters = {" ", "==", "=", "+", "+=", "++", "-", "-=", "--", "/", "/=", "*", "*=", "(", ")", ";", "/**", "*/", "\t", "\n"};
String[] splitString = tokenizer(contents, delimiters);

Here is the code I ripped off from the other question (I don't want to do it this way):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    private String[] tokenizer(String string, String[] delimiters) {
        // First, create a regular expression that matches the union of the delimiters.
        // Be aware that when one delimiter is a prefix of another (for example && and &),
        // the longer one must come before the shorter one (&& before &), or the regex
        // parser will recognize && as two &. Reverse lexicographic order guarantees this.
        Arrays.sort(delimiters, new Comparator<String>() {
            @Override
            public int compare(String o1, String o2) {
                return o2.compareTo(o1);
            }
        });
        // Build a string that will contain the regular expression
        StringBuilder regexpr = new StringBuilder();
        regexpr.append('(');
        for (String delim : delimiters) { // For each delimiter
            if (regexpr.length() != 1)
                regexpr.append('|'); // Add union separator if needed
            // Quote the delimiter so regex metacharacters such as + * ( ) are matched literally
            regexpr.append(Pattern.quote(delim));
        }
        regexpr.append(')'); // Close the union
        Pattern p = Pattern.compile(regexpr.toString());

        // Now, search for the tokens
        List<String> res = new ArrayList<String>();
        Matcher m = p.matcher(string);
        int pos = 0;
        while (m.find()) { // While there's a delimiter in the string
            if (pos != m.start()) {
                // If there's something between the current and the previous delimiter,
                // add it to the tokens list
                res.add(string.substring(pos, m.start()));
            }
            res.add(m.group()); // add the delimiter
            pos = m.end(); // Remember end of delimiter
        }
        if (pos != string.length()) {
            // If some characters remain after the last delimiter,
            // add them to the tokens list
            res.add(string.substring(pos));
        }
        // Return the result
        return res.toArray(new String[res.size()]);
    }

    // Removes the single-space tokens from a tokenizer result
    public static String[] clean(final String[] v) {
        List<String> list = new ArrayList<String>(Arrays.asList(v));
        list.removeAll(Collections.singleton(" "));
        return list.toArray(new String[list.size()]);
    }
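The comment about delimiter order is easy to check. Java's regex alternation tries the alternatives from left to right and accepts the first one that matches, not the longest one. Here is a small demo. The class and method names are hypothetical and used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrderDemo {

    // Collects every match of the regex in the input, scanning left to right
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Shorter alternative first: "&" already matches, so "&&" is found as two "&"
        System.out.println(matches("&|&&", "a&&b")); // [&, &]
        // Longer alternative first: "&&" is recognized as one delimiter
        System.out.println(matches("&&|&", "a&&b")); // [&&]
    }
}
```

This is why the code sorts the delimiters before building the pattern. In reverse lexicographic order, a delimiter always comes after every longer delimiter that starts with it.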

Edit: I only want to use the String methods charAt, equals, equalsIgnoreCase, indexOf, length and substring.

Answer by Nic*_*ckJ (score 9)

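Edit: My original answer didn't fully solve the problem. It didn't include the delimiters in the resulting array, and it used the String.split() method, which isn't allowed.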

Here is my new solution, split into two methods:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Splits the string at all specified literal delimiters, and includes the delimiters in the resulting array
 */
private static String[] tokenizer(String subject, String[] delimiters) {

    //Sort a copy of the delimiters into length order, starting with longest,
    //so that e.g. "+=" is split out before "+" can break it apart
    delimiters = delimiters.clone();
    Arrays.sort(delimiters, new Comparator<String>() {
        @Override
        public int compare(String s1, String s2) {
            return s2.length() - s1.length();
        }
    });

    //start with a list with only one string - the whole thing
    List<String> tokens = new ArrayList<String>();
    tokens.add(subject);

    //loop through the delimiters, splitting on each one
    for (int i = 0; i < delimiters.length; i++) {
        tokens = splitStrings(tokens, delimiters, i);
    }

    return tokens.toArray(new String[tokens.size()]);
}

/**
 * Splits each String in the subject at the delimiter
 */
private static List<String> splitStrings(List<String> subject, String[] delimiters, int delimiterIndex) {

    String delimiter = delimiters[delimiterIndex];
    if (delimiter.length() == 0) {
        return subject;                                       // an empty delimiter would match everywhere
    }
    List<String> result = new ArrayList<String>();

    //for each input string
    for (String part : subject) {

        //if this part equals one of the delimiters, don't split it up any more
        boolean alreadySplit = false;
        for (String testDelimiter : delimiters) {
            if (testDelimiter.equals(part)) {
                alreadySplit = true;
                break;
            }
        }
        if (alreadySplit) {
            result.add(part);
            continue;
        }

        int start = 0;
        int index = 0;
        while (index < part.length()) {
            if (part.substring(index).indexOf(delimiter) == 0) {
                if (index > start) {
                    result.add(part.substring(start, index)); // part before delimiter (empty pieces skipped)
                }
                result.add(delimiter);                        // delimiter
                index += delimiter.length();                  // jump past the delimiter, so overlapping matches
                start = index;                                // such as "==" in "===" can't push start past index
            } else {
                index++;
            }
        }
        if (start < part.length()) {
            result.add(part.substring(start));                // rest of string after last delimiter
        }
    }
    return result;
}
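For comparison, here is a minimal single-pass sketch. The names SinglePassTokenizer, tokenize and matchesAt are made up for illustration and are not part of the answer. It walks the string once from left to right and, at each position, takes the longest delimiter that starts there. From String it uses only charAt(), length() and substring(). This leftmost-longest rule should give the same tokens as the regex version in the question.

The per-delimiter passes above can disagree with it when delimiters overlap. For example, with the delimiters "*/" and "/**", the input "*/**" comes out as *, /** above but as */, *, * here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SinglePassTokenizer {

    // True if delimiter occurs in s starting exactly at pos (uses only length() and charAt())
    static boolean matchesAt(String s, int pos, String delimiter) {
        if (pos + delimiter.length() > s.length()) {
            return false;
        }
        for (int k = 0; k < delimiter.length(); k++) {
            if (s.charAt(pos + k) != delimiter.charAt(k)) {
                return false;
            }
        }
        return true;
    }

    // Returns the text between delimiters and the delimiters themselves; empty pieces are skipped
    static String[] tokenize(String subject, String[] delimiters) {
        List<String> tokens = new ArrayList<String>();
        int start = 0; // start of the current non-delimiter run
        int pos = 0;
        while (pos < subject.length()) {
            // pick the longest non-empty delimiter that starts at pos
            String best = null;
            for (String d : delimiters) {
                if (d.length() > 0 && matchesAt(subject, pos, d)
                        && (best == null || d.length() > best.length())) {
                    best = d;
                }
            }
            if (best == null) {
                pos++; // no delimiter here, keep extending the current token
            } else {
                if (pos > start) {
                    tokens.add(subject.substring(start, pos)); // text before the delimiter
                }
                tokens.add(best); // the delimiter itself
                pos += best.length();
                start = pos;
            }
        }
        if (start < subject.length()) {
            tokens.add(subject.substring(start)); // text after the last delimiter
        }
        return tokens.toArray(new String[tokens.size()]);
    }

    public static void main(String[] args) {
        String[] delimiters = {" ", "==", "=", "+", "+=", "++", "-", "-=", "--", "/", "/=",
                "*", "*=", "(", ")", ";", "/**", "*/", "\t", "\n"};
        System.out.println(Arrays.toString(tokenize("x+=y++;", delimiters))); // [x, +=, y, ++, ;]
    }
}
```

Unlike the answer's tokenizer, this sketch does not sort or copy the delimiter array, so the caller's array is left untouched.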

Original answer

I noticed you're using Pattern, even though you said you only want to use String methods.

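The approach I'd take is to look for the simplest way to do it. I think that means first replacing every possible delimiter with one single delimiter, and then splitting on that.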

Here is the code:

private String[] tokenizer(String string, String[] delimiters)  {       

    //replace all specified delimiters with one
    for (String delimiter : delimiters) {
        while (string.indexOf(delimiter)!=-1) {
            string = string.replace(delimiter, "{split}");
        }
    }

    //now split at the new delimiter
    return string.split("\\{split\\}");

}

I need to use String.replace() rather than String.replaceAll(), because replace() takes literal text while replaceAll() takes a regular expression, and the delimiters provided are literal text.
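The distinction matters for these particular delimiters. Several of them ("+", "*", "(") are regex metacharacters, so passing them straight to replaceAll() throws a PatternSyntaxException, while replace() treats them as plain text. Here is a small sketch; the class and method names are made up for illustration.

```java
import java.util.regex.PatternSyntaxException;

class ReplaceVsReplaceAll {

    // replace(): the target is plain text, so "+" is just a plus sign
    static String literal(String s, String delimiter) {
        return s.replace(delimiter, "{split}");
    }

    // replaceAll(): the target is compiled as a regex; returns true if compiling it fails
    static boolean regexRejects(String s, String delimiter) {
        try {
            s.replaceAll(delimiter, "{split}");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(literal("a+b+c", "+"));      // a{split}b{split}c
        System.out.println(regexRejects("a+b+c", "+")); // true: "+" alone is not a valid regex
        System.out.println(regexRejects("a;b", ";"));   // false: ";" has no special meaning
    }
}
```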

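I also wrapped it in a while loop to make sure every instance of each delimiter gets replaced. In fact, replace() already replaces all occurrences in a single call, so one call per delimiter is enough.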
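Since String.split() is off the table, the last step can also be done with indexOf() and substring(). Below is a minimal sketch. It assumes the marker text {split} never occurs in the input or inside any delimiter, and the class name MarkerSplit is hypothetical. It still uses replace(), which is not on the question's final list of allowed methods.

Like the original answer, it drops the delimiters themselves. Longer delimiters must also be listed before shorter ones that are their prefixes ("+=" before "+"). Otherwise "+=" is broken up and a stray "=" is left behind.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MarkerSplit {

    // Assumed never to occur in the input or in any delimiter
    static final String MARKER = "{split}";

    // Replaces each delimiter with MARKER, then cuts the string at every MARKER
    // using indexOf() and substring() instead of split(); empty pieces are dropped
    static String[] tokenize(String s, String[] delimiters) {
        for (String delimiter : delimiters) {
            if (delimiter.length() > 0) {
                s = s.replace(delimiter, MARKER);
            }
        }
        List<String> pieces = new ArrayList<String>();
        int start = 0;
        int at = s.indexOf(MARKER, start);
        while (at != -1) {
            if (at > start) {
                pieces.add(s.substring(start, at)); // text between two markers
            }
            start = at + MARKER.length();
            at = s.indexOf(MARKER, start);
        }
        if (start < s.length()) {
            pieces.add(s.substring(start)); // text after the last marker
        }
        return pieces.toArray(new String[pieces.size()]);
    }

    public static void main(String[] args) {
        String[] delimiters = {" ", "+=", "+", ";"};
        System.out.println(Arrays.toString(tokenize("x += y;", delimiters))); // [x, y]
    }
}
```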