数字范围的正则表达式生成器

arc*_*rcy 12 regex algorithm

我检查了stackExchange描述,算法问题是允许的主题之一.所以这里.

给定一个范围的输入,其中开始和结束的数字具有相同的位数(例如,2,3或4),我想编写代码来生成一组正则表达式,当依次对数字进行检查时告诉我这个数字是否在原始范围内.

例如:如果范围是145-387,那么146,200和280都将匹配所生成的正则表达式之一,而144,390(过去就是290)和445(过去就是345)则不会.

我一直在想结果将是一个正则表达式列表,如:

14[5-9]             // match 145-149
1[5-9]0-9]          // 150-199
2[0-9][0-9]         // 200-299
3[0-7][0-9]         // 300-379
38[0-7]             // 380-387
Run Code Online (Sandbox Code Playgroud)

然后软件检查该号码将测试以查看被测试的3位数代码是否与这些中的任何一个匹配.

那么生成表达式集的最佳方法是什么?

我提出的最新(系列)是:

  1. 确定两个范围编号不同的第一个数字(1145-1158,第一个不同的数字是第3个)
  2. 对于不同的数字,确定它们的第一个数字是否相差多于一个 - 如果是这样,那么它的范围得到它自己的正则表达式(在我们的例子中为200-299)
  3. 获得更低的范围:对于每个其他数字:从范围的开头开始的第一个数字的前缀,将数字增加1,将0s填充到相同的长度,并与数字中的数字9对应地方和所有填充的地方.在我们的例子中,增加4到5,pad得到150,生成正则表达式来处理150-199.
  4. 获得更高的范围:对于每个其他数字:从范围结束开始的第一个数字的前缀,逐个数字递减,0秒的填充休息,在所有填充的0个位置和减少的数字与9s的数字配对.在我们的例子中,正则表达式处理300-379.

我错过了什么吗?即使在上面我还有一些细节,我正在掩饰,似乎有些东西可以从算法剑中掠过细节.但我提出的其他事情甚至比这更糟糕.

bez*_*max 11

这是我的解决方案和具有复杂性的算法O(log n)(n是范围的结束).我相信这是最简单的一个:

基本上,将您的任务分成以下步骤:

  1. 逐渐"削弱" start范围.
  2. 逐渐"削弱" end范围.
  3. 合并这两个.

通过"弱化",我的意思是找到可以用这个特定数字的简单正则表达式表示的范围的结尾,例如:

145 -> 149,150 -> 199,200 -> 999,1000 -> etc.
Run Code Online (Sandbox Code Playgroud)

这是一个向后end的范围:

387 -> 380,379 -> 300,299 -> 0
Run Code Online (Sandbox Code Playgroud)

合并将是注意299-> 0和200-> 999的重叠并将它们组合成200-> 299的过程.

结果,你会得到这组数字(第一个列表完整,第二个反转:

145, 149, 150, 199, 200, 299, 300, 379, 380, 387
Run Code Online (Sandbox Code Playgroud)

现在,这是有趣的部分.成对获取数字,并将它们转换为范围:

145-149, 150-199, 200-299, 300-379, 380-387
Run Code Online (Sandbox Code Playgroud)

或者在正则表达式中:

14[5-9], 1[5-9][0-9], 2[0-9][0-9], 3[0-7][0-9], 38[0-7]
Run Code Online (Sandbox Code Playgroud)

这是代码的weakening样子:

public static int next(int num) {
    //Convert to String for easier operations
    final char[] chars = String.valueOf(num).toCharArray();
    //Go through all digits backwards
    for (int i=chars.length-1; i>=0;i--) {
        //Skip the 0 changing it to 9. For example, for 190->199
        if (chars[i]=='0') {
            chars[i] = '9';
        } else { //If any other digit is encountered, change that to 9, for example, 195->199, or with both rules: 150->199
            chars[i] = '9';
            break;
        }
    }

    return Integer.parseInt(String.valueOf(chars));
}

//Same thing, but reversed. 387 -> 380, 379 -> 300, etc
public static int prev(int num) {
    final char[] chars = String.valueOf(num).toCharArray();
    for (int i=chars.length-1; i>=0;i--) {
        if (chars[i] == '9') {
            chars[i] = '0';
        } else {
            chars[i] = '0';
            break;
        }
    }

    return Integer.parseInt(String.valueOf(chars));
}
Run Code Online (Sandbox Code Playgroud)

其余的是技术细节,易于实施.以下是此O(log n)算法的实现:https://ideone.com/3SCvZf

哦,顺便说一下,它也适用于其他范围,例如对于范围1-321654,结果是:

[1-9]
[1-9][0-9]
[1-9][0-9][0-9]
[1-9][0-9][0-9][0-9]
[1-9][0-9][0-9][0-9][0-9]
[1-2][0-9][0-9][0-9][0-9][0-9]
3[0-1][0-9][0-9][0-9][0-9]
320[0-9][0-9][0-9]
321[0-5][0-9][0-9]
3216[0-4][0-9]
32165[0-4]
Run Code Online (Sandbox Code Playgroud)

对于129-131它:

129
13[0-1]
Run Code Online (Sandbox Code Playgroud)


arc*_*rcy 5

我终于到达了下面。总体思路是,从范围的开头开始,生成一个正则表达式,该表达式从该范围开始一直匹配,但不包括10的下一个倍数,然后是数百个,依此类推,直到您将所有内容匹配到末尾为止。范围; 然后从范围的末尾开始,然后向下进行操作,将递增的数字替换为0,以与相似的9s进行匹配,以匹配特定范围的末尾。如果尚未覆盖全部范围,则为该范围的一部分生成一个正则表达式。

应该特别注意bezmax的例程,该例程将两个数字转换为匹配它们的正则表达式-我认为比直接处理字符串或字符数组要容易得多。

无论如何,这里是:

package numbers;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/**
 * Has methods for generating regular expressions to match ranges of numbers.
 */
public class RangeRegexGenerator
{
  public static void main(String[] args)
  {
    RangeRegexGenerator rrg = new RangeRegexGenerator();

//    do
//    {
//      Scanner scanner = new Scanner(System.in);
//      System.out.println("enter start, <return>, then end and <return>");
//      int start = scanner.nextInt();
//      int end = scanner.nextInt();
//      System.out.println(String.format("for %d-%d", start, end));

      List<String> regexes = rrg.getRegex("0015", "0213");
      for (String s: regexes) { System.out.println(s); }
//    } 
//    while(true);
  }

  /**
   * Return a list of regular expressions that match the numbers
   * that fall within the range of the given numbers, inclusive.
   * Assumes the given strings are numbers of the the same length,
   * and 0-left-pads the resulting expressions, if necessary, to the
   * same length. 
   * @param begStr
   * @param endStr
   * @return
   */
  public List<String> getRegex(String begStr, String endStr)
  {
      int start = Integer.parseInt(begStr);
      int end   = Integer.parseInt(endStr);
      int stringLength = begStr.length();
      List<Integer> pairs = getRegexPairs(start, end);
      List<String> regexes = toRegex(pairs, stringLength);
      return regexes;
  }

  /**
   * Return a list of regular expressions that match the numbers
   * that fall within the range of the given numbers, inclusive.
   * @param beg
   * @param end
   * @return
   */
  public List<String> getRegex(int beg, int end)
  {
    List<Integer> pairs = getRegexPairs(beg, end);
    List<String> regexes = toRegex(pairs);
    return regexes;
  }

  /**
   * return the list of integers that are the paired integers
   * used to generate the regular expressions for the given
   * range. Each pair of integers in the list -- 0,1, then 2,3,
   * etc., represents a range for which a single regular expression
   * is generated.
   * @param start
   * @param end
   * @return
   */
  private List<Integer> getRegexPairs(int start, int end)
  {
      List<Integer> pairs = new ArrayList<>();

      ArrayList<Integer> leftPairs = new ArrayList<>();
      int middleStartPoint = fillLeftPairs(leftPairs, start, end);
      ArrayList<Integer> rightPairs = new ArrayList<>();
      int middleEndPoint = fillRightPairs(rightPairs, middleStartPoint, end);

      pairs.addAll(leftPairs);
      if (middleEndPoint > middleStartPoint)
      {
        pairs.add(middleStartPoint);
        pairs.add(middleEndPoint);
      }
      pairs.addAll(rightPairs);
      return pairs;
  }

  /**
   * print the given list of integer pairs - used for debugging.
   * @param list
   */
  @SuppressWarnings("unused")
  private void printPairList(List<Integer> list)
  {
    if (list.size() > 0)
    {
      System.out.print(String.format("%d-%d", list.get(0), list.get(1)));
      int i = 2;
      while (i < list.size())
      {
        System.out.print(String.format(", %d-%d", list.get(i), list.get(i + 1)));
        i = i + 2;
      }
      System.out.println();
    }
  }

  /**
   * return the regular expressions that match the ranges in the given
   * list of integers. The list is in the form firstRangeStart, firstRangeEnd, 
   * secondRangeStart, secondRangeEnd, etc.
   * @param pairs
   * @return
   */
  private List<String> toRegex(List<Integer> pairs)
  {
    return toRegex(pairs, 0);
  }

  /**
   * return the regular expressions that match the ranges in the given
   * list of integers. The list is in the form firstRangeStart, firstRangeEnd, 
   * secondRangeStart, secondRangeEnd, etc. Each regular expression is 0-left-padded,
   * if necessary, to match strings of the given width.
   * @param pairs
   * @param minWidth
   * @return
   */
  private List<String> toRegex(List<Integer> pairs, int minWidth)
  {
    List<String> list = new ArrayList<>();
    String numberWithWidth = String.format("%%0%dd", minWidth);
    for (Iterator<Integer> iterator = pairs.iterator(); iterator.hasNext();)
    {
      String start = String.format(numberWithWidth, iterator.next()); // String.valueOf(iterator.next());
      String end = String.format(numberWithWidth, iterator.next());

      list.add(toRegex(start, end));
    }
    return list;
  }

  /**
   * return a regular expression string that matches the range
   * with the given start and end strings.
   * @param start
   * @param end
   * @return
   */
  private String toRegex(String start, String end)
  {
    assert start.length() == end.length();

    StringBuilder result = new StringBuilder();

    for (int pos = 0; pos < start.length(); pos++)
    {
      if (start.charAt(pos) == end.charAt(pos))
      {
        result.append(start.charAt(pos));
      } else
      {
        result.append('[').append(start.charAt(pos)).append('-')
            .append(end.charAt(pos)).append(']');
      }
    }
    return result.toString();
  }

  /**
   * Return the integer at the end of the range that is not covered
   * by any pairs added to the list.
   * @param rightPairs
   * @param start
   * @param end
   * @return
   */
  private int fillRightPairs(List<Integer> rightPairs, int start, int end)
  {
    int firstBeginRange = end;    // the end of the range not covered by pairs
                                  // from this routine.
    int y = end;
    int x = getPreviousBeginRange(y);

    while (x >= start)
    {
      rightPairs.add(y);
      rightPairs.add(x);
      y = x - 1;
      firstBeginRange = y;
      x = getPreviousBeginRange(y);
    }
    Collections.reverse(rightPairs);
    return firstBeginRange;
  }

  /**
   * Return the integer at the start of the range that is not covered
   * by any pairs added to its list. 
   * @param leftInts
   * @param start
   * @param end
   * @return
   */
  private int fillLeftPairs(ArrayList<Integer> leftInts, int start, int end)
  {
    int x = start;
    int y = getNextLeftEndRange(x);

    while (y < end)
    {
      leftInts.add(x);
      leftInts.add(y);
      x = y + 1;
      y = getNextLeftEndRange(x);
    }
    return x;
  }

  /**
   * given a number, return the number altered such
   * that any 9s at the end of the number remain, and
   * one more 9 replaces the number before the other
   * 9s.
   * @param num
   * @return
   */
  private int getNextLeftEndRange(int num)
  {
    char[] chars = String.valueOf(num).toCharArray();
    for (int i = chars.length - 1; i >= 0; i--)
    {
      if (chars[i] == '0')
      {
        chars[i] = '9';
      } else
      {
        chars[i] = '9';
        break;
      }
    }

    return Integer.parseInt(String.valueOf(chars));
  }

  /**
   * given a number, return the number altered such that
   * any 9 at the end of the number is replaced by a 0,
   * and the number preceding any 9s is also replaced by
   * a 0.
   * @param num
   * @return
   */
  private int getPreviousBeginRange(int num)
  {
    char[] chars = String.valueOf(num).toCharArray();
    for (int i = chars.length - 1; i >= 0; i--)
    {
      if (chars[i] == '9')
      {
        chars[i] = '0';
      } else
      {
        chars[i] = '0';
        break;
      }
    }

    return Integer.parseInt(String.valueOf(chars));
  }
}
Run Code Online (Sandbox Code Playgroud)

就我所能测试的而言,这是正确的。bezmax发布的那个并不太奏效,尽管他对整个算法有正确的想法(我也想出了),并且有一个或两个主要的实现细节很有用,所以我留下了“答案”复选标记他的回应。

我对这引起的兴趣感到有些惊讶,尽管不及问题的复杂程度如何。