匹配日期的正则表达式(月日、年或 m/d/yy)

Ted*_*edS 5 c# regex regex-group regex-greedy regex-lookarounds

我正在尝试编写一个正则表达式,该表达式可用于在字符串中查找日期,该字符串前面(或后面)可能有空格、数字、文本、行尾等。该表达式应处理美国日期格式要么

1) Month Name Day, Year - 即 2019 年 1 月 10 日或
2) mm/dd/yy - 即 11/30/19

我为月份名称,年份找到了这个

(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}
Run Code Online (Sandbox Code Playgroud)

(感谢 Veverke 在这里Regex 匹配日期,如月份名称日逗号和年份

这对于 mm/dd/yy(以及 m/d/y 的各种组合)

(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2} 
Run Code Online (Sandbox Code Playgroud)

(在此感谢 Steven Levithan 和 Jan Goyvaerts https://www.oreilly.com/library/view/regular-expressions-cookbook/9781449327453/ch04s04.html

我试图把它们像这样结合起来

((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})|((1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2})
Run Code Online (Sandbox Code Playgroud)

当我在输入字符串“Paid on 1/1/2019”中搜索“on [regex above]”时,它确实找到了日期,但没有找到“on”这个词。如果我只是使用,则找到该字符串

(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}
Run Code Online (Sandbox Code Playgroud)

谁能看到我做错了什么?

编辑

我正在使用下面的 c# .net 代码:

    string stringToSearch = "Paid on 1/1/2019";
    string searchPattern = @"on ((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})|((1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2})";
    var match = Regex.Match(stringToSearch, searchPattern, RegexOptions.IgnoreCase);


    string foundString;
    if (match.Success)
        foundString= stringToSearch.Substring(match.Index, match.Length);
Run Code Online (Sandbox Code Playgroud)

例如

string searchPattern = @"on ((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})|((1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2})";
stringToSearch = "Paid on Jan 1, 2019";
found = "on Jan 1, 2019" -- worked as expected, found the word "on" and the date

stringToSearch = "Paid on 1/1/2019";
found = "1/1/2019"  -- did not work as expected, found the date but did not include the word "on"
Run Code Online (Sandbox Code Playgroud)

如果我反转模式

string searchPattern = @"on ((1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})"";

stringToSearch = "Paid on Jan 1, 2019";
found = "Jan 1, 2019" -- did not work as expected, found the date but did not include the word "on"

stringToSearch = "Paid on 1/1/2019";
found = "on 1/1/2019" -- worked as expected, found the word "on" and the date
Run Code Online (Sandbox Code Playgroud)

谢谢

Emm*_*mma 3

看来你们俩的表情都很到位。如果您希望捕获目标输出之前或之后的任何内容,您只需在左侧和右侧添加两个边界即可。例如,请看这个测试

(.*)(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}))(.*)
Run Code Online (Sandbox Code Playgroud)

例如,您可以添加两个类似于的组,(.*)并将原始表达式包装在一组中,这样就可以了。

在此输入图像描述

正则表达式描述图

该图直观地显示了表达式的工作原理,您可能想测试此链接中的其他表达式:

在此输入图像描述

C# 测试

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"(.*)(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}))(.*)";
        string input = @"Paid on Jan 1, 2019 And anything else that you wish to have after
Paid on 1/1/2019 And anything else that you wish to have after";
        RegexOptions options = RegexOptions.Multiline;

        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
Run Code Online (Sandbox Code Playgroud)

JavaScript 演示

此 JavaScript 演示表明您的表达式有效:

(.*)(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}))(.*)
Run Code Online (Sandbox Code Playgroud)

基本性能测试

此 JavaScript 代码片段返回 100 万次循环的运行时间for以提高性能。

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"(.*)(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}))(.*)";
        string input = @"Paid on Jan 1, 2019 And anything else that you wish to have after
Paid on 1/1/2019 And anything else that you wish to have after";
        RegexOptions options = RegexOptions.Multiline;

        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
Run Code Online (Sandbox Code Playgroud)

改进

您可能希望减少围绕月份名称的捕获组,并且如果您愿意,可以将所有这些捕获组简单地添加到一个捕获组中。