Using a regular expression to get the text between multiple HTML tags

ben*_*ben 8 html c# regex

Using a regular expression, I want to get the text between multiple DIV tags. For example, given the following:

<div>first html tag</div>
<div>another tag</div>

Output:

first html tag
another tag

The regex pattern I am using matches only my last div tag and misses the first one. Code:

    static void Main(string[] args)
    {
        string input = "<div>This is a test</div><div class=\"something\">This is ANOTHER test</div>";
        string pattern = "(<div.*>)(.*)(<\\/div>)";

        MatchCollection matches = Regex.Matches(input, pattern);
        Console.WriteLine("Matches found: {0}", matches.Count);

        if (matches.Count > 0)
            foreach (Match m in matches)
                Console.WriteLine("Inner DIV: {0}", m.Groups[2]);

        Console.ReadLine();
    }

Output:

Matches found: 1
Inner DIV: This is ANOTHER test

coo*_*ine 14

Replace your pattern with a non-greedy (lazy) one:

static void Main(string[] args)
{
    string input = "<div>This is a test</div><div class=\"something\">This is ANOTHER test</div>";
    string pattern = "<div.*?>(.*?)<\\/div>";

    MatchCollection matches = Regex.Matches(input, pattern);
    Console.WriteLine("Matches found: {0}", matches.Count);

    if (matches.Count > 0)
        foreach (Match m in matches)
            Console.WriteLine("Inner DIV: {0}", m.Groups[1]);

    Console.ReadLine();
}
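To see why the lazy quantifiers matter, it may help to run the greedy and lazy patterns side by side on the same input. With a greedy `.*`, the opening `<div.*>` runs as far as it can while still leaving a closing `</div>` to match, so both tags collapse into a single match. The sketch below assumes a helper named `InnerTexts` and a class named `DivTextDemo`, both made up for illustration. The `\\/` escape from the question is dropped here because `/` needs no escaping in a .NET pattern.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class DivTextDemo
{
    // Hypothetical helper: returns the text captured by group 1 of every match.
    public static string[] InnerTexts(string input, string pattern) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    static void Main()
    {
        string input = "<div>This is a test</div><div class=\"something\">This is ANOTHER test</div>";

        // Greedy: ".*" grabs as much as possible, so only one match is found.
        string[] greedy = InnerTexts(input, "<div.*>(.*)</div>");
        Console.WriteLine("Greedy matches: {0}", greedy.Length);   // 1

        // Lazy: ".*?" stops at the first ">" and the first "</div>".
        string[] lazy = InnerTexts(input, "<div.*?>(.*?)</div>");
        Console.WriteLine("Lazy matches: {0}", lazy.Length);       // 2

        foreach (string text in lazy)
            Console.WriteLine("Inner DIV: {0}", text);
    }
}
```

The lazy version prints both `This is a test` and `This is ANOTHER test`, while the greedy version reports only the second one, which matches the output shown in the question.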


Meh*_*ani 8

As nobody else has mentioned HTML tags with attributes, here is my solution:

// <TAG(.*?)>(.*?)</TAG>
// Example
var regex = new System.Text.RegularExpressions.Regex("<h1(.*?)>(.*?)</h1>");
var m = regex.Match("Hello <h1 style='color: red;'>World</h1> !!");
Console.Write(m.Groups[2].Value); // will print -> World
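The same idea can be combined with `Regex.Matches` so that every occurrence of a tag is returned, with or without attributes. This is only a sketch: `TextBetween` and `TagTextDemo` are hypothetical names, and the pattern uses `[^>]*` in place of `(.*?)` for the attribute part, which is a substitution and not what the answer above wrote. It assumes the tags are not nested and that no attribute value contains `>`. For anything beyond that, an HTML parser is likely the safer tool.

```csharp
using System;
using System.Text.RegularExpressions;

static class TagTextDemo
{
    // Hypothetical helper: returns the inner text of every <tag ...>...</tag> pair.
    public static string[] TextBetween(string html, string tag)
    {
        string name = Regex.Escape(tag);

        // \b stops "<div" from also matching a longer tag name such as "<divider";
        // [^>]* skips any attributes; (.*?) lazily captures the inner text.
        string pattern = "<" + name + @"\b[^>]*>(.*?)</" + name + ">";

        // Singleline lets "." match newlines; IgnoreCase accepts <DIV> as well as <div>.
        MatchCollection matches = Regex.Matches(
            html, pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);

        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            result[i] = matches[i].Groups[1].Value;
        return result;
    }

    static void Main()
    {
        string input = "<div>This is a test</div><div class=\"something\">This is ANOTHER test</div>";

        foreach (string text in TextBetween(input, "div"))
            Console.WriteLine("Inner DIV: {0}", text);

        // The h1 example from the answer above.
        foreach (string text in TextBetween("Hello <h1 style='color: red;'>World</h1> !!", "h1"))
            Console.WriteLine(text);   // World
    }
}
```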