Using a regular expression, I want to get the text between multiple DIV tags. For example, given the following:
<div>first html tag</div>
<div>another tag</div>
Output:
first html tag
another tag
The regex pattern I am using matches only my last div tag and misses the first one. Code:
static void Main(string[] args)
{
    string input = "<div>This is a test</div><div class=\"something\">This is ANOTHER test</div>";
    string pattern = "(<div.*>)(.*)(<\\/div>)";
    MatchCollection matches = Regex.Matches(input, pattern);
    Console.WriteLine("Matches found: {0}", matches.Count);
    if (matches.Count > 0)
        foreach (Match m in matches)
            Console.WriteLine("Inner DIV: {0}", m.Groups[2]);
    Console.ReadLine();
}
Output:
Matches found: 1
Inner DIV: This is ANOTHER test
coo*_*ine 14
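Replace your pattern with a non-greedy (lazy) match. The greedy .* in <div.*> runs on to the last > that still allows a match, so it swallows the first div entirely: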
static void Main(string[] args)
{
    string input = "<div>This is a test</div><div class=\"something\">This is ANOTHER test</div>";
    string pattern = "<div.*?>(.*?)<\\/div>";
    MatchCollection matches = Regex.Matches(input, pattern);
    Console.WriteLine("Matches found: {0}", matches.Count);
    if (matches.Count > 0)
        foreach (Match m in matches)
            Console.WriteLine("Inner DIV: {0}", m.Groups[1]);
    Console.ReadLine();
}
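
To see why the original pattern finds only one match, it can help to run both patterns over the same input and compare what each captures. The following is a minimal sketch, not part of the original answer; the class name GreedyVsLazy and the helper Capture are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyVsLazy
{
    // Hypothetical helper: returns the text of the given capture group
    // for every match of pattern in input.
    public static string[] Capture(string input, string pattern, int group)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        string[] result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            result[i] = matches[i].Groups[group].Value;
        return result;
    }

    static void Main()
    {
        string input = "<div>This is a test</div><div class=\"something\">This is ANOTHER test</div>";

        // Greedy: group 1 (<div.*>) expands across the first div and into
        // the second opening tag, so only one match is found.
        foreach (string s in Capture(input, "(<div.*>)(.*)(</div>)", 2))
            Console.WriteLine("Greedy: {0}", s);

        // Lazy: .*? stops at the first '>' and the first "</div>",
        // so each div is matched separately.
        foreach (string s in Capture(input, "<div.*?>(.*?)</div>", 1))
            Console.WriteLine("Lazy:   {0}", s);
    }
}
```

With the input from the question, the greedy pattern yields a single capture ("This is ANOTHER test"), while the lazy pattern yields both div texts.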
Since nobody else has mentioned HTML tags with attributes, here is my solution:
// <TAG(.*?)>(.*?)</TAG>
// Example
var regex = new System.Text.RegularExpressions.Regex("<h1(.*?)>(.*?)</h1>");
var m = regex.Match("Hello <h1 style='color: red;'>World</h1> !!");
Console.Write(m.Groups[2].Value); // will print -> World
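
The same idea can be wrapped in a small helper that takes the tag name as a parameter and returns every match rather than only the first. This is a rough sketch under a few assumptions: the names TagText and InnerText are hypothetical, [^>]* is swapped in for (.*?) to skip attributes (so it will break on attribute values that contain '>'), and nested tags of the same name are not handled. Regex-based extraction is only reasonable for simple, well-formed fragments like the ones in this question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TagText
{
    // Hypothetical helper: returns the inner text of every
    // <tagName ...>...</tagName> pair found in html.
    public static List<string> InnerText(string html, string tagName)
    {
        string tag = Regex.Escape(tagName);

        // \b keeps "div" from matching "<divider>"; [^>]* skips any attributes;
        // (.*?) lazily captures the content up to the nearest closing tag.
        string pattern = "<" + tag + @"\b[^>]*>(.*?)</" + tag + @"\s*>";

        // Singleline lets '.' match newlines; IgnoreCase accepts <DIV> as well.
        RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern, options))
            results.Add(m.Groups[1].Value);
        return results;
    }

    static void Main()
    {
        string html = "<div>This is a test</div><div class=\"something\">This is ANOTHER test</div>";
        foreach (string text in InnerText(html, "div"))
            Console.WriteLine("Inner DIV: {0}", text);
    }
}
```

Called as InnerText("Hello <h1 style='color: red;'>World</h1> !!", "h1"), the helper returns a single element, "World".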