Remove unclosed opening <p> tags from an XHTML document

Question

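I have a large XHTML document containing many tags. I have noticed that some opening paragraph tags are repeated unnecessarily without ever being closed, and I want to remove them or replace them with spaces. I only need code that identifies the unclosed paragraph tags and removes them.

Here is a small sample to show what I mean: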

<p><strong>Company Registration No.1</strong> </p>
<p><strong>Company Registration No.2</strong></p>

<p>      <!-- extra tag -->
<p>      <!-- extra tag -->

<hr/>     

<p><strong> HALL WOOD (LEEDS) LIMITED</strong><br/></p>
<p><strong>REPORT AND FINANCIAL STATEMENTS </strong></p>

Could someone give me code for a console application that just removes these unclosed paragraph tags?

Answer 1

Ric*_*III 3

This should work:

using System;
using System.Text;

public static class XHTMLCleanerUpperThingy
{
    private const string p = "<p>";
    private const string closingp = "</p>";

    public static string CleanUpXHTML(string xhtml)
    {
        StringBuilder builder = new StringBuilder(xhtml);

        // Visit each <p> exactly once, jumping from one occurrence to the next.
        int current = xhtml.IndexOf(p, StringComparison.Ordinal);
        while (current != -1)
        {
            int idxofnext = xhtml.IndexOf(p, current + p.Length, StringComparison.Ordinal);
            int idxofclose = xhtml.IndexOf(closingp, current, StringComparison.Ordinal);

            // The tag is unclosed if no </p> follows it at all, or if
            // another <p> opens before the next </p>.
            bool unclosed = idxofclose < 0 || (idxofnext >= 0 && idxofnext < idxofclose);
            if (unclosed)
            {
                // Overwrite the tag with spaces so the document length is unchanged.
                for (int j = 0; j < p.Length; j++)
                {
                    builder[current + j] = ' ';
                }
            }

            current = idxofnext;
        }

        return builder.ToString();
    }
}

I tested it with your sample and it works. As an algorithm it is a poor formulation, but it should give you a starting point!
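As one possible way to build on that starting point, here is a minimal single-pass sketch. It is not the answer's code: the names UnclosedParagraphStripper and Strip are hypothetical, and it assumes, like the answer above, that paragraphs are never nested and that only the literal tag <p> (no attributes, exact lowercase) needs handling. It remembers the most recent <p> that has not yet seen a </p> and blanks it out if another <p> or the end of the input arrives first.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the answer above.
public static class UnclosedParagraphStripper
{
    private const string Open = "<p>";
    private const string Close = "</p>";

    // Replaces every literal "<p>" that is followed by another "<p>"
    // (or by the end of the input) before any "</p>" with spaces.
    public static string Strip(string xhtml)
    {
        var builder = new StringBuilder(xhtml);
        int pending = -1; // index of the latest "<p>" still waiting for "</p>"
        int i = 0;

        while (i < xhtml.Length)
        {
            if (string.CompareOrdinal(xhtml, i, Open, 0, Open.Length) == 0)
            {
                // A new <p> while one is still pending: the pending one is unclosed.
                if (pending >= 0)
                {
                    Blank(builder, pending);
                }
                pending = i;
                i += Open.Length;
            }
            else if (string.CompareOrdinal(xhtml, i, Close, 0, Close.Length) == 0)
            {
                // The pending <p>, if any, is properly closed.
                pending = -1;
                i += Close.Length;
            }
            else
            {
                i++;
            }
        }

        // A <p> still pending at the end of the input was never closed.
        if (pending >= 0)
        {
            Blank(builder, pending);
        }

        return builder.ToString();
    }

    // Overwrites the tag at the given index with spaces, keeping the length unchanged.
    private static void Blank(StringBuilder builder, int index)
    {
        for (int j = 0; j < Open.Length; j++)
        {
            builder[index + j] = ' ';
        }
    }
}
```

In a console application, the Main method could read the file with File.ReadAllText, pass the text to Strip, and write the result back out with File.WriteAllText. For anything beyond literal <p> tags, a real HTML or XML parser would be a safer choice than string scanning.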
