如何使用段落标记包围所有文本片段?

Pet*_*und 6 php regex preg-replace paragraph

我想在任何文本项周围放置段落标记.因此,它应该避免使用表格和其他元素.我怎么做?我想它可以用preg_replace以某种方式制作?

小智 17

这里有一些功能可以帮助您做您想做的事情:

// nl2p
// This function will convert newlines to HTML paragraphs
// without paying attention to HTML tags. Feed it a raw string and it will
// simply return that string sectioned into HTML paragraphs

function nl2p($str) {
    $arr=explode("\n",$str);
    $out='';

    for($i=0;$i<count($arr);$i++) {
        if(strlen(trim($arr[$i]))>0)
            $out.='<p>'.trim($arr[$i]).'</p>';
    }
    return $out;
}



// nl2p_html
// This function will add paragraph tags around textual content of an HTML file, leaving
// the HTML itself intact
// This function assumes that the HTML syntax is correct and that the '<' and '>' characters
// are not used in any of the values for any tag attributes. If these assumptions are not met,
// mass paragraph chaos may ensue. Be safe.

function nl2p_html($str) {

    // If we find the end of an HTML header, assume that this is part of a standard HTML file. Cut off everything including the
    // end of the head and save it in our output string, then trim the head off of the input. This is mostly because we don't
    // want to surrount anything like the HTML title tag or any style or script code in paragraph tags. 
    if(strpos($str,'</head>')!==false) {
        $out=substr($str,0,strpos($str,'</head>')+7);
        $str=substr($str,strpos($str,'</head>')+7);
    }

    // First, we explode the input string based on wherever we find HTML tags, which start with '<'
    $arr=explode('<',$str);

    // Next, we loop through the array that is broken into HTML tags and look for textual content, or
    // anything after the >
    for($i=0;$i<count($arr);$i++) {
        if(strlen(trim($arr[$i]))>0) {
            // Add the '<' back on since it became collateral damage in our explosion as well as the rest of the tag
            $html='<'.substr($arr[$i],0,strpos($arr[$i],'>')+1);

            // Take the portion of the string after the end of the tag and explode that by newline. Since this is after
            // the end of the HTML tag, this must be textual content.
            $sub_arr=explode("\n",substr($arr[$i],strpos($arr[$i],'>')+1));

            // Initialize the output string for this next loop
            $paragraph_text='';

            // Loop through this new array and add paragraph tags (<p>...</p>) around any element that isn't empty
            for($j=0;$j<count($sub_arr);$j++) {
                if(strlen(trim($sub_arr[$j]))>0)
                    $paragraph_text.='<p>'.trim($sub_arr[$j]).'</p>';
            }

            // Put the text back onto the end of the HTML tag and put it in our output string
            $out.=$html.$paragraph_text;
        }

    }

    // Throw it back into our program
    return $out;
}
Run Code Online (Sandbox Code Playgroud)

第一个是nl2p(),它将一个字符串作为输入,并在有newline("\n")字符的地方将其转换为数组.然后它遍历每个元素,如果它找到一个非空的元素,它将<p></p>在它周围包装标签并将其添加到一个字符串,该字符串在函数的末尾返回.

第二个是nl2p_html(),是前者的一个更复杂的版本.将HTML文件的内容作为字符串传递给它,它将包装<p></p>标记任何非HTML文本.它通过将字符串爆炸成数组来实现<这一点,其中分隔符是字符,它是任何HTML标记的开头.然后,遍历这些元素中的每一个,代码将查找HTML标记的结尾,并将其后面的任何内容转换为新的字符串.这个新字符串本身将被分解为一个数组,其中分隔符是换行符("\n").循环遍历这个新数组,代码查找非空元素.当它找到一些数据时,它会将其包装在段落标记中并将其添加到输出字符串中.当这个循环结束时,这个字符串将被添加回HTML代码,这将一起修改为输出缓冲区字符串,一旦函数完成就返回.

tl; dr:nl2p()将字符串转换为HTML段落而不留任何空段落,nl2p_html()将段落标记包裹在HTML文档正文的内容周围.

我在几个小的示例HTML文件上测试了这个,以确保间距和其他东西不会破坏输出.由nl2p_html()生成的代码也可能不符合W3C,因为它将包围段落之类的锚点而不是相反的方式.

希望这可以帮助.