strip_tags: disallow some tags

Question by Lea*_*cia (score 6) · tags: html, php, strip-tags

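According to the strip_tags documentation, the second parameter takes the allowed tags. In my case, though, I want the reverse: keep the other tags and strip only the <script> tag. Is there any way to do that?

I'm not asking anyone to code this for me, but I'd really appreciate input on possible ways to achieve it (if it's possible at all).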

Answer by Jar*_*ish (score 5)

EDIT

To use the HTML Purifier HTML.ForbiddenElements configuration directive, it seems you would do something like this:

require_once '/path/to/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.ForbiddenElements', array('script','style','applet'));
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);

http://htmlpurifier.org/docs

HTML.ForbiddenElements should be set to an array. What I don't know is what form the array members should take:

array('script','style','applet')

Or:

array('<script>','<style>','<applet>')

Or something else?

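I think it's the first form, without the angle-bracket delimiters; HTML.AllowedElements uses a configuration-string format that has something in common with TinyMCE's valid_elements syntax: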

tinyMCE.init({
    ...
    valid_elements : "a[href|target=_blank],strong/b,div[align],br",
    ...
});

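So my guess is that each entry is just the bare element name, with no attributes (since you're forbidding whole elements, although there is also an HTML.ForbiddenAttributes directive). But that's a guess.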
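Whichever spelling the directive turns out to expect, you can keep a single blacklist and convert it to either form. This is a minimal sketch; toElementNames() and toTagTokens() are hypothetical helper names, not part of HTML Purifier or PHP. The first produces bare names (the form guessed above for HTML.ForbiddenElements), the second produces the '<name>' tokens that strip_tags() expects:

```php
<?php
// Hypothetical helpers (not from any library): normalise a list of
// element names written as 'script', '<script>', '<SCRIPT/>', etc.

// Returns bare, lower-case, de-duplicated element names.
function toElementNames(array $items) {
    $names = array();
    foreach ($items as $item) {
        // Strip surrounding whitespace, angle brackets and slashes.
        $name = strtolower(trim($item, " \t\n\r\0\x0B</>"));
        if ($name !== '') {
            $names[] = $name;
        }
    }
    return array_values(array_unique($names));
}

// Returns strip_tags()-style tokens such as '<script>'.
function toTagTokens(array $items) {
    return array_map(function ($name) {
        return '<' . $name . '>';
    }, toElementNames($items));
}

echo implode(',', toElementNames(array('<script>', 'STYLE', '<applet/>'))), "\n";
// script,style,applet
```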

I'll also add this note from the HTML.ForbiddenAttributes documentation:

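Warning: This directive complements %HTML.ForbiddenElements; accordingly, check out that directive for a discussion of why you should think twice before using this directive.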

A blacklist is not as robust as a whitelist, but you may have your reasons. Just be aware of that and be careful.

Without testing it, I don't know what else to tell you. I'll keep looking for an answer, but I'll probably go to bed first. It's already late. :)


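Although I think you really should use HTML Purifier and its HTML.ForbiddenElements configuration directive, a reasonable alternative, if you really, really want to use strip_tags(), is to derive a whitelist from the blacklist. In other words, start from the full list of elements, remove the ones you don't want, and pass what's left to strip_tags().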

For example:

function blacklistElements($blacklisted = '', &$errors = array()) {
    // A blank (or whitespace-only) blacklist is reported as an error.
    if (trim((string)$blacklisted) === '') {
        $errors[] = 'Empty string.';
        return array();
    }

    $html5 = array(
        "<menu>","<command>","<summary>","<details>","<meter>","<progress>",
        "<output>","<keygen>","<textarea>","<option>","<optgroup>","<datalist>",
        "<select>","<button>","<input>","<label>","<legend>","<fieldset>","<form>",
        "<th>","<td>","<tr>","<tfoot>","<thead>","<tbody>","<col>","<colgroup>",
        "<caption>","<table>","<math>","<svg>","<area>","<map>","<canvas>","<track>",
        "<source>","<audio>","<video>","<param>","<object>","<embed>","<iframe>",
        "<img>","<del>","<ins>","<wbr>","<br>","<span>","<bdo>","<bdi>","<rp>","<rt>",
        "<ruby>","<mark>","<u>","<b>","<i>","<sup>","<sub>","<kbd>","<samp>","<var>",
        "<code>","<time>","<data>","<abbr>","<dfn>","<q>","<cite>","<s>","<small>",
        "<strong>","<em>","<a>","<div>","<figcaption>","<figure>","<dd>","<dt>",
        "<dl>","<li>","<ul>","<ol>","<blockquote>","<pre>","<hr>","<p>","<address>",
        "<footer>","<header>","<hgroup>","<aside>","<article>","<nav>","<section>",
        // Headings, so they are not always stripped.
        "<h1>","<h2>","<h3>","<h4>","<h5>","<h6>",
        "<body>","<noscript>","<script>","<style>","<meta>","<link>","<base>",
        "<title>","<head>","<html>"
    );

    // Normalise the blacklist: lower-case it, keep only letters, digits
    // (so "h1" survives) and whitespace, then split on runs of whitespace.
    $clean = preg_replace('/[^a-z0-9\s]/', '', strtolower($blacklisted));
    $names = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);

    // Wrap each name as "<name>" to match the entries in $html5.
    $list = array();
    foreach ($names as $name) {
        $list[] = '<' . $name . '>';
    }

    // Whitelist = every known element that is not blacklisted.
    return array_values(array_diff($html5, $list));
}

Then run it:

$blacklisted = '<html> <bogus> <EM> em li ol';
$errors = array();
// Pass $errors so the function can report problems by reference.
$whitelist = blacklistElements($blacklisted, $errors);

if (count($errors)) {
    echo "There were errors.\n";
    print_r($errors);
    echo "\n";
} else {
    // Do strip_tags() ...
}

http://codepad.org/LV8ckRjd

So you pass in the elements you don't want to allow, and it returns the remaining HTML5 elements as an array, which you can join into a string and feed to strip_tags():

$stripped = strip_tags($html, implode('', $whitelist));
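Putting the pieces together, here is a small self-contained sketch of the blacklist-to-whitelist idea. stripBlacklisted() is a hypothetical name, and $universe is a deliberately tiny stand-in for the full $html5 list; swap in the real list for actual use. It also shows an important limitation: strip_tags() removes only the tags themselves, so the text inside a stripped <script> element stays in the output.

```php
<?php
// Sketch: derive a whitelist from a blacklist, then call strip_tags().
// $universe is an assumed, shortened element list for illustration only.
function stripBlacklisted($html, array $blacklist) {
    $universe = array('p', 'b', 'i', 'em', 'strong', 'a', 'script', 'style');
    $blacklist = array_map('strtolower', $blacklist);

    // Keep every known element that is not blacklisted.
    $whitelist = array_diff($universe, $blacklist);

    // Build the "<p><b>..." string that strip_tags() expects.
    $allowed = '';
    foreach ($whitelist as $name) {
        $allowed .= '<' . $name . '>';
    }
    return strip_tags($html, $allowed);
}

$html = '<p>Hello <b>world</b></p><script>alert(1)</script>';
echo stripBlacklisted($html, array('script')), "\n";
// <p>Hello <b>world</b></p>alert(1)
```

The leftover alert(1) text is one concrete reason why a blacklist layered on strip_tags() is weaker than a real sanitizer such as HTML Purifier.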

Caveat emptor

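Now that I've hacked this together, I know there are issues I haven't thought through yet. For example, from the strip_tags() manual page, on the $allowable_tags parameter: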

Note:

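This parameter should not contain whitespace. strip_tags() sees a tag as a case-insensitive string between < and the first whitespace or >. It means that strip_tags("<br/>", "<br>") returns an empty string.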
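One practical consequence of that rule: strip_tags() decides by tag name alone. Matching ignores case, and an allowed tag keeps every attribute it carries, including event handlers. A quick check with made-up inputs:

```php
<?php
// strip_tags() compares only the tag name: case is ignored, and
// attributes on allowed tags are passed through untouched.
$kept = strip_tags('<B>bold</B> <i>gone</i>', '<b>');
echo $kept, "\n";    // <B>bold</B> gone

$risky = strip_tags('<a href="#" onclick="alert(1)">x</a>', '<a>');
echo $risky, "\n";   // <a href="#" onclick="alert(1)">x</a>
```

So even a correct whitelist does not remove dangerous attributes. Separately, since PHP 7.4 the allowed tags may also be passed as an array of names, e.g. strip_tags($html, ['p', 'b']), which avoids building the '<p><b>' string by hand.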

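It's late, and for some reason I can't work out what this means for this approach, so I'll think about it tomorrow. I also compiled the list of HTML elements in the function's $html5 array from the MDN documentation page. Sharp-eyed readers may notice that all the tags are in this form: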

<tagName>

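I'm not sure how this affects the results, or whether I need to account for the <tagName/> variant and some, ahem, stranger variations. And of course there are more tags.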

So it's probably not production-ready. But you get the idea.