Lea*_*cia 6 html php strip-tags
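According to the strip_tags documentation, the second parameter takes the allowed tags. In my case, however, I want the reverse. Say I accept the tags that strip_tags normally (by default) accepts, but strip only the <script> tag. Is there any way to do this?
I'm not asking anyone to write the code for me, but input on possible ways to achieve this (if it is possible at all) would be greatly appreciated.
EDIT
To use the HTML Purifier HTML.ForbiddenElements config directive, it seems you would do something like: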
require_once '/path/to/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.ForbiddenElements', array('script','style','applet'));
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
HTML.ForbiddenElements should be set to an array. What I don't know is what form the array members should take:
array('script','style','applet')
Or:
array('<script>','<style>','<applet>')
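Or something else?
I think it's the first form, without delimiters; HTML.AllowedElements uses a configuration string format that has something in common with TinyMCE's valid_elements syntax: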
tinyMCE.init({
...
valid_elements : "a[href|target=_blank],strong/b,div[align],br",
...
});
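So my guess is that it's just the bare element name, and that no attributes should be provided (since you're banning the whole element... although there is an HTML.ForbiddenAttributes, too). But that's a guess.
I'll also add this note from the HTML.ForbiddenAttributes documentation:
Warning: This directive complements %HTML.ForbiddenElements, accordingly, check out that directive for a discussion of why you should think twice before using this directive.
Blacklisting is just not as "robust" as whitelisting, but you may have your own reasons. Just beware and be careful.
Without testing, I'm not sure what to tell you. I'll keep looking for an answer, but I'll likely go to bed first. It's very late. :)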
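Although I think you really should use HTML Purifier and its HTML.ForbiddenElements configuration directive, I think a reasonable alternative, if you really, really want to use strip_tags(), is to derive a whitelist from the blacklist. In other words, remove what you don't want from a list of known elements and then use what's left.
For instance: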
function blacklistElements($blacklisted = '', &$errors = array()) {
    // Nothing to blacklist: report it through the by-reference $errors array.
    if ((string)$blacklisted == '') {
        $errors[] = 'Empty string.';
        return array();
    }
    // Known HTML5 elements, each in the "<tagName>" form strip_tags() expects.
    $html5 = array(
        "<menu>","<command>","<summary>","<details>","<meter>","<progress>",
        "<output>","<keygen>","<textarea>","<option>","<optgroup>","<datalist>",
        "<select>","<button>","<input>","<label>","<legend>","<fieldset>","<form>",
        "<th>","<td>","<tr>","<tfoot>","<thead>","<tbody>","<col>","<colgroup>",
        "<caption>","<table>","<math>","<svg>","<area>","<map>","<canvas>","<track>",
        "<source>","<audio>","<video>","<param>","<object>","<embed>","<iframe>",
        "<img>","<del>","<ins>","<wbr>","<br>","<span>","<bdo>","<bdi>","<rp>","<rt>",
        "<ruby>","<mark>","<u>","<b>","<i>","<sup>","<sub>","<kbd>","<samp>","<var>",
        "<code>","<time>","<data>","<abbr>","<dfn>","<q>","<cite>","<s>","<small>",
        "<strong>","<em>","<a>","<div>","<figcaption>","<figure>","<dd>","<dt>",
        "<dl>","<li>","<ul>","<ol>","<blockquote>","<pre>","<hr>","<p>","<address>",
        "<footer>","<header>","<hgroup>","<aside>","<article>","<nav>","<section>",
        "<body>","<noscript>","<script>","<style>","<meta>","<link>","<base>",
        "<title>","<head>","<html>"
    );
    // Normalise the input: lower-case it and keep only letters and spaces,
    // so "<EM>" and "em" both end up as "em".
    $list = trim(strtolower($blacklisted));
    $list = preg_replace('/[^a-z ]/i', '', $list);
    // Re-wrap every name as "<name>" and split into an array.
    $list = '<' . str_replace(' ', '> <', $list) . '>';
    $list = array_map('trim', explode(' ', $list));
    // Whatever is left after removing the blacklisted names is the whitelist.
    return array_diff($html5, $list);
}
Then run it:
$blacklisted = '<html> <bogus> <EM> em li ol';
$errors = array();
// Pass $errors explicitly so the function can fill it by reference.
$whitelist = blacklistElements($blacklisted, $errors);
if (count($errors)) {
    echo "There were errors.\n";
    print_r($errors);
    echo "\n";
} else {
    // Do strip_tags() ...
}
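So if you pass in what you don't want to allow, it gives you back the list of remaining HTML5 elements as an array, which you can then feed to strip_tags() after joining it into a string:
$stripped = strip_tags($html, implode('', $whitelist));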
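To make the idea concrete end to end, here is a minimal sketch on a deliberately tiny element list. The helper name allowedTagsExcluding() and the five-element list are assumptions made for illustration only; they are not part of the function above.

```php
<?php
// Hypothetical helper: derive the allow-list string for strip_tags()
// from a list of known element names and a list of denied ones.
function allowedTagsExcluding(array $known, array $denied)
{
    // Normalise denied names: lower-case, drop anything but letters and digits,
    // so "<SCRIPT>" and "script" are treated the same.
    $denied = array_map(function ($name) {
        return preg_replace('/[^a-z0-9]/', '', strtolower($name));
    }, $denied);

    $allowed = array_diff($known, $denied);
    if (count($allowed) === 0) {
        return '';
    }
    // strip_tags() wants "<p><em>..." with no whitespace between the tags.
    return '<' . implode('><', $allowed) . '>';
}

// Assumed, deliberately incomplete list of known elements.
$known   = array('p', 'em', 'strong', 'script', 'style');
$allowed = allowedTagsExcluding($known, array('<SCRIPT>', 'style'));

$html = '<p>Hello <em>world</em><script>alert(1)</script></p>';
echo strip_tags($html, $allowed), "\n";
```

Note that strip_tags() removes the <script> tags themselves but leaves the text that was between them, so this is not a substitute for a real sanitizer.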
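Caveat Emptor
Now, I've kind of hacked this together, and I know there are some issues I haven't thought through yet. For instance, from the strip_tags() manual page, on the $allowable_tags parameter:
Note:
This parameter should not contain whitespace.
strip_tags() sees a tag as a case-insensitive string between < and the first whitespace or >. It means that strip_tags("<br/>", "<br>") returns an empty string.
It's late and, for some reason, I can't quite figure out what this means for this approach, so I'll have to think about it tomorrow. I also compiled the list of HTML elements in the function's $html5 array from this MDN documentation page. Sharp-eyed readers might notice that all of the tags are in this form: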
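One way to respect the "no whitespace" part of that note is to check the joined string before handing it to strip_tags(). The following is only a sketch: buildAllowedTags() is a hypothetical helper, and because the handling of self-closing tags may differ between PHP versions, the last line just prints what the running version does rather than assuming a result.

```php
<?php
// Hypothetical guard (not from the manual): join the tags and refuse the
// result if any whitespace slipped in.
function buildAllowedTags(array $tags)
{
    $allowed = implode('', $tags);
    if (preg_match('/\s/', $allowed) === 1) {
        throw new InvalidArgumentException('Allowed tags must not contain whitespace.');
    }
    return $allowed;
}

$allowed = buildAllowedTags(array('<p>', '<em>', '<br>'));
echo $allowed, "\n";

// Inspect how this PHP version treats a self-closing tag; no output is assumed.
var_dump(strip_tags('one<br/>two', $allowed));
```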
<tagName>
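I'm not sure how this will affect the outcome, or whether I need to account for variations such as the short-tag form <tagName/> and some of the, ahem, odder variations. And, of course, there are more tags out there.
So it's probably not production ready. But you get the idea.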