39 php sanitization
There are a million Q&As explaining options such as FILTER_FLAG_STRIP_LOW, but what does FILTER_SANITIZE_STRING do on its own, without any options? Does it only filter tags?
rr-*_*rr- 64
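According to the PHP manual:
Strip tags, optionally strip or encode special characters.
The FILTER_SANITIZE_STRING filter strips or encodes unwanted characters. This filter removes data that is potentially harmful to your application. It is used to strip tags and to remove or encode unwanted characters.
Now, that doesn't tell us much. Let's look at some PHP source.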
ext/filter/filter.c:
static const filter_list_entry filter_list[] = {
    /*...*/
    { "string", FILTER_SANITIZE_STRING, php_filter_string },
    { "stripped", FILTER_SANITIZE_STRING, php_filter_string },
    { "encoded", FILTER_SANITIZE_ENCODED, php_filter_encoded },
    /*...*/
Now, let's see how php_filter_string is defined.
ext/filter/sanitizing_filters.c:
/* {{{ php_filter_string */
void php_filter_string(PHP_INPUT_FILTER_PARAM_DECL)
{
    size_t new_len;
    unsigned char enc[256] = {0};

    /* strip high/strip low ( see flags )*/
    php_filter_strip(value, flags);

    if (!(flags & FILTER_FLAG_NO_ENCODE_QUOTES)) {
        enc['\''] = enc['"'] = 1;
    }
    if (flags & FILTER_FLAG_ENCODE_AMP) {
        enc['&'] = 1;
    }
    if (flags & FILTER_FLAG_ENCODE_LOW) {
        memset(enc, 1, 32);
    }
    if (flags & FILTER_FLAG_ENCODE_HIGH) {
        memset(enc + 127, 1, sizeof(enc) - 127);
    }

    php_filter_encode_html(value, enc);

    /* strip tags, implicitly also removes \0 chars */
    new_len = php_strip_tags_ex(Z_STRVAL_P(value), Z_STRLEN_P(value), NULL, NULL, 0, 1);
    Z_STRLEN_P(value) = new_len;

    if (new_len == 0) {
        zval_dtor(value);
        if (flags & FILTER_FLAG_EMPTY_STRING_NULL) {
            ZVAL_NULL(value);
        } else {
            ZVAL_EMPTY_STRING(value);
        }
        return;
    }
}
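I'll skip commenting on the flags, since they are already explained all over the Internet, as you said, and focus instead on what is always executed, which is not so well documented.
First, php_filter_strip. It doesn't do much: it just takes the flags passed to the function and handles them accordingly. This is the well-documented part.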
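To make the stripping step more concrete, here is a rough userland sketch in PHP. The function name strip_bytes_sketch is hypothetical, and the logic is a model based on the documented flag behaviour rather than a port of the C function; only the FILTER_FLAG_* constants are real.

```php
<?php
// Hypothetical userland model of the strip step; not the actual C implementation.
function strip_bytes_sketch(string $value, int $flags): string
{
    $out = '';
    $len = strlen($value);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($value[$i]);
        if ($byte > 127 && ($flags & FILTER_FLAG_STRIP_HIGH)) {
            continue; // drop bytes above 127
        }
        if ($byte < 32 && ($flags & FILTER_FLAG_STRIP_LOW)) {
            continue; // drop control bytes below 32
        }
        if ($value[$i] === '`' && ($flags & FILTER_FLAG_STRIP_BACKTICK)) {
            continue; // drop backticks
        }
        $out .= $value[$i];
    }
    return $out;
}

// "\x07" is the bell character; "\xC3\xA9" is "é" encoded as UTF-8.
var_dump(strip_bytes_sketch("a\x07b`c\xC3\xA9", FILTER_FLAG_STRIP_LOW | FILTER_FLAG_STRIP_HIGH));
```

The sketch works on bytes rather than characters, so with the high-strip flag set a multibyte UTF-8 character is removed byte by byte.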
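Then we build a map of sorts and call php_filter_encode_html. This one is more interesting: it converts characters such as ", ' and &, as well as bytes with ASCII codes below 32 or from 127 upward (see the memset calls above), into numeric HTML entities, so with FILTER_FLAG_ENCODE_AMP an & in your string becomes &#38;. Again, which characters get encoded depends on the flags; with no flags at all, only the two quote characters are encoded.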
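The map-then-encode idea can be sketched in PHP as follows. This is an illustrative model only: encode_bytes_sketch and the $enc array are hypothetical names, and the assumption is that each flagged byte is written out as a decimal numeric entity.

```php
<?php
// Hypothetical userland model of the encode step; not the actual C implementation.
// $enc is a 256-entry map: a non-zero entry means "encode this byte".
function encode_bytes_sketch(string $value, array $enc): string
{
    $out = '';
    $len = strlen($value);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($value[$i]);
        // Flagged bytes become decimal numeric entities; all others pass through.
        $out .= !empty($enc[$byte]) ? '&#' . $byte . ';' : $value[$i];
    }
    return $out;
}

// Default behaviour (no flags): only the two quote characters are flagged.
$enc = array_fill(0, 256, 0);
$enc[ord("'")] = $enc[ord('"')] = 1;

echo encode_bytes_sketch('say "hi" & \'bye\'', $enc), "\n";
// prints: say &#34;hi&#34; & &#39;bye&#39;
```

Setting $enc[ord('&')] = 1 as well would mirror what FILTER_FLAG_ENCODE_AMP does in the C source above.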
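Then we call php_strip_tags_ex, which just strips HTML, XML and PHP tags (as per its definition in /ext/standard/string.c) and removes NULL bytes, as the comment says.
The code that follows is used for internal string management and doesn't really do any sanitization. Well, not exactly: if the sanitized string is empty, passing the undocumented flag FILTER_FLAG_EMPTY_STRING_NULL makes the filter return NULL instead of an empty string, but that is not all that useful. An example: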
var_dump(filter_var("yo", FILTER_SANITIZE_STRING, FILTER_FLAG_EMPTY_STRING_NULL));
var_dump(filter_var("\0", FILTER_SANITIZE_STRING, FILTER_FLAG_EMPTY_STRING_NULL));
var_dump(filter_var("yo", FILTER_SANITIZE_STRING));
var_dump(filter_var("\0", FILTER_SANITIZE_STRING));
→
string(2) "yo"
NULL
string(2) "yo"
string(0) ""
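Putting the always-executed steps together, the behaviour with no flags can be approximated in userland. This is only a sketch: sanitize_string_sketch is a hypothetical name, and it assumes that strip_tags() behaves closely enough to the internal php_strip_tags_ex call, which is not guaranteed for every input. It may still be handy for experimenting, since FILTER_SANITIZE_STRING itself is deprecated as of PHP 8.1.

```php
<?php
// Hypothetical approximation of FILTER_SANITIZE_STRING with no flags.
// Assumption: strip_tags() is close enough to the internal php_strip_tags_ex() call.
function sanitize_string_sketch(string $value): string
{
    // 1. Encode both quote characters as numeric entities (the default behaviour).
    $value = str_replace(['"', "'"], ['&#34;', '&#39;'], $value);
    // 2. Strip HTML, XML and PHP tags, keeping the text between them.
    $value = strip_tags($value);
    // 3. Remove NUL bytes explicitly rather than relying on strip_tags().
    return str_replace("\0", '', $value);
}

echo sanitize_string_sketch('<b>Hello!</b>'), "\n";
echo sanitize_string_sketch('It\'s "ok"'), "\n";
var_dump(sanitize_string_sketch("\0") === '');
```

The quotes are encoded before the tags are stripped, in the same order as the C source. The first call returns Hello!, so the text between the tags survives and only the tags themselves are removed.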
Nothing more is going on, so the manual is fairly accurate. To sum up:
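FILTER_FLAG_NO_ENCODE_QUOTES - do not encode quotes.
FILTER_FLAG_STRIP_LOW - strip characters with an ASCII value below 32.
FILTER_FLAG_STRIP_HIGH - strip characters with an ASCII value above 127.
FILTER_FLAG_ENCODE_LOW - encode characters with an ASCII value below 32.
FILTER_FLAG_ENCODE_HIGH - encode characters with an ASCII value above 127.
FILTER_FLAG_ENCODE_AMP - encode the & character as &#38; (not &amp;).
FILTER_FLAG_EMPTY_STRING_NULL - return NULL instead of an empty string.

I wasn't sure whether "strip tags" meant only the < > characters, or whether the content between the tags is kept, e.g. the string "Hello!" from <b>Hello!</b>, so I decided to check. Here are the results using PHP 7.1.5 (and Bash for the command line):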
curl --data-urlencode 'my-input='\
'1. ASCII b/n 32 and 127: ABC abc 012 '\
'2. ASCII higher than 127: Çüé '\
'3. PHP tag: <?php $i = 0; ?> '\
'4. HTML tag: <script type="text/javascript">var i = 0;</script> '\
'5. Ampersand: & '\
'6. Backtick: ` '\
'7. Double quote: " '\
'8. Single quote: '"'" \
http://localhost/sanitize.php
<?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING);
1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: Çüé 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: & 6. Backtick: ` 7. Double quote: &#34; 8. Single quote: &#39;

<?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING, FILTER_FLAG_NO_ENCODE_QUOTES);
1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: Çüé 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: & 6. Backtick: ` 7. Double quote: " 8. Single quote: '

<?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH);
1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: & 6. Backtick: ` 7. Double quote: &#34; 8. Single quote: &#39;

<?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_BACKTICK);
1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: Çüé 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: & 6. Backtick: 7. Double quote: &#34; 8. Single quote: &#39;

<?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING, FILTER_FLAG_ENCODE_HIGH);
1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: &#199;&#252;&#233; 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: & 6. Backtick: ` 7. Double quote: &#34; 8. Single quote: &#39;

<?php echo filter_input(INPUT_POST,'my-input', FILTER_SANITIZE_STRING, FILTER_FLAG_ENCODE_AMP);
1. ASCII b/n 32 and 127: ABC abc 012 2. ASCII higher than 127: Çüé 3. PHP tag: 4. HTML tag: var i = 0; 5. Ampersand: &#38; 6. Backtick: ` 7. Double quote: &#34; 8. Single quote: &#39;

Also, for the flags FILTER_FLAG_STRIP_LOW and FILTER_FLAG_ENCODE_LOW, since my Bash doesn't display these characters, I checked using the bell character (ASCII 007) and the Restman Chrome extension: