Ian*_*ber 11 html php regex string
I'm trying to write a regular expression that will strip all tag attributes except for the SRC attribute. For example:
<p id="paragraph" class="green">This is a paragraph with an image <img src="/path/to/image.jpg" width="50" height="75"/></p>
Would be returned as:
<p>This is a paragraph with an image <img src="/path/to/image.jpg" /></p>
I have a regular expression that strips out all attributes, but I'm trying to tweak it so that it leaves src in place. Here's what I have so far:
<?php preg_replace('/<([A-Z][A-Z0-9]*)(\b[^>]*)>/i', '<$1>', '<html><goes><here>');
I'm using PHP's preg_replace() for this.
Thanks! Ian
gna*_*arf 18
This might work for your needs:
$text = '<p id="paragraph" class="green">This is a paragraph with an image <img src="/path/to/image.jpg" width="50" height="75"/></p>';
echo preg_replace("/<([a-z][a-z0-9]*)(?:[^>]*(\ssrc=['\"][^'\"]*['\"]))?[^>]*?(\/?)>/i",'<$1$2$3>', $text);
// <p>This is a paragraph with an image <img src="/path/to/image.jpg"/></p>
RegExp breakdown:
/ # Start Pattern
< # Match '<' at beginning of tags
( # Start Capture Group $1 - Tag Name
[a-z] # Match 'a' through 'z'
[a-z0-9]* # Match 'a' through 'z' or '0' through '9' zero or more times
) # End Capture Group
(?: # Start Non-Capture Group
[^>]* # Match anything other than '>', Zero or More Times
( # Start Capture Group $2 - ' src="...."'
\s # Match one whitespace
src= # Match 'src='
['"] # Match ' or "
[^'"]* # Match anything other than ' or "
['"] # Match ' or "
) # End Capture Group 2
)? # End Non-Capture Group, match group zero or one time
[^>]*? # Match anything other than '>', Zero or More times, not-greedy (won't eat the /)
(\/?) # Capture Group $3 - '/' if it is there
> # Match '>'
/i # End Pattern - Case Insensitive
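Add some quoting, use the replacement text <$1$2$3>, and it should strip every attribute other than src= from well-formed HTML tags.
Please note that this won't necessarily work on all input, as the anti-HTML-with-RegExp people so cleverly point out below. There are a few pitfalls, most notably that <p style=">"> would end up as <p>">, plus a few other breakages... I would recommend looking at Zend_Filter_StripTags as a more foolproof tag/attribute filter in PHP.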
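To make that pitfall concrete, here is a small sketch that runs the same pattern against the <p style=">"> case mentioned above. The variable names ($pattern, $broken) are only illustrative; the point is that [^>]* stops at the first '>', even when that '>' sits inside a quoted attribute value.

```php
<?php
// Same pattern as in the answer above.
$pattern = "/<([a-z][a-z0-9]*)(?:[^>]*(\ssrc=['\"][^'\"]*['\"]))?[^>]*?(\/?)>/i";

// An attribute value that contains '>' ends the match too early.
$broken = '<p style=">">text</p>';

echo preg_replace($pattern, '<$1$2$3>', $broken);
// <p>">text</p>
```

The leftover "> is the tail of the original style attribute, which the regex never saw as part of the tag.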
Rather than using a regular expression, you should call DOMDocument::loadHTML.
You can then recurse through the elements in the document and call removeAttribute.
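A minimal sketch of that DOM approach might look like the following. The helper name stripAttributesExceptSrc() is hypothetical, and instead of recursing by hand it uses getElementsByTagName('*') to visit every element. It also assumes the input is a body fragment in an encoding loadHTML() handles by default (plain ASCII here).

```php
<?php
// Hypothetical helper (name not from the thread): keep only src attributes.
function stripAttributesExceptSrc(string $html): string
{
    $doc = new DOMDocument();

    // Silence libxml warnings about imperfect markup while loading.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName('*') returns every element in the document.
    foreach ($doc->getElementsByTagName('*') as $element) {
        // Walk the attribute list backwards so removals do not shift
        // the indexes that are still to be visited.
        for ($i = $element->attributes->length - 1; $i >= 0; $i--) {
            $name = $element->attributes->item($i)->nodeName;
            if (strtolower($name) !== 'src') {
                $element->removeAttribute($name);
            }
        }
    }

    // loadHTML() wraps a fragment in <html><body>; emit only the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$text = '<p id="paragraph" class="green">This is a paragraph with an image <img src="/path/to/image.jpg" width="50" height="75"/></p>';
echo stripAttributesExceptSrc($text);
```

Note that saveHTML() serializes as HTML rather than XHTML, so the exact form of void elements such as img may differ from the input.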
Okay, here's what I used, and it seems to work well:
<([A-Z][A-Z0-9]*)(\b[^>src]*)(src\=[\'|"|\s]?[^\'][^"][^\s]*[\'|"|\s]?)?(\b[^>]*)>
Feel free to poke holes in it.