HTML entity decoding

chr*_*ris 231 html javascript jquery

How can I encode and decode HTML entities using JavaScript or jQuery?

var varTitle = "Chris&apos; corner";

I want it to be:

var varTitle = "Chris' corner";

Rob*_*t K 250

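I recommend against using the jQuery code that was accepted as the answer. While it does not insert the string to decode into the page, it does cause things such as scripts and HTML elements to get created. That is far more code than we need. Instead, I suggest using a safer, more optimized function.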

var decodeEntities = (function() {
  // this prevents any overhead from creating the object each time
  var element = document.createElement('div');

  function decodeHTMLEntities (str) {
    if(str && typeof str === 'string') {
      // strip script/html tags
      str = str.replace(/<script[^>]*>([\S\s]*?)<\/script>/gmi, '');
      str = str.replace(/<\/?\w(?:[^"'>]|"[^"]*"|'[^']*')*>/gmi, '');
      element.innerHTML = str;
      str = element.textContent;
      element.textContent = '';
    }

    return str;
  }

  return decodeHTMLEntities;
})();

http://jsfiddle.net/LYteC/4/

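To use this function, just call decodeEntities("&amp;"). It uses the same underlying technique as the jQuery version, but without jQuery's overhead, and only after sanitizing the HTML tags in the input. See Mike Samuel's comment on the accepted answer for how to filter out HTML tags.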

This function can easily be used as a jQuery plugin by adding the following line to your project.

jQuery.decodeEntities = decodeEntities;

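  • Note: IE8 does not support textContent, so if it is still one of your target browsers you will have to find another solution. I just wasted an hour trying to figure this out, because we needed to decode entities specifically to compensate for another IE8 bug. (8 upvotes)
  • Be careful with the lines that strip out HTML tags. You should not use regular expressions on HTML/XML. Bobince has been clear about this for years. (3 upvotes)
  • While this is nice, users should be aware that it is very dangerous. It looks as though it correctly strips out the "dangerous stuff", but it is easily defeated. Unless you enjoy being attacked via XSS, do not use this on untrusted user input. (2 upvotes)
  • @Qix I don't fully understand the problem here. HTML/XML certainly cannot be "parsed with regex" the way people often attempt. But if all you want to do is tokenize it, then AFAIK regex is the ideal solution. Unless I am missing something, stripping tags completely should not require anything beyond lexical analysis, so there is no benefit in going beyond regex here. (2 upvotes)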
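
The warning in the comments about the tag-stripping lines can be checked without a browser. The sketch below copies only the two `replace` calls from decodeEntities above into a helper (the name `stripTags` is made up for this illustration) and feeds it a nested decoy tag. Because each replacement runs in a single pass, removing the inner `<b>` leaves a complete `<img>` element behind, which would then be handed to `innerHTML`.

```javascript
// The two tag-stripping replacements, copied from decodeEntities above.
// stripTags is a hypothetical helper name used only for this demonstration.
function stripTags(str) {
  str = str.replace(/<script[^>]*>([\S\s]*?)<\/script>/gmi, '');
  str = str.replace(/<\/?\w(?:[^"'>]|"[^"]*"|'[^']*')*>/gmi, '');
  return str;
}

// Well-formed markup is removed as intended.
console.log(stripTags('<b>bold</b> text'));
// bold text

// A decoy tag nested inside another tag: only the inner <b> is removed,
// and what is left over is a complete <img> element again.
const payload = '<<b>img src=x onerror=alert(1)>';
console.log(stripTags(payload));
// <img src=x onerror=alert(1)>
```

This is only one payload, and making this one harmless does not make the approach safe. The advice in the comments (do not run it on untrusted input) still applies.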

Dav*_*mas 205

You could try something like:

var Title = $('<textarea />').html("Chris&apos; corner").text();
console.log(Title);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

JS Fiddle.

A more interactive version:

$('form').submit(function() {
  var theString = $('#string').val();
  var varTitle = $('<textarea />').html(theString).text();
  $('#output').text(varTitle);
  return false;
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<form action="#" method="post">
  <fieldset>
    <label for="string">Enter a html-encoded string to decode</label>
    <input type="text" name="string" id="string" />
  </fieldset>
  <fieldset>
    <input type="submit" value="decode" />
  </fieldset>
</form>

<div id="output"></div>

JS Fiddle.

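  • Do not use this with untrusted data; see Mike's comment: http://stackoverflow.com/questions/1147359/how-to-decode-html-entities-using-jquery#comment6018122_2419664 (34 upvotes)
  • @chris and @david - This code creates an empty div (detached from the DOM), sets its innerHTML, and finally retrieves the result back as plain text. It does not *surround* the string with a div; it *places it inside* a div. I stress this because understanding how jQuery works is essential. (5 upvotes)
  • Just chiming in. This is vulnerable to XSS, try it! /sf/ask/2189759211/ (4 upvotes)
  • With older jQuery versions this may be vulnerable to XSS ([see more here](/sf/answers/97716811/)). I recommend using the [he library](https://github.com/mathiasbynens/he) instead. You can see a code example in an answer to another [similar question](/sf/answers/1651787511/). (2 upvotes)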

Ala*_*ett 97

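As Robert K said, do not use jQuery's .html().text() to decode HTML entities: it is unsafe, because user input should never have access to the DOM. Read about XSS to understand why this is unsafe.

Instead, try the Underscore.js utility-belt library, which comes with escape and unescape methods:

_.escape(string)

Escapes a string for insertion into HTML, replacing the &, <, >, ", `, and ' characters.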

_.escape('Curly, Larry & Moe');
=> "Curly, Larry &amp; Moe"

_.unescape(string)

The opposite of escape: replaces &amp;, &lt;, &gt;, &quot;, &#96; and &#x27; with their unescaped counterparts.

_.unescape('Curly, Larry &amp; Moe');
=> "Curly, Larry & Moe"

To support decoding more characters, just copy the Underscore unescape method and add more characters to the map.

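  • Keep in mind that it does not decode encoded Russian or Japanese characters. For example, &#x30cf;&#x30ed;&#x30fc;&#x30ef;&#x30fc;&#x30eb;&#x30c9; -> ハローワールド cannot be done with this. (4 upvotes)
  • `_.unescape` only works for [a few values](http://underscorejs.org/docs/underscore.html#section-160). So something like `_.unescape('&raquo;')` just returns `"&raquo;"` (4 upvotes)
  • I like this answer because it does not need the DOM; who can guarantee access to the DOM API when writing JavaScript these days? Unfortunately, it only works for the listed entities and leaves ones like &nbsp; unchanged. (3 upvotes)
  • @chovy, with a recent Underscore.js version (>= 1.4.2) you will not get the TypeError. (2 upvotes)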
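
As a rough sketch of the "copy the method and extend the map" idea, and of the numeric references the first comment asks about, something like the following might work. It is not Underscore's code: `unescapeEntities` and `NAMED_ENTITIES` are names invented here, the map is deliberately tiny, and the HTML specification's special cases for numeric references (for example `&#0;`) are not handled.

```javascript
// Hypothetical helper, not part of Underscore.js. Extend the map as needed.
const NAMED_ENTITIES = {
  amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", nbsp: '\u00A0'
};

function unescapeEntities(text) {
  // One pass over the string, so "&amp;lt;" becomes "&lt;" and not "<".
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      if (codePoint > 0x10FFFF) return match;
      return String.fromCodePoint(codePoint);
    }
    // Unknown names (for example "raquo" with this small map) are left as-is.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

console.log(unescapeEntities('Curly, Larry &amp; Moe'));
// Curly, Larry & Moe
console.log(unescapeEntities('&#x30cf;&#x30ed;&#x30fc;&#x30ef;&#x30fc;&#x30eb;&#x30c9;'));
// ハローワールド
```

For full coverage of named entities, the he library mentioned in another comment is probably a better fit than growing this map by hand.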

Wil*_*hti 38

Here is a quick method that does not require creating a div and decodes the "most common" HTML-escaped characters:

function decodeHTMLEntities(text) {
    var entities = [
        ['apos', '\''],
        ['#x27', '\''],
        ['#x2F', '/'],
        ['#39', '\''],
        ['#47', '/'],
        ['lt', '<'],
        ['gt', '>'],
        ['nbsp', ' '],
        ['quot', '"'],
        // 'amp' goes last so that "&amp;lt;" decodes to "&lt;" rather than "<"
        ['amp', '&']
    ];

    for (var i = 0, max = entities.length; i < max; ++i) 
        text = text.replace(new RegExp('&'+entities[i][0]+';', 'g'), entities[i][1]);

    return text;
}

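  • Your answer does not work at all for most HTML entities, and extending it to cover them would be very repetitive and error-prone. For example, every Japanese kanji character has an entity, and there are thousands of them. On top of that, I would not be surprised if your answer were slower than some of the others here, since you would be running thousands of regex replacements on every string you decode. (14 upvotes)
  • It really depends on your purpose in encoding these strings. If your goal is to avoid triggering HTML processing through things like &lt; or &gt;, there is no need at all to encode other characters using character entity syntax. The large number of character entities mostly serve as a convenience. The entities I listed are the minimum you must escape to keep data from being confused with HTML. [continued in next comment] (2 upvotes)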
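
One detail worth checking in any decoder that applies its replacements one after another is the order of the table. The sketch below (`decodeInOrder` and both tables are illustrative names, not taken from the answer) runs the same input through two orderings: when `&amp;` is handled first, its output is picked up again by the later replacements and the text ends up decoded twice.

```javascript
// Apply each [name, char] pair in turn, as in the sequential-replace approach.
function decodeInOrder(text, entities) {
  for (const [name, char] of entities) {
    text = text.replace(new RegExp('&' + name + ';', 'g'), char);
  }
  return text;
}

const ampFirst = [['amp', '&'], ['lt', '<'], ['gt', '>']];
const ampLast = [['lt', '<'], ['gt', '>'], ['amp', '&']];

// "&amp;lt;b&amp;gt;" is the encoded form of the literal text "&lt;b&gt;".
const encoded = '&amp;lt;b&amp;gt;';

console.log(decodeInOrder(encoded, ampFirst));
// <b>        (decoded twice)
console.log(decodeInOrder(encoded, ampLast));
// &lt;b&gt;  (decoded once)
```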

ins*_*ign 19

This is my favourite way of decoding HTML characters. The advantage of this code is that tags are also preserved.

function decodeHtml(html) {
    var txt = document.createElement("textarea");
    txt.innerHTML = html;
    return txt.value;
}

Example: http://jsfiddle.net/k65s3/

Input:

Entity:&nbsp;Bad attempt at XSS:<script>alert('new\nline?')</script><br>

Output:

Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>

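  • Next time, @insign, please credit the original author or provide a link. /sf/answers/517635121/ (3 upvotes)
  • This approach works anywhere, even when jQuery is unavailable or not yet loaded, because it is plain JavaScript. (2 upvotes)
  • Are there any downsides to this technique? It seems much simpler than the answers above. (2 upvotes)
  • @geauser Yes, done (2 upvotes)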

小智 18

Inspired by Robert K's solution, this version does not strip HTML tags and is just as safe.

var decode_entities = (function() {
    // Remove HTML Entities
    var element = document.createElement('div');

    function decode_HTML_entities (str) {

        if(str && typeof str === 'string') {

            // Escape HTML before decoding for HTML Entities
            str = escape(str).replace(/%26/g,'&').replace(/%23/g,'#').replace(/%3B/g,';');

            element.innerHTML = str;
            if(element.innerText){
                str = element.innerText;
                element.innerText = '';
            }else{
                // Firefox support
                str = element.textContent;
                element.textContent = '';
            }
        }
        return unescape(str);
    }
    return decode_HTML_entities;
})();

  • Those `escape()` and `unescape()` functions are deprecated. https://developer.mozilla.org/zh-CN/docs/Web/JavaScript/Reference/Global_Objects/escape (2 upvotes)
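
Following up on the deprecation comment, here is a sketch of the same "protect the markup, then decode" step built on `encodeURIComponent` and `decodeURIComponent` instead. All function names are made up for this illustration, and the DOM-based decoding in the middle is replaced by a trivial stand-in (an assumption, so that the sketch runs without a browser).

```javascript
// Percent-encode the input, then restore only "&", "#" and ";" so that
// entities stay recognisable while "<" and ">" remain encoded.
function protectMarkup(str) {
  return encodeURIComponent(str)
    .replace(/%26/g, '&')
    .replace(/%23/g, '#')
    .replace(/%3B/g, ';');
}

// Stand-in for the DOM-based entity decoding step (an assumption made so
// that this sketch runs without a browser); it only knows "&amp;".
function decodeEntitiesStandIn(str) {
  return str.replace(/&amp;/g, '&');
}

function decodeKeepingTags(str) {
  return decodeURIComponent(decodeEntitiesStandIn(protectMarkup(str)));
}

console.log(protectMarkup('<b>Tom &amp; Jerry</b>'));
// %3Cb%3ETom%20&amp;%20Jerry%3C%2Fb%3E
console.log(decodeKeepingTags('<b>Tom &amp; Jerry</b>'));
// <b>Tom & Jerry</b>
```

One caveat with this design: if an entity decodes to a percent sign (for example `&#37;`), the final `decodeURIComponent` call can misread it as the start of an escape sequence or throw a URIError, so such input would need extra handling.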

Mir*_*dil 16

Here is another version:

function convertHTMLEntity(text){
    const span = document.createElement('span');

    return text
    .replace(/&[#A-Za-z0-9]+;/gi, (entity,position,text)=> {
        span.innerHTML = entity;
        return span.innerText;
    });
}

console.log(convertHTMLEntity('Large &lt; &#163; 500'));

Jas*_*ams 12

jQuery provides a way to encode and decode HTML entities.

If you use a "<div />" tag, it will strip out all the HTML.

function htmlDecode(value) {
    return $("<div/>").html(value).text();
}

function htmlEncode(value) {
    return $('<div/>').text(value).html();
}

If you use a "<textarea />" tag, it will preserve the HTML tags.

function htmlDecode(value) {
    return $("<textarea/>").html(value).text();
}

function htmlEncode(value) {
    return $('<textarea/>').text(value).html();
}
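
For the encoding half of the question, a DOM-free alternative might look like the sketch below. The name `htmlEncodeNoDom` and the choice of `&#39;` for the apostrophe are assumptions made for this example; it simply replaces the five characters that matter for HTML text and quoted attribute values.

```javascript
// Characters that need escaping in HTML text and in quoted attribute values.
const ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;'
};

// Hypothetical DOM-free counterpart to htmlEncode above.
function htmlEncodeNoDom(value) {
  return String(value).replace(/[&<>"']/g, (ch) => ESCAPES[ch]);
}

console.log(htmlEncodeNoDom("Chris' <corner> & \"co\""));
// Chris&#39; &lt;corner&gt; &amp; &quot;co&quot;
```

Because the whole string is handled in one pass, the `&` inside an already emitted entity is never escaped a second time.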

Tyl*_*son 10

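To add yet another "inspired by Robert K" to the list, here is another safe version that does not strip HTML tags. Instead of running the whole string through the HTML parser, it pulls out only the entities and converts those.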

var decodeEntities = (function() {
    // this prevents any overhead from creating the object each time
    var element = document.createElement('div');

    // regular expression matching HTML entities
    var entity = /&(?:#x[a-f0-9]+|#[0-9]+|[a-z0-9]+);?/ig;

    return function decodeHTMLEntities(str) {
        // find and replace all the html entities
        str = str.replace(entity, function(m) {
            element.innerHTML = m;
            return element.textContent;
        });

        // reset the value
        element.textContent = '';

        return str;
    }
})();

Dio*_*oss 9

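Injecting untrusted HTML into the page is dangerous, as explained in "How to decode HTML entities using jQuery?".

An alternative is to use a JavaScript-only implementation of PHP's html_entity_decode (from http://phpjs.org/functions/html_entity_decode:424). The example would then look like this: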

var varTitle = html_entity_decode("Chris&apos; corner");

  • Actually, the current version of html_entity_decode does not handle &apos;. (2 upvotes)

Vyv*_*vIT 9

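Inspired by Robert K's solution, this strips HTML tags and prevents execution of scripts and event handlers such as: <img src=fake onerror="prompt(1)">. Tested on the latest Chrome, FF and IE (it should work on IE9, but that has not been tested).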

var decodeEntities = (function () {
        //create a new html document (doesn't execute script tags in child elements)
        var doc = document.implementation.createHTMLDocument("");
        var element = doc.createElement('div');

        function getText(str) {
            element.innerHTML = str;
            str = element.textContent;
            element.textContent = '';
            return str;
        }

        function decodeHTMLEntities(str) {
            if (str && typeof str === 'string') {
                var x = getText(str);
                while (str !== x) {
                    str = x;
                    x = getText(x);
                }
                return x;
            }
        }
        return decodeHTMLEntities;
    })();

Just call:

decodeEntities('<img src=fake onerror="prompt(1)">');
decodeEntities("<script>alert('aaa!')</script>");

Soy*_*oes 9

Here is the full version

function htmldecode(s){
    window.HTML_ESC_MAP = {
    "nbsp":"\u00A0","iexcl":"¡","cent":"¢","pound":"£","curren":"¤","yen":"¥","brvbar":"¦","sect":"§","uml":"¨","copy":"©","ordf":"ª","laquo":"«","not":"¬","reg":"®","macr":"¯","deg":"°","plusmn":"±","sup2":"²","sup3":"³","acute":"´","micro":"µ","para":"¶","middot":"·","cedil":"¸","sup1":"¹","ordm":"º","raquo":"»","frac14":"¼","frac12":"½","frac34":"¾","iquest":"¿",
    "Agrave":"À","Aacute":"Á","Acirc":"Â","Atilde":"Ã","Auml":"Ä","Aring":"Å","AElig":"Æ","Ccedil":"Ç","Egrave":"È","Eacute":"É","Ecirc":"Ê","Euml":"Ë","Igrave":"Ì","Iacute":"Í","Icirc":"Î","Iuml":"Ï","ETH":"Ð","Ntilde":"Ñ","Ograve":"Ò","Oacute":"Ó","Ocirc":"Ô","Otilde":"Õ","Ouml":"Ö","times":"×","Oslash":"Ø","Ugrave":"Ù","Uacute":"Ú","Ucirc":"Û","Uuml":"Ü","Yacute":"Ý","THORN":"Þ","szlig":"ß",
    "agrave":"à","aacute":"á","acirc":"â","atilde":"ã","auml":"ä","aring":"å","aelig":"æ","ccedil":"ç","egrave":"è","eacute":"é","ecirc":"ê","euml":"ë","igrave":"ì","iacute":"í","icirc":"î","iuml":"ï","eth":"ð","ntilde":"ñ","ograve":"ò","oacute":"ó","ocirc":"ô","otilde":"õ","ouml":"ö","divide":"÷","oslash":"ø","ugrave":"ù","uacute":"ú","ucirc":"û","uuml":"ü","yacute":"ý","thorn":"þ","yuml":"ÿ",
    "fnof":"ƒ","Alpha":"Α","Beta":"Β","Gamma":"Γ","Delta":"Δ","Epsilon":"Ε","Zeta":"Ζ","Eta":"Η","Theta":"Θ","Iota":"Ι","Kappa":"Κ","Lambda":"Λ","Mu":"Μ","Nu":"Ν","Xi":"Ξ","Omicron":"Ο","Pi":"Π","Rho":"Ρ","Sigma":"Σ","Tau":"Τ","Upsilon":"Υ","Phi":"Φ","Chi":"Χ","Psi":"Ψ","Omega":"Ω",
    "alpha":"α","beta":"β","gamma":"γ","delta":"δ","epsilon":"ε","zeta":"ζ","eta":"η","theta":"θ","iota":"ι","kappa":"κ","lambda":"λ","mu":"μ","nu":"ν","xi":"ξ","omicron":"ο","pi":"π","rho":"ρ","sigmaf":"ς","sigma":"σ","tau":"τ","upsilon":"υ","phi":"φ","chi":"χ","psi":"ψ","omega":"ω","thetasym":"ϑ","upsih":"ϒ","piv":"ϖ",
    "bull":"•","hellip":"…","prime":"′","Prime":"″","oline":"‾","frasl":"⁄","weierp":"℘","image":"ℑ","real":"ℜ","trade":"™","alefsym":"ℵ","larr":"←","uarr":"↑","rarr":"→","darr":"↓","harr":"↔","crarr":"↵","lArr":"⇐","uArr":"⇑","rArr":"⇒","dArr":"⇓","hArr":"⇔",
    "forall":"∀","part":"∂","exist":"∃","empty":"∅","nabla":"∇","isin":"∈","notin":"∉","ni":"∋","prod":"∏","sum":"∑","minus":"−","lowast":"∗","radic":"√","prop":"∝","infin":"∞","ang":"∠","and":"∧","or":"∨","cap":"∩","cup":"∪","int":"∫","there4":"∴","sim":"∼","cong":"≅","asymp":"≈","ne":"≠","equiv":"≡","le":"≤","ge":"≥","sub":"⊂","sup":"⊃","nsub":"⊄","sube":"⊆","supe":"⊇","oplus":"⊕","otimes":"⊗","perp":"⊥","sdot":"⋅","lceil":"⌈","rceil":"⌉","lfloor":"⌊","rfloor":"⌋","lang":"⟨","rang":"⟩","loz":"◊","spades":"♠","clubs":"♣","hearts":"♥","diams":"♦",
    "quot":"\"","amp":"&","lt":"<","gt":">","OElig":"Œ","oelig":"œ","Scaron":"Š","scaron":"š","Yuml":"Ÿ","circ":"ˆ","tilde":"˜","ndash":"–","mdash":"—","lsquo":"‘","rsquo":"’","sbquo":"‚","ldquo":"“","rdquo":"”","bdquo":"„","dagger":"†","Dagger":"‡","permil":"‰","lsaquo":"‹","rsaquo":"›","euro":"€"};
    if(!window.HTML_ESC_MAP_EXP)
        window.HTML_ESC_MAP_EXP = new RegExp("&("+Object.keys(HTML_ESC_MAP).join("|")+");","g");
    return s?s.replace(window.HTML_ESC_MAP_EXP,function(x){
        return HTML_ESC_MAP[x.substring(1,x.length-1)]||x;
    }):s;
}

Usage

htmldecode("&sum;&nbsp;&gt;&euro;");

ome*_*rts 9

A more functional approach to @William Lahti's answer:

var entities = {
  'amp': '&',
  'apos': '\'',
  '#x27': '\'',
  '#x2F': '/',
  '#39': '\'',
  '#47': '/',
  'lt': '<',
  'gt': '>',
  'nbsp': ' ',
  'quot': '"'
}

function decodeHTMLEntities (text) {
  return text.replace(/&([^;]+);/gm, function (match, entity) {
    return entities[entity] || match
  })
}

  • This does not solve the problem for decodeHTMLEntities('&auml;') or &#228; :) (2 upvotes)
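  • By the way, this is exactly what people like me need. I needed a short list that I could drop into a gatsby utility where document is not available. Being completely bulletproof is not always the goal. (2 upvotes)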
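
For a short-list decoder like this, two small hardening steps may be worth considering. A plain object literal also answers to inherited keys such as `constructor`, so input like `&constructor;` would be replaced by a function's source text, and a pattern that allows anything up to the next `;` can swallow a stray `&` together with a real entity that follows it. The sketch below uses a `Map` and a tighter pattern; `ENTITY_MAP` and `decodeKnownEntities` are names chosen for this example, and the table is the same one as in the answer.

```javascript
// A plain object inherits keys from Object.prototype:
const plain = { amp: '&' };
console.log(typeof plain['constructor']);
// function

// Same table as above, stored in a Map so that only the listed names match.
const ENTITY_MAP = new Map([
  ['amp', '&'], ['apos', "'"], ['#x27', "'"], ['#x2F', '/'], ['#39', "'"],
  ['#47', '/'], ['lt', '<'], ['gt', '>'], ['nbsp', ' '], ['quot', '"']
]);

function decodeKnownEntities(text) {
  // The name may not contain ";", "&" or whitespace, so a bare "&" in the
  // text cannot absorb an entity that comes after it.
  return text.replace(/&([^;&\s]+);/g, (match, name) =>
    ENTITY_MAP.has(name) ? ENTITY_MAP.get(name) : match);
}

console.log(decodeKnownEntities('Chris&apos; corner &amp; &constructor;'));
// Chris' corner & &constructor;
console.log(decodeKnownEntities('a & b &lt; c'));
// a & b < c
```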