Is there a JDK class to do HTML encoding (but not URL encoding)?

Edd*_*die 31 html java html-encode

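I am of course familiar with the java.net.URLEncoder and java.net.URLDecoder classes. However, I only need HTML-style encoding. (I don't want ' ' replaced with '+', etc.) I am not aware of any class built into the JDK that does just HTML encoding. Is there one? I am aware of other choices (for example, Jakarta Commons Lang's StringEscapeUtils), but I don't want to add another external dependency to the project where I need this.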

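I'm hoping that something I don't know about has been added to a recent JDK (i.e. 5 or 6) that will do this. Otherwise I will have to roll my own.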
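To make the distinction in the question concrete, here is a minimal sketch contrasting what java.net.URLEncoder produces with what HTML-style escaping is expected to produce. The class name UrlVsHtmlDemo and the helpers urlEncode and htmlEscapeMinimal are hypothetical, written only for illustration; the HTML helper covers just four characters and is not a complete escaper.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UrlVsHtmlDemo {

    // Thin wrapper around the JDK's URL (form) encoder.
    public static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper for illustration only: escapes &, <, > and ".
    public static String htmlEscapeMinimal(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "a b<c>&d";
        // URL encoding: the space becomes '+', markup characters become %XX
        System.out.println(urlEncode(input));         // a+b%3Cc%3E%26d
        // HTML escaping: the space is untouched, markup characters become entities
        System.out.println(htmlEscapeMinimal(input)); // a b&lt;c&gt;&amp;d
    }
}
```

The URL-encoded form is meant for query strings and form bodies, so it is not a substitute for escaping text that goes into an HTML page.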

joh*_*ase 45

There isn't a class built into the JDK to do this, but it is part of the Jakarta commons-lang library.

String escaped = StringEscapeUtils.escapeHtml3(stringToEscape);
String escaped = StringEscapeUtils.escapeHtml4(stringToEscape);

See the JavaDoc.

Adding a dependency is usually as easy as dropping the jar somewhere, and commons-lang has so many useful utilities that it is often worth having on board.

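  • As I said in a comment on another answer, adding a dependency is *NOT* as easy as dropping a JAR somewhere. Lawyers need to review the license of the third-party JAR, installers need to be changed, and so on. It is not always trivial. (8 upvotes)
  • I also don't like the idea of taking on a dependency for a single method. (2 upvotes)
  • Please note that the method signature above is wrong. "Html" should have a lowercase "tml": `String escaped = StringEscapeUtils.escapeHtml(stringToEscape);` (2 upvotes)
  • Deprecated as of 3.6. Use org.apache.commons.text.StringEscapeUtils instead. (2 upvotes)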

Anonymous 14

A simple way seems to be this one:

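public static String encodeHTML(String s)
{
    StringBuilder out = new StringBuilder(s.length() + 16);
    for (int i = 0; i < s.length(); )
    {
        // Read whole code points so a surrogate pair becomes one numeric reference
        int cp = s.codePointAt(i);
        i += Character.charCount(cp);
        // '&' and '\'' are escaped too; the original check left them out
        if (cp > 127 || cp == '"' || cp == '\'' || cp == '&' || cp == '<' || cp == '>')
        {
            out.append("&#").append(cp).append(';');
        }
        else
        {
            out.append((char) cp);
        }
    }
    return out.toString();
}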

Source: http://forums.thedailywtf.com/forums/p/2806/72054.aspx#72054


Edd*_*die 9

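Apparently, the answer is "No." Unfortunately, this was a case where I had to do something in the short term and couldn't add a new external dependency for it. I agree with everyone that using Commons Lang is the best long-term solution. That is what I will go with once I can add a new library to the project.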

It's a shame that something in such common use is not in the Java API.


pet*_*erh 5

I've found that all the existing solutions (libraries) I've reviewed suffer from one or several of the following issues:

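  • They don't tell you in the Javadoc exactly what they replace.
  • They escape too much, which makes the HTML much harder to read.
  • They do not document when the returned value is safe to use (safe to use for an HTML entity? for an HTML attribute? etc.)
  • They are not optimized for speed.
  • They have no feature for avoiding double escaping (not escaping what is already escaped).
  • They replace the single quote with &apos; (wrong!)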

On top of this, I also had the problem of not being able to bring in an external library, at least not without a certain amount of red tape.

So, I rolled my own. Guilty.

Below is what it looks like, but the latest version can always be found in this gist.

/**
 * HTML string utilities
 */
public class SafeHtml {

    /**
     * Escapes a string for use in an HTML entity or HTML attribute.
     * 
     * <p>
     * The returned value is always suitable for an HTML <i>entity</i> but only
     * suitable for an HTML <i>attribute</i> if the attribute value is inside
     * double quotes. In other words the method is not safe for use with HTML
     * attributes unless you put the value in double quotes like this:
     * <pre>
     *    &lt;div title="value-from-this-method" &gt; ....
     * </pre>
     * Putting attribute values in double quotes is always a good idea anyway.
     * 
     * <p>The following characters will be escaped:
     * <ul>
     *   <li>{@code &} (ampersand) -- replaced with {@code &amp;}</li>
     *   <li>{@code <} (less than) -- replaced with {@code &lt;}</li>
     *   <li>{@code >} (greater than) -- replaced with {@code &gt;}</li>
     *   <li>{@code "} (double quote) -- replaced with {@code &quot;}</li>
     *   <li>{@code '} (single quote) -- replaced with {@code &#39;}</li>
     *   <li>{@code /} (forward slash) -- replaced with {@code &#47;}</li>
     * </ul>
     * It is not necessary to escape more than this as long as the HTML page
     * <a href="https://en.wikipedia.org/wiki/Character_encodings_in_HTML">uses
     * a Unicode encoding</a>. (Most web pages use UTF-8, which is also the HTML5
     * recommendation.) Escaping more than this makes the HTML much less readable.
     * 
     * @param s the string to make HTML safe
     * @param avoidDoubleEscape avoid double escaping, which means for example not 
     *     escaping {@code &lt;} one more time. Any sequence {@code &....;}, as explained in
     *     {@link #isHtmlCharEntityRef(java.lang.String, int) isHtmlCharEntityRef()}, will not be escaped.
     * 
     * @return an HTML-safe string
     */
    public static String htmlEscape(String s, boolean avoidDoubleEscape) {
        if (s == null || s.length() == 0) {
            return s;
        }
        StringBuilder sb = new StringBuilder(s.length()+16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':
                    // Avoid double escaping if already escaped
                    if (avoidDoubleEscape && (isHtmlCharEntityRef(s, i))) {
                        sb.append('&');
                    } else {
                        sb.append("&amp;");
                    }
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                case '"':
                    sb.append("&quot;"); 
                    break;
                case '\'':
                    sb.append("&#39;"); 
                    break;
                case '/':
                    sb.append("&#47;"); 
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
  }

  /**
   * Checks if the value at {@code index} is a HTML entity reference. This
   * means any of :
   * <ul>
   *   <li>{@code &amp;} or {@code &lt;} or {@code &gt;} or {@code &quot;} </li>
   *   <li>A value of the form {@code &#dddd;} where {@code dddd} is a decimal value</li>
   *   <li>A value of the form {@code &#xhhhh;} where {@code hhhh} is a hexadecimal value</li>
   * </ul>
   * @param str the string to test for HTML entity reference.
   * @param index position of the {@code '&'} in {@code str}
   * @return {@code true} if a reference of one of the forms above starts at {@code index}
   */
  public static boolean isHtmlCharEntityRef(String str, int index)  {
      if (str.charAt(index) != '&') {
          return false;
      }
      int indexOfSemicolon = str.indexOf(';', index + 1);
      if (indexOfSemicolon == -1) { // is there a semicolon sometime later ?
          return false;
      }
      if (!(indexOfSemicolon > (index + 2))) {   // is the string actually long enough
          return false;
      }
      if (followingCharsAre(str, index, "amp;")
              || followingCharsAre(str, index, "lt;")
              || followingCharsAre(str, index, "gt;")
              || followingCharsAre(str, index, "quot;")) {
          return true;
      }
      if (str.charAt(index+1) == '#') {
          if (str.charAt(index+2) == 'x' || str.charAt(index+2) == 'X') {
              // It's presumably a hex value
              if (str.charAt(index+3) == ';') {
                  return false;
              }
              for (int i = index+3; i < indexOfSemicolon; i++) {
                  char c = str.charAt(i);
                  if (c >= 48 && c <=57) {  // 0 -- 9
                      continue;
                  }
                  if (c >= 65 && c <=70) {   // A -- F
                      continue;
                  }
                  if (c >= 97 && c <=102) {   // a -- f
                      continue;
                  }
                  return false;  
              }
              return true;   // yes, the value is a hex string
          } else {
              // It's presumably a decimal value
              for (int i = index+2; i < indexOfSemicolon; i++) {
                  char c = str.charAt(i);
                  if (c >= 48 && c <=57) {  // 0 -- 9
                      continue;
                  }
                  return false;
              }
              return true; // yes, the value is decimal
          }
      }
      return false;
  } 


  /**
   * Tests if the chars following position <code>startIndex</code> in string
   * <code>str</code> are that of <code>nextChars</code>.
   * 
   * <p>Optimized for speed. Otherwise this method would be exactly equal to
   * {@code (str.indexOf(nextChars, startIndex+1) == (startIndex+1))}.
   *
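   * @param str the string to inspect
   * @param startIndex the position after which the comparison starts
   * @param nextChars the characters expected to follow {@code startIndex}
   * @return {@code true} if {@code nextChars} immediately follows position {@code startIndex}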
   */  
  private static boolean followingCharsAre(String str, int startIndex, String nextChars)  {
      if ((startIndex + nextChars.length()) < str.length()) {
          for(int i = 0; i < nextChars.length(); i++) {
              if ( nextChars.charAt(i) != str.charAt(startIndex+i+1)) {
                  return false;
              }
          }
          return true;
      } else {
          return false;
      }
  }
}
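For comparison, the double-escape avoidance can also be sketched with a regular expression instead of a hand-written scanner. This is a swapped-in technique, not the code above: the class AmpersandEscapeSketch and the method escapeBareAmpersands are hypothetical names, and the pattern is only meant to recognize the same forms listed for isHtmlCharEntityRef (&amp;, &lt;, &gt;, &quot;, decimal and hexadecimal references). It handles the ampersand only and will likely be slower than the loop above.

```java
import java.util.regex.Pattern;

public class AmpersandEscapeSketch {

    // Matches an '&' that does NOT start one of:
    // &amp; &lt; &gt; &quot; &#ddd; &#xhhh;
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
            "&(?!(?:amp|lt|gt|quot|#[0-9]+|#[xX][0-9a-fA-F]+);)");

    // Escapes only ampersands that are not already part of a recognized reference.
    public static String escapeBareAmpersands(String s) {
        return BARE_AMPERSAND.matcher(s).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        // "&bogus;" is not in the recognized set, so its '&' is escaped
        System.out.println(escapeBareAmpersands("Tom & Jerry &lt;3 &#39; &#x41; &bogus;"));
    }
}
```

As with the avoidDoubleEscape flag, this trades correctness for convenience: input that legitimately contains the literal text "&lt;" will pass through unescaped, so it only makes sense when the input may already be partly escaped.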

TODO: Preserve consecutive whitespace.
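One possible way to approach that TODO, as a rough sketch rather than part of the class above: turn a leading space, and every space that directly follows another space, into a no-break space reference. The class WhitespaceSketch and the method preserveSpaces are hypothetical names. The sketch assumes its input has already been escaped, since it introduces '&' characters that a later escaping pass would mangle.

```java
public class WhitespaceSketch {

    // Hypothetical helper: keeps runs of spaces visible in rendered HTML.
    // Apply it AFTER escaping, because it introduces '&' characters.
    public static String preserveSpaces(String escapedHtml) {
        StringBuilder sb = new StringBuilder(escapedHtml.length() + 16);
        for (int i = 0; i < escapedHtml.length(); i++) {
            char c = escapedHtml.charAt(i);
            boolean followsSpace = i > 0 && escapedHtml.charAt(i - 1) == ' ';
            if (c == ' ' && (i == 0 || followsSpace)) {
                // U+00A0 no-break space, which browsers do not collapse
                sb.append("&#160;");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(preserveSpaces("a   b c")); // a &#160;&#160;b c
    }
}
```

The first space of each run is left as a regular space so that lines can still wrap there. Where the markup is under your control, the CSS property white-space: pre-wrap achieves a similar effect without touching the text.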