Joh*_*ley 84 javascript java unicode utf-8
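I've been experimenting with various bits of Java code, trying to come up with something that will encode a string containing quotes, spaces and "exotic" Unicode characters and produce output identical to JavaScript's encodeURIComponent function.
My torture test string is: "A" B ± "
If I enter the following JavaScript statement in Firebug: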
encodeURIComponent('"A" B ± "');
- then I get:
"%22A%22%20B%20%C2%B1%20%22"
Here's my little test Java program:
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodingTest
{
    public static void main(String[] args) throws UnsupportedEncodingException
    {
        String s = "\"A\" B ± \"";
        System.out.println("URLEncoder.encode returns "
            + URLEncoder.encode(s, "UTF-8"));
        System.out.println("getBytes returns "
            + new String(s.getBytes("UTF-8"), "ISO-8859-1"));
    }
}
- The program outputs:
URLEncoder.encode returns %22A%22+B+%C2%B1+%22
getBytes returns "A" B Â± "
Close, but no cigar! What is the best way to encode a UTF-8 string in Java so that it produces the same output as JavaScript's encodeURIComponent?
EDIT: I'm using Java 1.4, moving to Java 5 shortly.
Joh*_*ley 110
Here's the class I came up with in the end:
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/**
 * Utility class for JavaScript compatible UTF-8 encoding and decoding.
 *
 * @see http://stackoverflow.com/questions/607176/java-equivalent-to-javascripts-encodeuricomponent-that-produces-identical-output
 * @author John Topley
 */
public class EncodingUtil
{
    /**
     * Decodes the passed UTF-8 String using an algorithm that's compatible with
     * JavaScript's <code>decodeURIComponent</code> function. Returns
     * <code>null</code> if the String is <code>null</code>.
     *
     * @param s The UTF-8 encoded String to be decoded
     * @return the decoded String
     */
    public static String decodeURIComponent(String s)
    {
        if (s == null)
        {
            return null;
        }
        String result = null;
        try
        {
            result = URLDecoder.decode(s, "UTF-8");
        }
        // This exception should never occur.
        catch (UnsupportedEncodingException e)
        {
            result = s;
        }
        return result;
    }

    /**
     * Encodes the passed String as UTF-8 using an algorithm that's compatible
     * with JavaScript's <code>encodeURIComponent</code> function. Returns
     * <code>null</code> if the String is <code>null</code>.
     *
     * @param s The String to be encoded
     * @return the encoded String
     */
    public static String encodeURIComponent(String s)
    {
        // URLEncoder.encode throws a NullPointerException for null input,
        // so handle null here to match the documented contract.
        if (s == null)
        {
            return null;
        }
        String result = null;
        try
        {
            // Undo the places where URLEncoder differs from encodeURIComponent.
            result = URLEncoder.encode(s, "UTF-8")
                .replaceAll("\\+", "%20")
                .replaceAll("\\%21", "!")
                .replaceAll("\\%27", "'")
                .replaceAll("\\%28", "(")
                .replaceAll("\\%29", ")")
                .replaceAll("\\%7E", "~");
        }
        // This exception should never occur.
        catch (UnsupportedEncodingException e)
        {
            result = s;
        }
        return result;
    }

    /**
     * Private constructor to prevent this class from being instantiated.
     */
    private EncodingUtil()
    {
        super();
    }
}
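As a quick sanity check of the post-processing approach above, the following sketch reproduces the encoding step in a standalone class and compares it against the Firebug output quoted in the question. The class name `EncodeCheck` and the method name `encode` are hypothetical names chosen for this illustration. It assumes Java 5 or later, because it uses `String.replace` with string arguments instead of `replaceAll`; the replacements are literal, so no regex is needed.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; not part of the EncodingUtil class above.
final class EncodeCheck
{
    // URL-encode as UTF-8, then undo the differences between
    // URLEncoder and JavaScript's encodeURIComponent.
    static String encode(String s)
    {
        if (s == null)
        {
            return null;
        }
        try
        {
            return URLEncoder.encode(s, "UTF-8")
                .replace("+", "%20")   // spaces: URLEncoder emits '+'
                .replace("%21", "!")
                .replace("%27", "'")
                .replace("%28", "(")
                .replace("%29", ")")
                .replace("%7E", "~");
        }
        catch (UnsupportedEncodingException e)
        {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args)
    {
        // The torture test string from the question (\u00B1 is the plus-minus sign).
        System.out.println(encode("\"A\" B \u00B1 \"")); // %22A%22%20B%20%C2%B1%20%22
        // A literal plus is already %2B before the replacements run.
        System.out.println(encode("a+b c!"));            // a%2Bb%20c!
    }
}
```

The second call shows that a literal `+` in the input is percent-encoded by `URLEncoder` first, so the `+` to `%20` replacement only touches pluses that stand for spaces.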
Tom*_*lak 58
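Looking at the implementation differences, I see that:
- encodeURIComponent() leaves these characters literal (regex representation): [-a-zA-Z0-9._*~'()!]
- URLEncoder leaves these characters literal (regex representation): [-a-zA-Z0-9._*]
- URLEncoder converts the space character " " into a plus sign "+".
Basically, to get the desired result, use URLEncoder.encode(s, "UTF-8") and then do some post-processing:
- replace all occurrences of "+" with "%20"
- replace all occurrences of "%xx" representing any of [~'()!] back to their literal counterparts
Rav*_*lau 12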
Use the JavaScript engine that ships with Java 6:
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class Wow
{
    public static void main(String[] args) throws Exception
    {
        ScriptEngineManager factory = new ScriptEngineManager();
        ScriptEngine engine = factory.getEngineByName("JavaScript");
        engine.eval("print(encodeURIComponent('\"A\" B ± \"'))");
    }
}
Output: %22A%22%20B%20%c2%b1%20%22
The case is different, but it's closer to what you want.
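A slightly more defensive variant of the same idea is sketched below. It rests on two assumptions that are not in the answer: that `getEngineByName` can return `null` (it does so when no engine is registered under that name, which is the case on newer JDKs that no longer bundle a JavaScript engine), and that passing the input as a binding is preferable to splicing it into the script text. The class name `ScriptEncode` and the binding name `input` are hypothetical.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical helper for illustration; not part of the answer above.
final class ScriptEncode
{
    // Returns the encoded string, or null when no "JavaScript" engine is available.
    static String encode(String s)
    {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (engine == null)
        {
            return null;
        }
        try
        {
            // Expose the input as a script variable so that quotes in the
            // string do not have to be escaped by hand.
            engine.put("input", s);
            return String.valueOf(engine.eval("encodeURIComponent(input)"));
        }
        catch (ScriptException e)
        {
            throw new IllegalStateException("Script evaluation failed", e);
        }
    }

    public static void main(String[] args)
    {
        String encoded = encode("\"A\" B \u00B1 \"");
        System.out.println(encoded == null ? "No JavaScript engine found" : encoded);
    }
}
```

Because the result depends on which engine, if any, the runtime provides, callers should be prepared for the `null` case and fall back to a pure Java implementation.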
I use java.net.URI#getRawPath(), e.g.
String s = "a+b c.html";
String fixed = new URI(null, null, s, null).getRawPath();
The value of fixed will be a+b%20c.html, which is what you want.
Post-processing the output of URLEncoder.encode() will obliterate any pluses that are supposed to be in the URI. For example
URLEncoder.encode("a+b c.html").replaceAll("\\+", "%20");
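will give you a%2Bb%20c.html: URLEncoder.encode() percent-encodes the literal + as %2B, so it no longer appears as a plus in the result.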
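The two approaches in this answer can be compared side by side. The sketch below is illustrative only; the class name `PlusHandlingDemo` and both method names are hypothetical. It uses the two-argument `URLEncoder.encode(s, "UTF-8")` overload rather than the one-argument form, which is deprecated and relies on the platform default encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical demo class; the names are not from the answer above.
final class PlusHandlingDemo
{
    // Treats s as a URI path: spaces are quoted, a literal '+' is kept.
    static String viaUri(String s)
    {
        try
        {
            return new URI(null, null, s, null).getRawPath();
        }
        catch (URISyntaxException e)
        {
            throw new IllegalArgumentException("Not a valid path: " + s, e);
        }
    }

    // Form-encodes s, then turns the '+' that stands for a space into %20.
    static String viaUrlEncoder(String s)
    {
        try
        {
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        }
        catch (UnsupportedEncodingException e)
        {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args)
    {
        String s = "a+b c.html";
        System.out.println(viaUri(s));        // a+b%20c.html
        System.out.println(viaUrlEncoder(s)); // a%2Bb%20c.html
    }
}
```

Which result is appropriate depends on where the string ends up: the `URI` constructor applies path quoting rules, whereas `URLEncoder` targets form data, so neither is a drop-in equivalent of `encodeURIComponent` for every input.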
小智 5
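I came up with my own version of encodeURIComponent, because the posted solutions have one problem: if a + that should be encoded is present in the String, it gets converted to a space.
So here's my class: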
import java.io.UnsupportedEncodingException;
import java.util.BitSet;

public final class EscapeUtils
{
    /** used for the encodeURIComponent function */
    private static final BitSet dontNeedEncoding;

    static
    {
        dontNeedEncoding = new BitSet(256);
        // a-z
        for (int i = 97; i <= 122; ++i)
        {
            dontNeedEncoding.set(i);
        }
        // A-Z
        for (int i = 65; i <= 90; ++i)
        {
            dontNeedEncoding.set(i);
        }
        // 0-9
        for (int i = 48; i <= 57; ++i)
        {
            dontNeedEncoding.set(i);
        }
        // '()*
        for (int i = 39; i <= 42; ++i)
        {
            dontNeedEncoding.set(i);
        }
        dontNeedEncoding.set(33); // !
        dontNeedEncoding.set(45); // -
        dontNeedEncoding.set(46); // .
        dontNeedEncoding.set(95); // _
        dontNeedEncoding.set(126); // ~
    }

    /**
     * A Utility class should not be instantiated.
     */
    private EscapeUtils()
    {
    }

    /**
     * Escapes all characters except the following: alphabetic, decimal digits, - _ . ! ~ * ' ( )
     *
     * @param input
     *            A component of a URI
     * @return the escaped URI component
     */
    public static String encodeURIComponent(String input)
    {
        if (input == null)
        {
            return input;
        }
        StringBuilder filtered = new StringBuilder(input.length());
        char c;
        for (int i = 0; i < input.length(); ++i)
        {
            c = input.charAt(i);
            if (dontNeedEncoding.get(c))
            {
                filtered.append(c);
            }
            else
            {
                final byte[] b = charToBytesUTF(c);
                for (int j = 0; j < b.length; ++j)
                {
                    filtered.append('%');
                    filtered.append("0123456789ABCDEF".charAt(b[j] >> 4 & 0xF));
                    filtered.append("0123456789ABCDEF".charAt(b[j] & 0xF));
                }
            }
        }
        return filtered.toString();
    }

    private static byte[] charToBytesUTF(char c)
    {
        try
        {
            return new String(new char[] { c }).getBytes("UTF-8");
        }
        catch (UnsupportedEncodingException e)
        {
            return new byte[] { (byte) c };
        }
    }
}
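One limitation of the class above is that it converts one `char` at a time, so a character outside the Basic Multilingual Plane, which Java stores as a surrogate pair, has each half converted on its own instead of being emitted as the single four-byte UTF-8 sequence that JavaScript produces. The following is a minimal sketch of a code-point-based variant, not the author's implementation; the class name `CodePointEncoder` and its helper names are hypothetical, and unpaired surrogates (for which JavaScript throws a URIError) get no special treatment here.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical variant for illustration; it walks code points instead of chars.
final class CodePointEncoder
{
    // Characters that encodeURIComponent leaves literal.
    private static final String UNRESERVED =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()";
    private static final String HEX = "0123456789ABCDEF";

    static String encodeURIComponent(String input)
    {
        if (input == null)
        {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); )
        {
            int cp = input.codePointAt(i);
            int len = Character.charCount(cp); // 2 for a surrogate pair, else 1
            if (cp < 128 && UNRESERVED.indexOf(cp) >= 0)
            {
                out.append((char) cp);
            }
            else
            {
                // Convert the whole code point so a pair yields one UTF-8 sequence.
                byte[] bytes = utf8(input.substring(i, i + len));
                for (int j = 0; j < bytes.length; ++j)
                {
                    out.append('%');
                    out.append(HEX.charAt((bytes[j] >> 4) & 0xF));
                    out.append(HEX.charAt(bytes[j] & 0xF));
                }
            }
            i += len;
        }
        return out.toString();
    }

    private static byte[] utf8(String s)
    {
        try
        {
            return s.getBytes("UTF-8");
        }
        catch (UnsupportedEncodingException e)
        {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(encodeURIComponent("\"A\" B \u00B1 \"")); // %22A%22%20B%20%C2%B1%20%22
        System.out.println(encodeURIComponent("a+b"));               // a%2Bb
        System.out.println(encodeURIComponent("\uD83D\uDE00"));      // %F0%9F%98%80 (U+1F600)
    }
}
```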