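I am trying to write a simple program for this interview question:
Write a function that checks for a valid unicode byte sequence. A unicode sequence is encoded as:
- the first byte indicates the number of subsequent bytes; "11110000" means 4 subsequent data bytes
- data bytes start with "10xxxxxx"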
    public static void main(String[] args)
    {
        System.out.println(checkUnicode(new byte[] {(byte)'c'}));
    }

    /**
     * Write a function that checks for valid unicode byte sequence. A unicode
     * sequence is encoded as: - first byte indicates number of subsequent bytes
     * '11110000' means 4 subsequent data bytes - data bytes start with a
     * '10xxxxxx'
     *
     * @param unicodeChar
     * @return
     */
    public static boolean checkUnicode(byte[] unicodeChar)
    {
        byte b = unicodeChar[0];
        int len = 0;
        int temp = (int)b<<1;
        while((int)temp<<1 == 0)
        {
            len++;
        }
        System.out.println(len);
        if (unicodeChar.length == len)
        {
            for(int i = 1 ; i < len; i++)
            {
                // Check if Most significant 2 bits in the byte are '10'
                // c0, in base 16, is 11000000 in binary
                // 10000000, in base 2, is 128 in decimal
                if( ( (int)unicodeChar[i]&0Xc0 )==128 )
                {
                    continue;
                }
                else
                {
                    return false;
                }
            }
            return true;
        }
        else
        {
            return false;
        }
    }
The output I get is
99
false
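Following Chris Jester-Young's comment, I changed the conversion from a char to a byte array.
Can someone point me in the right direction?
Thanks.
I also made some modifications based on Ted Hopp's input.
PS:
I got this question from some forum, and I don't think it was posted correctly there. Still, I decided to try to solve it and to use it as-is rather than muddle it further, because I don't fully understand it myself!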
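For reference, here is a minimal sketch of a checker that follows the rule exactly as the question states it, not real UTF-8. It assumes that the number of leading 1 bits in the first byte is the number of subsequent data bytes, and that a first byte with no leading 1 bits stands alone. In the posted loop `temp` is never updated, so the loop either never runs or never terminates; the sketch counts the leading 1 bits directly instead. The class name `SequenceCheck` and the helper `leadingOnes` are hypothetical names introduced only for this illustration.

```java
class SequenceCheck {
    /** Counts the leading 1 bits of a byte (0 to 8). */
    static int leadingOnes(byte b) {
        // Invert the byte, keep its low 8 bits, and count the leading zeros
        // of that 8-bit value (numberOfLeadingZeros works on 32 bits).
        return Integer.numberOfLeadingZeros((~b) & 0xFF) - 24;
    }

    /**
     * Assumption: the first byte's leading 1 bits give the number of
     * subsequent data bytes, as the question words it.
     */
    static boolean checkUnicode(byte[] seq) {
        if (seq == null || seq.length == 0) {
            return false;
        }
        int dataBytes = leadingOnes(seq[0]);
        if (seq.length != dataBytes + 1) {
            return false;
        }
        for (int i = 1; i < seq.length; i++) {
            // Every data byte must look like 10xxxxxx.
            if ((seq[i] & 0xC0) != 0x80) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte d = (byte) 0x80; // a data byte, 10000000
        System.out.println(checkUnicode(new byte[] {(byte) 'c'}));              // true
        System.out.println(checkUnicode(new byte[] {(byte) 0xF0, d, d, d, d})); // true
        System.out.println(checkUnicode(new byte[] {(byte) 0xF0, d}));          // false
        System.out.println(checkUnicode(new byte[] {(byte) 0xC0, d, 0x41}));    // false
    }
}
```

Under a different reading of the question (for example, real UTF-8, where a `11110xxx` lead byte is followed by three continuation bytes), the length check would have to change accordingly.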
Here is an enterprise-grade solution fit for an enterprise-grade job:
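    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CodingErrorAction;
    import java.nio.charset.StandardCharsets;
    import java.util.BitSet;

    public static void main(String[] args) {
        if (args.length == 0 || args[0] == null || (args[0] = args[0].trim()).isEmpty()) {
            System.out.println("No argument passed or argument empty!");
            return;
        }
        String arg = args[0];
        System.out.println("arg: " + arg + ", arg len: " + arg.length());
        // Character i of the argument becomes bit i of the BitSet.
        BitSet bs = new BitSet(arg.length());
        for (int i = 0; i < arg.length(); i++) {
            if (arg.charAt(i) == '1') {
                bs.set(i, true);
            }
        }
        // Prints the set bit indexes (the "{0, 1, 4, ...}" line in the output).
        System.out.println(bs);
        // toByteArray() is little-endian: bit i lands in bit (i % 8) of byte i / 8.
        ByteBuffer bb = ByteBuffer.wrap(bs.toByteArray());
        Charset cs = StandardCharsets.UTF_8;
        // REPORT makes the decoder throw instead of silently replacing bad input.
        CharsetDecoder csd = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            CharBuffer cb = csd.decode(bb);
            String uns = cb.toString();
            System.out.println("Got unicode string of len " + uns.length() + ": " + uns + " from " + arg + " -- no errors!");
        } catch (CharacterCodingException cce) {
            System.out.println("Invalid UTF-8 unicode string! " + cce.getMessage());
        }
    }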
Verification:
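    public static void test() {
        StringBuilder sb = new StringBuilder();
        byte[] byt = "stupid interview".getBytes();
        BitSet byt1 = fromByteArray(byt);
        // One '0'/'1' character per bit, in BitSet index order.
        for (int i = 0; i < byt.length * 8; i++) {
            sb.append(byt1.get(i) ? "1" : "0");
        }
        String[] st = new String[1];
        st[0] = sb.toString();
        // Prints the bit string (the first line of the output).
        System.out.println(st[0]);
        main(st);
    }

    public static BitSet fromByteArray(byte[] bytes) {
        BitSet bits = new BitSet();
        for (int i = 0; i < bytes.length * 8; i++) {
            // Bit i of the set is bit (i % 8) of byte i / 8, matching
            // BitSet.toByteArray(); indexing from the end would reverse the text.
            if ((bytes[i / 8] & (1 << (i % 8))) != 0) {
                bits.set(i);
            }
        }
        return bits;
    }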
Output:
11001110001011101010111000001110100101100010011000000100100101100111011000101110101001100100111001101110100101101010011011101110
arg: 11001110001011101010111000001110100101100010011000000100100101100111011000101110101001100100111001101110100101101010011011101110, arg len: 128
{0, 1, 4, 5, 6, 10, 12, 13, 14, 16, 18, 20, 21, 22, 28, 29, 30, 32, 35, 37, 38, 42, 45, 46, 53, 56, 59, 61, 62, 65, 66, 67, 69, 70, 74, 76, 77, 78, 80, 82, 85, 86, 89, 92, 93, 94, 97, 98, 100, 101, 102, 104, 107, 109, 110, 112, 114, 117, 118, 120, 121, 122, 124, 125, 126}
Got unicode string of len 16: stupid interview from 11001110001011101010111000001110100101100010011000000100100101100111011000101110101001100100111001101110100101101010011011101110 -- no errors!
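The decoder-based check above can also be applied to a byte array directly, without the round trip through a bit string. The following is a minimal sketch of that idea; the class name `Utf8Check` and the method `isValidUtf8` are hypothetical names chosen for this illustration. Note that it validates real UTF-8, which differs from the rule as worded in the question: in UTF-8 a `11110xxx` lead byte is followed by three continuation bytes, not four.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    /** Returns true if the bytes decode as UTF-8 without any error. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            // REPORT makes the decoder throw instead of substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte d = (byte) 0x80; // a continuation byte, 10000000
        System.out.println(isValidUtf8("stupid interview".getBytes(StandardCharsets.UTF_8))); // true
        System.out.println(isValidUtf8(new byte[] {d}));                                      // false
        System.out.println(isValidUtf8(new byte[] {(byte) 0xF0, d, d, d, d}));                // false
    }
}
```

The last call shows the mismatch: a lead byte `11110000` followed by four data bytes satisfies the question's wording but is rejected as UTF-8.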