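I've been experimenting with the MD5 implementations in Python and Java and ran into a quirk that puzzles me.
The following Python script illustrates the problem: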
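# -*- coding: utf-8 -*-
# Python 2 script: print is a statement and '...' literals are byte strings (str).
import hashlib
def md5hash(x):
    # Return the MD5 digest of x as a 32-character hex string.
    m = hashlib.md5()
    m.update(x)
    return m.hexdigest()
print md5hash('\xdb')
print md5hash('Û')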
Output:
98fd00d788afe2a5fa5e4f8e1666638b
31ecfb09f120720a55d96a2034f5d00b
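One way to check what the two calls actually hash is to print the underlying bytes next to their digests. The sketch below is an illustration, not part of the original script. It assumes the script file is saved as UTF-8, as its coding line declares, and it spells the character as u'\u00db' so that the sketch itself does not depend on how its own file is encoded. The helper name describe is hypothetical, and the code is written to run on both Python 2.7 and Python 3.

```python
import hashlib


def describe(data):
    """Return the bytes of data as space-separated hex pairs, e.g. 'c3 9b'."""
    # bytearray() yields ints on both Python 2 and Python 3.
    return ' '.join('%02x' % b for b in bytearray(data))


ch = u'\u00db'                 # code point U+00DB, LATIN CAPITAL LETTER U WITH CIRCUMFLEX
latin1 = ch.encode('latin-1')  # one byte, 0xdb: the same bytes as the literal '\xdb'
utf8 = ch.encode('utf-8')      # two bytes, 0xc3 0x9b

for label, data in (('latin-1', latin1), ('utf-8', utf8)):
    print('%-8s %-6s %s' % (label, describe(data), hashlib.md5(data).hexdigest()))
```

If the script file really is UTF-8, the latin-1 row should match md5hash('\xdb') and the utf-8 row should match md5hash('Û'), because in Python 2 a non-ASCII character inside a byte-string literal is stored as the raw bytes from the source file.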
I expected the two digests to be the same, since Û should be equivalent to \xdb. To get more insight, I wrote an equivalent implementation in Java:
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
public class Test {
    public static void main(String[] args) throws Exception {
        MessageDigest m = MessageDigest.getInstance("MD5");
        // getBytes() with no argument encodes using the platform default charset
        m.update("\u00db".getBytes());
        System.out.println(bytesToHex(m.digest()));  // digest() also resets m
        m.update("Û".getBytes());
        System.out.println(bytesToHex(m.digest()));
    }
    final protected static char[] hexArray = "0123456789abcdef".toCharArray();
    public static String bytesToHex(byte[] bytes) {
        char[] hexChars = new char[bytes.length * 2];
        for ( int j = 0; …