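I apologize for this stupid question. I am maintaining old legacy VB6 code, and I have a function that actually works, but I simply cannot figure out why it works, or why the code does not work without it.
Basically, this function reads a UTF-8 text file and displays its contents in a DHTMLEdit component. The way it does this is that it reads the whole file into a string, then converts it from a double-byte to a multibyte string using the ANSI code page, and then converts it back to double-byte.
Using this whole elaborate mechanism lets the component correctly display a page that contains Hebrew, Arabic, Thai and Chinese at the same time. Without this code, the text looks as if it had been converted to ASCII, showing various punctuation marks where the letters used to be.
What I don't understand is:
[code]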
Private Declare Function MultiByteToWideChar Lib "kernel32" (ByVal codePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Private Declare Function WideCharToMultiByte Lib "kernel32" (ByVal codePage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpDefaultChar As Long, lpUsedDefaultChar As Long) As Long
Private Declare Function GetACP Lib "kernel32" () As Long
Private Const CP_UTF8 As Long = 65001 ' Windows code page identifier for UTF-8
...
Open filePath For Input As #lFilePtr
Dim sInput As String
Dim sResult As String
Do While Not EOF(lFilePtr)
    ' Line Input converts each line from the ANSI code page to UTF-16 and drops the line break.
    Line Input #lFilePtr, sInput
    sResult = sResult & sInput
Loop
txtBody.DOM.Body.innerText = DecodeString(sResult, CP_UTF8)

Public Function DecodeString(ByVal strSource As String, Optional FromCodePage As Long = -1) As String
    Dim strTemp As String
    If strSource = vbNullString Then Exit Function
    ' Code page 0 is CP_ACP, the system ANSI code page.
    strTemp = UnicodeToAnsi(strSource, 0)
    DecodeString = AnsiToUnicode(strTemp, FromCodePage)
End Function

Public Function AnsiToUnicode(ByVal strSource As String, Optional ByVal codePage As Long = -1, Optional lFlags As Long = 0) As String
    Dim strBuffer As String
    Dim cwch As Long
    Dim pwz As Long
    Dim pwzBuffer As Long
    If codePage = -1 Then codePage = GetACP()
    ' The string's buffer is treated here as null-terminated raw bytes in codePage.
    pwz = StrPtr(strSource)
    ' First call: ask how many UTF-16 characters are needed (including the terminator).
    cwch = MultiByteToWideChar(codePage, lFlags, pwz, -1, 0&, 0&)
    strBuffer = String$(cwch + 1, vbNullChar)
    pwzBuffer = StrPtr(strBuffer)
    ' Second call: perform the conversion into the buffer.
    cwch = MultiByteToWideChar(codePage, lFlags, pwz, -1, pwzBuffer, Len(strBuffer))
    AnsiToUnicode = Left(strBuffer, cwch - 1)
End Function

Public Function UnicodeToAnsi(ByVal strSource As String, Optional ByVal codePage As Long = -1, Optional lFlags As Long = 0) As String
    Dim strBuffer As String
    Dim cwch As Long
    Dim pwz As Long
    Dim pwzBuffer As Long
    If codePage = -1 Then codePage = GetACP()
    pwz = StrPtr(strSource)
    ' First call: ask how many bytes the converted text needs (including the terminator).
    cwch = WideCharToMultiByte(codePage, lFlags, pwz, -1, 0&, 0&, ByVal 0&, ByVal 0&)
    strBuffer = String$(cwch + 1, vbNullChar)
    pwzBuffer = StrPtr(strBuffer)
    ' Second call: write the raw bytes into the string's buffer.
    cwch = WideCharToMultiByte(codePage, lFlags, pwz, -1, pwzBuffer, Len(strBuffer), ByVal 0&, ByVal 0&)
    UnicodeToAnsi = Left(strBuffer, cwch - 1)
End Function
[code]
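VB6/VBA uses an implicit two-way UTF-16/ANSI conversion when reading and writing files with the built-in statements.
Line Input treats the file as ANSI text (a series of bytes, each representing a character), using the current system code page for non-Unicode programs. The characters it reads are converted to UTF-16.
When you read a UTF-8 file this way, what you get is an "invalid" string: you cannot use it directly in the language (if you try, you will see garbage), but it contains usable binary data.
A pointer to that usable binary data is then passed to WideCharToMultiByte (in UnicodeToAnsi), which creates another "invalid" string, this time containing "ANSI" data. In effect, this reverts the conversion that VB performed automatically in Line Input, and since the original file was UTF-8, you now have an "invalid" string that contains UTF-8 data, even though the conversion function believes it was converting to ANSI.
A pointer to this second invalid string is passed to MultiByteToWideChar (in AnsiToUnicode), which finally creates a valid string that can be used in VB.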
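The round trip described above can be sketched with VB6's built-in StrConv, which performs a similar ANSI/UTF-16 conversion using the system default code page. This is only an illustration under that assumption, not the code from the question; DemoRoundTrip is a hypothetical name, and whether the bytes survive unchanged depends on the system code page mapping them reversibly.

```vb
' Hypothetical demonstration; not part of the original code.
Private Sub DemoRoundTrip()
    Dim utf8Bytes() As Byte
    Dim garbled As String
    Dim recovered() As Byte

    ' U+00E9 ("e" with acute accent) encoded as UTF-8 is the two bytes C3 A9.
    ReDim utf8Bytes(0 To 1)
    utf8Bytes(0) = &HC3
    utf8Bytes(1) = &HA9

    ' ANSI -> UTF-16: roughly what Line Input does to the file contents.
    ' The result is the "invalid" string that displays as garbage.
    garbled = StrConv(utf8Bytes, vbUnicode)

    ' UTF-16 -> ANSI: reverts that conversion and yields the raw bytes again.
    recovered = StrConv(garbled, vbFromUnicode)

    ' Print the recovered bytes in hex for comparison with the input.
    Debug.Print Hex$(recovered(0)); " "; Hex$(recovered(1))
End Sub
```

If the recovered bytes match the input, decoding them as UTF-8 (the MultiByteToWideChar step) gives the intended character.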
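The confusing part about this code is that Strings are used to hold "invalid" data. Logically, all of these should be byte arrays. I would refactor the code to read the bytes from the file in binary mode and pass the array to MultiByteToWideChar directly.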
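A minimal sketch of that refactoring might look like the following. ReadUtf8File is a hypothetical helper name, the Declare repeats the one from the question, and CP_UTF8 is assumed to be 65001. The sketch does not strip a leading byte order mark and does not handle file errors.

```vb
Private Declare Function MultiByteToWideChar Lib "kernel32" (ByVal codePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Private Const CP_UTF8 As Long = 65001

' Hypothetical helper: reads a UTF-8 file as bytes and decodes it once.
Public Function ReadUtf8File(ByVal filePath As String) As String
    Dim fileNum As Integer
    Dim bytes() As Byte
    Dim byteCount As Long
    Dim charCount As Long
    Dim result As String

    ' Binary mode reads the bytes as they are, with no ANSI conversion.
    fileNum = FreeFile
    Open filePath For Binary Access Read As #fileNum
    byteCount = LOF(fileNum)
    If byteCount > 0 Then
        ReDim bytes(0 To byteCount - 1)
        Get #fileNum, , bytes
    End If
    Close #fileNum
    If byteCount = 0 Then Exit Function

    ' First call: ask how many UTF-16 characters the bytes decode to.
    charCount = MultiByteToWideChar(CP_UTF8, 0, VarPtr(bytes(0)), byteCount, 0, 0)
    If charCount = 0 Then Exit Function

    ' Second call: decode straight into the string's buffer.
    result = String$(charCount, vbNullChar)
    MultiByteToWideChar CP_UTF8, 0, VarPtr(bytes(0)), byteCount, StrPtr(result), charCount
    ReadUtf8File = result
End Function
```

With a helper like this, the call site could become `txtBody.DOM.Body.innerText = ReadUtf8File(filePath)`. Because the whole file is read at once, line breaks are kept, whereas the Line Input loop drops them.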