Decoding quoted-printable messages in Swift

iph*_*aaw 4 macos utf-8 quoted-printable swift

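I have a quoted-printable string such as "The cost would be =C2=A31,000". How do I convert this to "The cost would be £1,000"?

At the moment I am converting the text manually, which does not cover all cases. I am sure there is a single line of code that would help with this.

Here is my code: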

    func decodeUTF8(message: String) -> String
    {
        var newMessage = message.stringByReplacingOccurrencesOfString("=2E", withString: ".", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A2", withString: "•", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=C2=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9C", withString: "\"", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A6", withString: "…", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9D", withString: "\"", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=92", withString: "\'", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=3D", withString: "=", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=20", withString: "", options: NSStringCompareOptions.LiteralSearch, range: nil)
        newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=99", withString: "\'", options: NSStringCompareOptions.LiteralSearch, range: nil)

        return newMessage
    }

Thanks.

Mar*_*n R 7

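An easy way is to use the (NS)String method stringByRemovingPercentEncoding for this purpose. This was observed in the thread on decoding quoted-printables, so the first solution is mainly a translation of the answers in that thread to Swift.

The idea is to replace the quoted-printable "=NN" encoding with the percent encoding "%NN" and then use the existing method to remove the percent encoding.

Soft line breaks (continuation lines) are handled separately. Also, percent characters in the input string must be encoded first, otherwise they would be treated as the leading character of a percent encoding.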

    func decodeQuotedPrintable(message : String) -> String? {
        return message
            .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
            .stringByReplacingOccurrencesOfString("=\n", withString: "")
            .stringByReplacingOccurrencesOfString("%", withString: "%25")
            .stringByReplacingOccurrencesOfString("=", withString: "%")
            .stringByRemovingPercentEncoding
    }

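The function returns an optional string, which is nil for invalid input. Invalid input can be:

  • A "=" character that is not followed by two hexadecimal digits, e.g. "=XX".
  • A "=NN" sequence that does not decode to a valid UTF-8 sequence, e.g. "=E2=64".

Examples: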

    if let decoded = decodeQuotedPrintable("=C2=A31,000") {
        print(decoded) // £1,000
    }

    if let decoded = decodeQuotedPrintable("=E2=80=9CHello =E2=80=A6 world!=E2=80=9D") {
        print(decoded) // “Hello … world!”
    }
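For readers on a current toolchain, the same percent-encoding idea can be sketched with today's Foundation method names. This is only a rough equivalent of the function above, not part of the original answer; the name decodeQuotedPrintableViaPercent is made up for this example, and it still assumes the escaped bytes are UTF-8.

```swift
import Foundation

// Sketch of the percent-encoding approach using current Foundation method names.
// `decodeQuotedPrintableViaPercent` is a name chosen for this example only.
func decodeQuotedPrintableViaPercent(_ message: String) -> String? {
    return message
        .replacingOccurrences(of: "=\r\n", with: "")  // soft line break (CRLF)
        .replacingOccurrences(of: "=\n", with: "")    // soft line break (LF)
        .replacingOccurrences(of: "%", with: "%25")   // protect literal percent signs
        .replacingOccurrences(of: "=", with: "%")     // "=NN" becomes "%NN"
        .removingPercentEncoding                      // nil if the result is not valid
}

print(decodeQuotedPrintableViaPercent("=C2=A31,000") ?? "nil")        // £1,000
print(decodeQuotedPrintableViaPercent("50% off=\r\n today") ?? "nil") // 50% off today
print(decodeQuotedPrintableViaPercent("=XX") ?? "nil")                // nil
print(decodeQuotedPrintableViaPercent("=E2=64") ?? "nil")             // nil
```

The second call shows why the order of the replacements matters: the literal "%" is escaped to "%25" before "=" is turned into "%", so it survives the final decoding step.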

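Update 1: The above code assumes that the message uses the UTF-8 encoding for quoting non-ASCII characters, as in most of your examples: C2 A3 is the UTF-8 encoding of "£", and E2 80 A6 is the UTF-8 encoding of "…".

If the input is "Rub=E9n" then the message is using the Windows-1252 encoding. To decode that correctly, you have to replace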

    .stringByRemovingPercentEncoding

by

    .stringByReplacingPercentEscapesUsingEncoding(NSWindowsCP1252StringEncoding)
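To see why the encoding matters, here is a small sketch (written in current Swift, not the Swift 2 syntax used above) that decodes the raw bytes behind "Rub=E9n" in both encodings:

```swift
import Foundation

// The bytes of "Rub=E9n" after undoing the "=E9" escape: R, u, b, 0xE9, n.
let rawBytes = Data([0x52, 0x75, 0x62, 0xE9, 0x6E])

// 0xE9 followed by "n" is not a valid UTF-8 sequence, so UTF-8 decoding fails.
let asUTF8 = String(data: rawBytes, encoding: .utf8)
print(asUTF8 ?? "nil")        // nil

// In Windows-1252 the single byte 0xE9 is "é".
let asCP1252 = String(data: rawBytes, encoding: .windowsCP1252)
print(asCP1252 ?? "nil")      // Rubén
```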

There are also ways to detect the encoding from a "Content-Type" header field, compare e.g. /sf/answers/2243617911/
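As a rough illustration of that idea (not the method from the linked answer), the charset parameter can be pulled out of a Content-Type value and mapped to a String.Encoding. The helper name stringEncoding(fromContentType:) and the small lookup table are assumptions made for this sketch; a real mail parser would need to handle more charset names and more header syntax.

```swift
import Foundation

// Hypothetical helper: extracts the charset parameter from a Content-Type value
// and maps a few common names to String.Encoding. The table is illustrative only.
func stringEncoding(fromContentType header: String) -> String.Encoding? {
    let knownCharsets: [String: String.Encoding] = [
        "utf-8": .utf8,
        "us-ascii": .ascii,
        "iso-8859-1": .isoLatin1,
        "windows-1252": .windowsCP1252,
    ]
    // Parameters follow the media type, separated by ";".
    for parameter in header.split(separator: ";").dropFirst() {
        let parts = parameter.split(separator: "=", maxSplits: 1)
        guard parts.count == 2,
              parts[0].trimmingCharacters(in: .whitespaces).lowercased() == "charset"
        else { continue }
        // Strip surrounding whitespace and optional double quotes from the value.
        let name = parts[1]
            .trimmingCharacters(in: CharacterSet(charactersIn: " \t\""))
            .lowercased()
        return knownCharsets[name]  // nil for charsets missing from the table
    }
    return nil  // no charset parameter present
}

let detected = stringEncoding(fromContentType: "text/plain; charset=\"windows-1252\"")
print(detected == .windowsCP1252)  // true
```

A caller could fall back to UTF-8 when the function returns nil.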

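Update 2: The stringByReplacingPercentEscapesUsingEncoding method is marked as deprecated, so the above code will always generate a compiler warning. Unfortunately, Apple does not seem to provide an alternative method.

So here is a new, completely self-contained decoding method which does not cause any compiler warning. This time I have written it as an extension method of String. Explanatory comments are in the code.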

    extension String {

        /// Returns a new string made by removing in the `String` all "soft line
        /// breaks" and replacing all quoted-printable escape sequences with the
        /// matching characters as determined by a given encoding.
        /// - parameter encoding:     A string encoding. The default is UTF-8.
        /// - returns:                The decoded string, or `nil` for invalid input.

        func decodeQuotedPrintable(encoding enc : NSStringEncoding = NSUTF8StringEncoding) -> String? {

            // Handle soft line breaks, then replace quoted-printable escape sequences.
            return self
                .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
                .stringByReplacingOccurrencesOfString("=\n", withString: "")
                .decodeQuotedPrintableSequences(enc)
        }

        /// Helper function doing the real work.
        /// Decode all "=HH" sequences with respect to the given encoding.

        private func decodeQuotedPrintableSequences(enc : NSStringEncoding) -> String? {

            var result = ""
            var position = startIndex

            // Find the next "=" and copy characters preceding it to the result:
            while let range = rangeOfString("=", range: position ..< endIndex) {
                result.appendContentsOf(self[position ..< range.startIndex])
                position = range.startIndex

                // Decode one or more successive "=HH" sequences to a byte array:
                let bytes = NSMutableData()
                repeat {
                    let hexCode = self[position.advancedBy(1) ..< position.advancedBy(3, limit: endIndex)]
                    if hexCode.characters.count < 2 {
                        return nil // Incomplete hex code
                    }
                    guard var byte = UInt8(hexCode, radix: 16) else {
                        return nil // Invalid hex code
                    }
                    bytes.appendBytes(&byte, length: 1)
                    position = position.advancedBy(3)
                } while position != endIndex && self[position] == "="

                // Convert the byte array to a string, and append it to the result:
                guard let dec = String(data: bytes, encoding: enc) else {
                    return nil // Decoded bytes not valid in the given encoding
                }
                result.appendContentsOf(dec)
            }

            // Copy remaining characters to the result:
            result.appendContentsOf(self[position ..< endIndex])

            return result
        }
    }

Example usage:

    if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
        print(decoded) // £1,000
    }

    if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
        print(decoded) // “Hello … world!”
    }

    if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: NSWindowsCP1252StringEncoding) {
        print(decoded) // Rubén
    }

Update for Swift 4 (and later):

    extension String {

        /// Returns a new string made by removing in the `String` all "soft line
        /// breaks" and replacing all quoted-printable escape sequences with the
        /// matching characters as determined by a given encoding.
        /// - parameter encoding:     A string encoding. The default is UTF-8.
        /// - returns:                The decoded string, or `nil` for invalid input.

        func decodeQuotedPrintable(encoding enc : String.Encoding = .utf8) -> String? {

            // Handle soft line breaks, then replace quoted-printable escape sequences.
            return self
                .replacingOccurrences(of: "=\r\n", with: "")
                .replacingOccurrences(of: "=\n", with: "")
                .decodeQuotedPrintableSequences(encoding: enc)
        }

        /// Helper function doing the real work.
        /// Decode all "=HH" sequences with respect to the given encoding.

        private func decodeQuotedPrintableSequences(encoding enc : String.Encoding) -> String? {

            var result = ""
            var position = startIndex

            // Find the next "=" and copy characters preceding it to the result:
            while let range = range(of: "=", range: position..<endIndex) {
                result.append(contentsOf: self[position ..< range.lowerBound])
                position = range.lowerBound

                // Decode one or more successive "=HH" sequences to a byte array:
                var bytes = Data()
                repeat {
                    let hexCode = self[position...].dropFirst().prefix(2)
                    if hexCode.count < 2 {
                        return nil // Incomplete hex code
                    }
                    guard let byte = UInt8(hexCode, radix: 16) else {
                        return nil // Invalid hex code
                    }
                    bytes.append(byte)
                    position = index(position, offsetBy: 3)
                } while position != endIndex && self[position] == "="

                // Convert the byte array to a string, and append it to the result:
                guard let dec = String(data: bytes, encoding: enc) else {
                    return nil // Decoded bytes not valid in the given encoding
                }
                result.append(contentsOf: dec)
            }

            // Copy remaining characters to the result:
            result.append(contentsOf: self[position ..< endIndex])

            return result
        }
    }

Example usage:

    if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
        print(decoded) // £1,000
    }

    if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
        print(decoded) // “Hello … world!”
    }

    if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: .windowsCP1252) {
        print(decoded) // Rubén
    }
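As a possible variation on the same technique, the decoding can also be done in a single pass over the UTF-8 bytes of the input, collecting all decoded bytes first and converting them to a string once at the end. This is only a sketch, not the code from the answer above: the names hexValue and decodeQuotedPrintableBytes are invented here, and it assumes that everything outside the "=HH" escapes is plain ASCII, as quoted-printable text normally is.

```swift
import Foundation

// Value of an ASCII hex digit, or nil if the byte is not one.
func hexValue(_ byte: UInt8) -> UInt8? {
    switch byte {
    case UInt8(ascii: "0")...UInt8(ascii: "9"): return byte - UInt8(ascii: "0")
    case UInt8(ascii: "A")...UInt8(ascii: "F"): return byte - UInt8(ascii: "A") + 10
    case UInt8(ascii: "a")...UInt8(ascii: "f"): return byte - UInt8(ascii: "a") + 10
    default: return nil
    }
}

// Hypothetical single-pass decoder; assumes literal (unescaped) input is ASCII.
func decodeQuotedPrintableBytes(_ input: String,
                                encoding: String.Encoding = .utf8) -> String? {
    let source = Array(input.utf8)
    let equals = UInt8(ascii: "="), cr = UInt8(ascii: "\r"), lf = UInt8(ascii: "\n")
    var decoded = [UInt8]()
    var i = 0
    while i < source.count {
        guard source[i] == equals else {
            decoded.append(source[i])   // literal byte, copied unchanged
            i += 1
            continue
        }
        if i + 1 < source.count, source[i + 1] == lf {
            i += 2                      // soft line break "=\n"
        } else if i + 2 < source.count, source[i + 1] == cr, source[i + 2] == lf {
            i += 3                      // soft line break "=\r\n"
        } else if i + 2 < source.count,
                  let high = hexValue(source[i + 1]), let low = hexValue(source[i + 2]) {
            decoded.append(high << 4 | low)  // "=HH" becomes one byte
            i += 3
        } else {
            return nil                  // "=" not followed by two hex digits
        }
    }
    // Fails (returns nil) if the bytes are not valid in the given encoding.
    return String(data: Data(decoded), encoding: encoding)
}

print(decodeQuotedPrintableBytes("=C2=A31,000") ?? "nil")                         // £1,000
print(decodeQuotedPrintableBytes("Rub=E9n", encoding: .windowsCP1252) ?? "nil")   // Rubén
```

Checking each hex digit explicitly also rejects inputs such as "=+1", which an integer parser with a radix argument may accept as a signed number.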