How do I create a String from UTF8 in Swift?

jxw*_*who 17 string utf-8 swift

We know we can print each character as UTF8 code units. So, if we have the code units of those characters, how do we create a String from them?

小智 14

It is possible to convert UTF-8 code units to a Swift String idiomatically using Swift's UTF8 type, although converting from String to UTF-8 is much easier!

import Foundation

public class UTF8Encoding {
  public static func encode(bytes: Array<UInt8>) -> String {
    var encodedString = ""
    var decoder = UTF8()
    var generator = bytes.generate()
    var finished: Bool = false
    do {
      let decodingResult = decoder.decode(&generator)
      switch decodingResult {
      case .Result(let char):
        encodedString.append(char)
      case .EmptyInput:
        finished = true
      /* ignore errors and unexpected values */
      case .Error:
        finished = true
      default:
        finished = true
      }
    } while (!finished)
    return encodedString
  }

  public static func decode(str: String) -> Array<UInt8> {
    var decodedBytes = Array<UInt8>()
    for b in str.utf8 {
      decodedBytes.append(b)
    }
    return decodedBytes
  }
}

func testUTF8Encoding() {
  let testString = "A UTF8 String With Special Characters: "
  let decodedArray = UTF8Encoding.decode(testString)
  let encodedString = UTF8Encoding.encode(decodedArray)
  XCTAssert(encodedString == testString, "UTF8Encoding is lossless: \(encodedString) != \(testString)")
}

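Other alternatives that have been suggested:

  • Using NSString calls into the Objective-C bridge;

  • Using UnicodeScalar is error-prone because it converts UnicodeScalars directly to Characters, ignoring complex grapheme clusters; and

  • Using String.fromCString is potentially unsafe because it uses pointers.

  • Thanks for decoding the UTF8 encoding! You can remove `import Foundation` from the top, which is the whole reason I wanted to use this. (2 upvotes)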

Ima*_*tit 11

With Swift 5, you can choose one of the following ways to convert a collection of UTF-8 code units into a string.


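#1. Using String's init(_:) initializer

If you have a String.UTF8View instance (i.e. a collection of UTF-8 code units) and want to convert it to a string, you can use the init(_:) initializer. init(_:) has the following declaration:

init(_ utf8: String.UTF8View)

Creates a string corresponding to the given sequence of UTF-8 code units.

The Playground sample code below shows how to use init(_:):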

let string = "Café 🇫🇷"
let utf8View: String.UTF8View = string.utf8

let newString = String(utf8View)
print(newString) // prints: Café 🇫🇷

#2. Using String's init(decoding:as:) initializer

init(decoding:as:) creates a string from the given collection of Unicode code units in the specified encoding:

let string = "Café 🇫🇷"
let codeUnits: [Unicode.UTF8.CodeUnit] = Array(string.utf8)

let newString = String(decoding: codeUnits, as: UTF8.self)
print(newString) // prints: Café 🇫🇷

Note that init(decoding:as:) also works with a String.UTF8View parameter:

let string = "Café 🇫🇷"
let utf8View: String.UTF8View = string.utf8

let newString = String(decoding: utf8View, as: UTF8.self)
print(newString) // prints: Café 🇫🇷
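One property of init(decoding:as:) worth keeping in mind is that it cannot fail: ill-formed input is repaired by substituting U+FFFD (the replacement character). A minimal sketch, where the byte values and variable names are made up for illustration:

```swift
// 0xFF can never appear in well-formed UTF-8 (illustrative input).
let invalidBytes: [UInt8] = [0x43, 0xFF, 0x61] // "C", an invalid byte, "a"

// The initializer is non-failable; the invalid byte becomes U+FFFD.
let repaired = String(decoding: invalidBytes, as: UTF8.self)

// Print the scalar values in hexadecimal to make the substitution visible.
print(repaired.unicodeScalars.map { String($0.value, radix: 16) }) // ["43", "fffd", "61"]
```

If silently repaired input is not acceptable, one of the approaches that reports errors is probably a better fit.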

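#3. Using the transcode(_:from:to:stoppingOnError:into:) function

The following example transcodes the UTF-8 representation of the initial string into Unicode scalar values (UTF-32 code units) that can be used to build a new string: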

let string = "Café 🇫🇷"
let bytes = Array(string.utf8)

var newString = ""
_ = transcode(bytes.makeIterator(), from: UTF8.self, to: UTF32.self, stoppingOnError: true, into: {
    newString.append(String(Unicode.Scalar($0)!))
})
print(newString) // prints: Café 🇫🇷
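The example above discards the value returned by transcode. That Bool reports whether ill-formed input was encountered, so it can be used for strict decoding. A rough sketch, where decodeStrictUTF8 is a hypothetical helper name and not part of the standard library:

```swift
// Hypothetical helper: returns nil instead of a partial string when the input is ill-formed.
func decodeStrictUTF8(_ bytes: [UInt8]) -> String? {
    var scalars = String.UnicodeScalarView()
    // transcode returns true if it detected an encoding error.
    let hadError = transcode(
        bytes.makeIterator(),
        from: UTF8.self,
        to: UTF32.self,
        stoppingOnError: true,
        into: { codeUnit in
            // Every UTF-32 code unit is a candidate Unicode scalar value.
            if let scalar = Unicode.Scalar(codeUnit) {
                scalars.append(scalar)
            }
        }
    )
    return hadError ? nil : String(scalars)
}

print(decodeStrictUTF8([0x43, 0x61, 0x66, 0xC3, 0xA9]) ?? "invalid") // Café
print(decodeStrictUTF8([0x43, 0xFF]) ?? "invalid")                   // invalid
```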

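#4. Using Array's withUnsafeBufferPointer(_:) method and String's init(cString:) initializer

init(cString:) has the following declaration:

init(cString: UnsafePointer<CChar>)

Creates a new string by copying the null-terminated UTF-8 data referenced by the given pointer.

The following example shows how to use init(cString:) with a pointer to the contents of a CChar array (i.e. a well-formed, null-terminated UTF-8 code unit sequence) to create a string from it: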

let bytes: [CChar] = [67, 97, 102, -61, -87, 32, -16, -97, -121, -85, -16, -97, -121, -73, 0]

let newString = bytes.withUnsafeBufferPointer({ (bufferPointer: UnsafeBufferPointer<CChar>) in
    return String(cString: bufferPointer.baseAddress!)
})
print(newString) // prints: Café 🇫🇷
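If the pointer and the force-unwrap are a concern, a null-terminated buffer can also be decoded without unsafe APIs by cutting it at the first 0 byte. A minimal sketch, assuming the input is available as [UInt8] rather than [CChar]; stringFromCStringBytes is a hypothetical helper name:

```swift
// Hypothetical helper: decode a null-terminated UTF-8 buffer without using pointers.
func stringFromCStringBytes(_ bytes: [UInt8]) -> String {
    // Keep everything before the first NUL byte (or all bytes if there is none).
    let payload = bytes.prefix(while: { $0 != 0 })
    return String(decoding: payload, as: UTF8.self)
}

let cBytes: [UInt8] = [0x43, 0x61, 0x66, 0xC3, 0xA9, 0x00]
print(stringFromCStringBytes(cBytes)) // Café
```

Unlike init(cString:), this version does not read past the end of the array when the terminator is missing.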

#5. Using Unicode.UTF8's decode(_:) method

To decode a sequence of code units, call decode(_:) repeatedly until it returns UnicodeDecodingResult.emptyInput:

let string = "Café 🇫🇷"
let codeUnits = Array(string.utf8)

var codeUnitIterator = codeUnits.makeIterator()
var utf8Decoder = Unicode.UTF8()
var newString = ""

Decode: while true {
    switch utf8Decoder.decode(&codeUnitIterator) {
    case .scalarValue(let value):
        newString.append(Character(Unicode.Scalar(value)))
    case .emptyInput:
        break Decode
    case .error:
        print("Decoding error")
        break Decode
    }
}

print(newString) // prints: Café 🇫🇷

#6. Using String's init(bytes:encoding:) initializer

Foundation gives String an init(bytes:encoding:) initializer that you can use as shown in the Playground sample code below:

import Foundation

let string = "Café 🇫🇷"
let bytes: [Unicode.UTF8.CodeUnit] = Array(string.utf8)

let newString = String(bytes: bytes, encoding: String.Encoding.utf8)
print(String(describing: newString)) // prints: Optional("Café 🇫🇷")
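The optional result is what makes this initializer useful for validation: ill-formed input produces nil rather than a repaired string. A short sketch with made-up byte values:

```swift
import Foundation

let wellFormed: [UInt8] = [0x43, 0x61, 0x66, 0xC3, 0xA9] // "Café"
let illFormed: [UInt8] = [0x43, 0xFF]                    // 0xFF is never valid UTF-8

// Well-formed input unwraps to the decoded string.
if let decoded = String(bytes: wellFormed, encoding: .utf8) {
    print(decoded) // Café
}

// Ill-formed input yields nil instead of a string containing U+FFFD.
let failed = String(bytes: illFormed, encoding: .utf8)
print(failed == nil) // true
```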


Bry*_*hen 5

Improving on Martin R's answer

import AppKit

let utf8 : CChar[] = [65, 66, 67, 0]
let str = NSString(bytes: utf8, length: utf8.count, encoding: NSUTF8StringEncoding)
println(str) // Output: ABC

import AppKit

let utf8 : UInt8[] = [0xE2, 0x82, 0xAC, 0]
let str = NSString(bytes: utf8, length: utf8.count, encoding: NSUTF8StringEncoding)
println(str) // Output: €

What happens is that the Array is automatically converted to a CConstVoidPointer, which can be used to create a string with NSString(bytes: CConstVoidPointer, length len: Int, encoding: UInt)

  • Note that your code also converts the 0 byte into a NUL character in the created NSString. (3 upvotes)
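The point about the 0 byte can be checked without NSString. The sketch below substitutes the standard library's String(decoding:as:) for the NSString call (a swap made purely for illustration) and compares decoding with and without the trailing terminator:

```swift
let withTerminator: [UInt8] = [65, 66, 67, 0]

// Decoding all four bytes keeps the 0 byte as a U+0000 scalar in the string.
let raw = String(decoding: withTerminator, as: UTF8.self)
print(raw.unicodeScalars.count) // 4

// Dropping the terminator before decoding yields just "ABC".
let trimmed = String(decoding: withTerminator.dropLast(), as: UTF8.self)
print(trimmed.unicodeScalars.count) // 3
```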

Mar*_*n R 1

Here is a possible solution (now updated for Swift 2):

let utf8 : [CChar] = [65, 66, 67, 0]
if let str = utf8.withUnsafeBufferPointer( { String.fromCString($0.baseAddress) }) {
    print(str) // Output: ABC
} else {
    print("Not a valid UTF-8 string")
}

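Inside the closure, $0 is an UnsafeBufferPointer<CChar> pointing to the array's contiguous storage. A Swift String can be created from that.

Alternatively, if you prefer the input as unsigned bytes: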

let utf8 : [UInt8] = [0xE2, 0x82, 0xAC, 0]
if let str = utf8.withUnsafeBufferPointer( { String.fromCString(UnsafePointer($0.baseAddress)) }) {
    print(str) // Output: €
} else {
    print("Not a valid UTF-8 string")
}