iOS Objective-C: NSData to NSString returns nil; how to ignore invalid UTF-8

xhg*_*xhg 3 encoding objective-c nsdata ios

data is downloaded from a website:

NSString * html = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];

html is nil, but

NSString * html = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];

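does return content. The website contains Chinese characters, so decoding as ASCII cannot display the Chinese text correctly. My guess is that the page contains some invalid UTF-8 byte sequences, which is why the first snippet fails.

Is there a way to keep decoding as UTF-8 while ignoring the invalid sequences?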

xhg*_*xhg 5

I think I found a solution.

Vincent Guerci's answer:

Add libiconv to your project and let it clean up the invalid UTF-8. After cleaning, the NSData can be safely passed to [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];

The detailed implementation is:

  1. Add "libiconv.2.dylib" to your target under "Link Binary With Libraries".
  2. #include "iconv.h"
  3. Add this function:

Objective-C:

- (NSData *)cleanUTF8:(NSData *)data {
    // this function is from
    // /sf/ask/243963331/
    //
    //
    iconv_t cd = iconv_open("UTF-8", "UTF-8"); // convert to UTF-8 from UTF-8
    int one = 1;
    iconvctl(cd, ICONV_SET_DISCARD_ILSEQ, &one); // discard invalid characters
    size_t inbytesleft, outbytesleft;
    inbytesleft = outbytesleft = data.length;
    char *inbuf  = (char *)data.bytes;
    char *outbuf = malloc(sizeof(char) * data.length);
    char *outptr = outbuf;
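    if (iconv(cd, &inbuf, &inbytesleft, &outptr, &outbytesleft)
        == (size_t)-1) {
        NSLog(@"this should not happen, seriously");
        // Release the converter and the buffer before bailing out,
        // otherwise both leak on this error path.
        iconv_close(cd);
        free(outbuf);
        return nil;
    }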
    NSData *result = [NSData dataWithBytes:outbuf length:data.length - outbytesleft];
    iconv_close(cd);
    free(outbuf);
    return result;
}
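
If linking libiconv is not an option, the same idea (drop the bytes that are not well-formed UTF-8, keep everything else) can be sketched in plain C, which compiles as-is inside an Objective-C source file. This is only an illustration and not part of the quoted answer: the function names utf8_sequence_length and strip_invalid_utf8 are made up for this example, and the byte ranges follow the well-formed UTF-8 table in the Unicode Standard.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length (1 to 4) of the well-formed UTF-8
 * sequence starting at s, or 0 if the bytes at s do not form one.
 * avail is the number of bytes readable from s. */
static size_t utf8_sequence_length(const unsigned char *s, size_t avail) {
    if (avail == 0) return 0;

    unsigned char b0 = s[0];
    unsigned char lo = 0x80, hi = 0xBF; /* allowed range for the second byte */
    size_t need;

    if (b0 <= 0x7F) {
        return 1;                       /* plain ASCII */
    } else if (b0 >= 0xC2 && b0 <= 0xDF) {
        need = 2;                       /* 0xC0 and 0xC1 would be overlong */
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        need = 3;
        if (b0 == 0xE0) lo = 0xA0;      /* reject overlong 3-byte forms */
        if (b0 == 0xED) hi = 0x9F;      /* reject UTF-16 surrogates */
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        need = 4;
        if (b0 == 0xF0) lo = 0x90;      /* reject overlong 4-byte forms */
        if (b0 == 0xF4) hi = 0x8F;      /* reject values above U+10FFFF */
    } else {
        return 0;                       /* stray continuation or invalid lead byte */
    }

    if (avail < need) return 0;         /* sequence truncated at end of input */
    if (s[1] < lo || s[1] > hi) return 0;
    for (size_t i = 2; i < need; i++) {
        if (s[i] < 0x80 || s[i] > 0xBF) return 0;
    }
    return need;
}

/* Hypothetical helper: copies only well-formed UTF-8 sequences from in to out.
 * out must have room for at least in_len bytes. Returns the bytes written. */
size_t strip_invalid_utf8(const unsigned char *in, size_t in_len,
                          unsigned char *out) {
    size_t i = 0, o = 0;
    while (i < in_len) {
        size_t n = utf8_sequence_length(in + i, in_len - i);
        if (n == 0) {
            i++;                        /* drop one bad byte and resynchronize */
            continue;
        }
        memcpy(out + o, in + i, n);
        i += n;
        o += n;
    }
    return o;
}
```

From Objective-C, one possible way to use it is to pass data.bytes and data.length, then wrap the cleaned buffer with [NSData dataWithBytes:length:] before calling initWithData:encoding:. Note that dropping bytes silently loses whatever characters the broken sequences were meant to encode.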