tre*_*nik 102 html iphone cocoa cocoa-touch objective-c
首先,我发现了这个: Objective C HTML escape/unescape,但它对我不起作用.
我编码的字符(来自RSS提要,顺便说一句)看起来像这样: &
我在网上搜索并找到了相关的讨论,但没有修复我的特定编码,我认为它们被称为十六进制字符.
Mic*_*all 163
查看我的NSString类别的HTML.以下是可用的方法:
- (NSString *)stringByConvertingHTMLToPlainText;
- (NSString *)stringByDecodingHTMLEntities;
- (NSString *)stringByEncodingHTMLEntities;
- (NSString *)stringWithNewLinesAsBRs;
- (NSString *)stringByRemovingNewLinesAndWhitespace;
Run Code Online (Sandbox Code Playgroud)
Wal*_*ung 52
Daniel的那个基本上非常好,我在那里解决了一些问题:
删除了NSSCanner的跳过字符(否则将忽略两个连续实体之间的空格
[scanner setCharactersToBeSkipped:nil];
当存在孤立的'&'符号时修复了解析(我不确定这是什么'正确'输出,我只是将它与firefox进行比较):
例如
&#ABC DF & B' & C' Items (288)
Run Code Online (Sandbox Code Playgroud)
这是修改后的代码:
- (NSString *)stringByDecodingXMLEntities {
NSUInteger myLength = [self length];
NSUInteger ampIndex = [self rangeOfString:@"&" options:NSLiteralSearch].location;
// Short-circuit if there are no ampersands.
if (ampIndex == NSNotFound) {
return self;
}
// Make result string with some extra capacity.
NSMutableString *result = [NSMutableString stringWithCapacity:(myLength * 1.25)];
// First iteration doesn't need to scan to & since we did that already, but for code simplicity's sake we'll do it again with the scanner.
NSScanner *scanner = [NSScanner scannerWithString:self];
[scanner setCharactersToBeSkipped:nil];
NSCharacterSet *boundaryCharacterSet = [NSCharacterSet characterSetWithCharactersInString:@" \t\n\r;"];
do {
// Scan up to the next entity or the end of the string.
NSString *nonEntityString;
if ([scanner scanUpToString:@"&" intoString:&nonEntityString]) {
[result appendString:nonEntityString];
}
if ([scanner isAtEnd]) {
goto finish;
}
// Scan either a HTML or numeric character entity reference.
if ([scanner scanString:@"&" intoString:NULL])
[result appendString:@"&"];
else if ([scanner scanString:@"'" intoString:NULL])
[result appendString:@"'"];
else if ([scanner scanString:@""" intoString:NULL])
[result appendString:@"\""];
else if ([scanner scanString:@"<" intoString:NULL])
[result appendString:@"<"];
else if ([scanner scanString:@">" intoString:NULL])
[result appendString:@">"];
else if ([scanner scanString:@"&#" intoString:NULL]) {
BOOL gotNumber;
unsigned charCode;
NSString *xForHex = @"";
// Is it hex or decimal?
if ([scanner scanString:@"x" intoString:&xForHex]) {
gotNumber = [scanner scanHexInt:&charCode];
}
else {
gotNumber = [scanner scanInt:(int*)&charCode];
}
if (gotNumber) {
[result appendFormat:@"%C", (unichar)charCode];
[scanner scanString:@";" intoString:NULL];
}
else {
NSString *unknownEntity = @"";
[scanner scanUpToCharactersFromSet:boundaryCharacterSet intoString:&unknownEntity];
[result appendFormat:@"&#%@%@", xForHex, unknownEntity];
//[scanner scanUpToString:@";" intoString:&unknownEntity];
//[result appendFormat:@"&#%@%@;", xForHex, unknownEntity];
NSLog(@"Expected numeric character entity but got &#%@%@;", xForHex, unknownEntity);
}
}
else {
NSString *amp;
[scanner scanString:@"&" intoString:&]; //an isolated & symbol
[result appendString:amp];
/*
NSString *unknownEntity = @"";
[scanner scanUpToString:@";" intoString:&unknownEntity];
NSString *semicolon = @"";
[scanner scanString:@";" intoString:&semicolon];
[result appendFormat:@"%@%@", unknownEntity, semicolon];
NSLog(@"Unsupported XML character entity %@%@", unknownEntity, semicolon);
*/
}
}
while (![scanner isAtEnd]);
finish:
return result;
}
Run Code Online (Sandbox Code Playgroud)
Mat*_*ges 46
这些被称为角色实体参考.当它们采取形式时,&#<number>;
它们被称为数字实体引用.基本上,它是应该替换的字节的字符串表示.在这种情况下&
,它表示ISO-8859-1字符编码方案中值为38的字符,即&
.
&符号必须在RSS中编码的原因是它是一个保留的特殊字符.
什么,你需要做的是分析字符串和匹配的值的字节替换实体&#
和;
.我不知道在目标C中有什么好方法可以做到这一点,但是这个堆栈溢出问题可能会有所帮助.
编辑:自从大约两年前回答这个问题以来,有一些很好的解决方案; 请参阅下面的@Michael Waterfall的答案.
Bry*_*uby 45
从iOS 7开始,您可以使用NSAttributedString
带有NSHTMLTextDocumentType
属性的本地解码HTML字符:
NSString *htmlString = @" & & < > ™ © ♥ ♣ ♠ ♦";
NSData *stringData = [htmlString dataUsingEncoding:NSUTF8StringEncoding];
NSDictionary *options = @{NSDocumentTypeDocumentAttribute:NSHTMLTextDocumentType};
NSAttributedString *decodedString;
decodedString = [[NSAttributedString alloc] initWithData:stringData
options:options
documentAttributes:NULL
error:NULL];
Run Code Online (Sandbox Code Playgroud)
解码后的属性字符串现在将显示为:&& <>™©♥♣♠♦.
注意:这仅在主线程上调用时才有效.
Nik*_*bak 35
似乎没有人提到最简单的选择之一:Google Toolbox for Mac
(尽管名称,这也适用于iOS.)
https://github.com/google/google-toolbox-for-mac/blob/master/Foundation/GTMNSString%2BHTML.h
/// Get a string where internal characters that are escaped for HTML are unescaped
//
/// For example, '&' becomes '&'
/// Handles   and 2 cases as well
///
// Returns:
// Autoreleased NSString
//
- (NSString *)gtm_stringByUnescapingFromHTML;
Run Code Online (Sandbox Code Playgroud)
我不得不在项目中只包含三个文件:标题,实现和GTMDefines.h
.
Dan*_*son 17
我应该把这个发布在GitHub上.这是一个NSString类,NSScanner
用于实现,并处理十六进制和十进制数字字符实体以及通常的符号实体.
此外,它处理格式错误的字符串(当你有一个&后跟一个无效的字符序列)相对优雅,这在我发布的使用此代码的应用程序中变得至关重要.
- (NSString *)stringByDecodingXMLEntities {
NSUInteger myLength = [self length];
NSUInteger ampIndex = [self rangeOfString:@"&" options:NSLiteralSearch].location;
// Short-circuit if there are no ampersands.
if (ampIndex == NSNotFound) {
return self;
}
// Make result string with some extra capacity.
NSMutableString *result = [NSMutableString stringWithCapacity:(myLength * 1.25)];
// First iteration doesn't need to scan to & since we did that already, but for code simplicity's sake we'll do it again with the scanner.
NSScanner *scanner = [NSScanner scannerWithString:self];
do {
// Scan up to the next entity or the end of the string.
NSString *nonEntityString;
if ([scanner scanUpToString:@"&" intoString:&nonEntityString]) {
[result appendString:nonEntityString];
}
if ([scanner isAtEnd]) {
goto finish;
}
// Scan either a HTML or numeric character entity reference.
if ([scanner scanString:@"&" intoString:NULL])
[result appendString:@"&"];
else if ([scanner scanString:@"'" intoString:NULL])
[result appendString:@"'"];
else if ([scanner scanString:@""" intoString:NULL])
[result appendString:@"\""];
else if ([scanner scanString:@"<" intoString:NULL])
[result appendString:@"<"];
else if ([scanner scanString:@">" intoString:NULL])
[result appendString:@">"];
else if ([scanner scanString:@"&#" intoString:NULL]) {
BOOL gotNumber;
unsigned charCode;
NSString *xForHex = @"";
// Is it hex or decimal?
if ([scanner scanString:@"x" intoString:&xForHex]) {
gotNumber = [scanner scanHexInt:&charCode];
}
else {
gotNumber = [scanner scanInt:(int*)&charCode];
}
if (gotNumber) {
[result appendFormat:@"%C", charCode];
}
else {
NSString *unknownEntity = @"";
[scanner scanUpToString:@";" intoString:&unknownEntity];
[result appendFormat:@"&#%@%@;", xForHex, unknownEntity];
NSLog(@"Expected numeric character entity but got &#%@%@;", xForHex, unknownEntity);
}
[scanner scanString:@";" intoString:NULL];
}
else {
NSString *unknownEntity = @"";
[scanner scanUpToString:@";" intoString:&unknownEntity];
NSString *semicolon = @"";
[scanner scanString:@";" intoString:&semicolon];
[result appendFormat:@"%@%@", unknownEntity, semicolon];
NSLog(@"Unsupported XML character entity %@%@", unknownEntity, semicolon);
}
}
while (![scanner isAtEnd]);
finish:
return result;
}
Run Code Online (Sandbox Code Playgroud)
归档时间: |
|
查看次数: |
92520 次 |
最近记录: |