Question by lfa*_*lin (105 votes). Tags: iphone, cocoa-touch, objective-c, nsstring, ios
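There are a couple of different ways to remove HTML tags from an NSString in Cocoa.
One way is to render the string into an NSAttributedString and then grab the rendered text.
Another way is to use NSXMLDocument's -objectByApplyingXSLTString: method to apply an XSLT transform that does it.
Unfortunately, the iPhone doesn't support NSAttributedString or NSXMLDocument. There are too many edge cases and malformed HTML documents for me to feel comfortable using regex or NSScanner. Does anyone have a solution to this?
One suggestion has been to simply look for opening and closing tag characters, but this method won't work except in very trivial cases.
For example, these cases (from the Perl Cookbook chapter on the same subject) would break that approach: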
<IMG SRC = "foo.gif" ALT = "A > B">
<!-- <A comment> -->
<script>if (a<b && a>c)</script>
<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
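For comparison, here is a rough sketch of a scanner that goes one step beyond "find < and >" and copes with the four cases above: it ends a tag at the first > outside a quoted attribute value, drops comments up to -->, drops the body of a <script> element, and treats a < not followed by a letter, /, ! or ? as literal text. It is plain C (so it also compiles as Objective-C) and works on UTF-8 bytes. The function name strip_markup is hypothetical, discarding <![ ... ]]> sections wholesale is an assumption, and this is not a conforming HTML parser.

```c
#include <ctype.h>
#include <string.h>

/* ASCII case-insensitive prefix test (avoids the non-ISO strncasecmp). */
static int starts_with_ci(const char *s, const char *prefix) {
    for (; *prefix; s++, prefix++) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0; /* also stops safely if s ends early */
    }
    return 1;
}

/* Copies the text of `in` into `out`, dropping markup. `out` must have room
   for strlen(in) + 1 bytes. Sketch only: not a conforming HTML parser. */
void strip_markup(const char *in, char *out) {
    const char *p = in;
    char *o = out;
    while (*p) {
        /* '<' only starts markup when followed by a letter, '/', '!' or '?'. */
        if (*p != '<' || !(isalpha((unsigned char)p[1]) || p[1] == '/' ||
                           p[1] == '!' || p[1] == '?')) {
            *o++ = *p++;
            continue;
        }
        if (strncmp(p, "<!--", 4) == 0) {
            /* Comment: drop everything up to and including "-->". */
            const char *end = strstr(p + 4, "-->");
            p = end ? end + 3 : p + strlen(p);
        } else if (strncmp(p, "<![", 3) == 0) {
            /* Marked section: assumption, discard it entirely up to "]]>". */
            const char *end = strstr(p + 3, "]]>");
            p = end ? end + 3 : p + strlen(p);
        } else {
            int is_script = starts_with_ci(p, "<script") &&
                            (p[7] == '>' || isspace((unsigned char)p[7]));
            /* Tag: stop at the first '>' that is not inside quotes. */
            char quote = 0;
            for (p++; *p && (quote || *p != '>'); p++) {
                if (quote) {
                    if (*p == quote) quote = 0;
                } else if (*p == '"' || *p == '\'') {
                    quote = *p;
                }
            }
            if (*p) p++; /* skip the closing '>' */
            if (is_script) {
                /* Script body is not text: skip to "</script", which the
                   next iteration removes as an ordinary tag. */
                while (*p && !starts_with_ci(p, "</script")) p++;
            }
        }
    }
    *o = '\0';
}
```

With this, `x<IMG SRC = "foo.gif" ALT = "A > B">y` becomes `xy` instead of leaking ` B">y`. Real-world HTML has many more corner cases (entities, unquoted attributes, other raw-text elements such as style), which is why a proper parser is the safer choice when correctness matters.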
Answer by m.k*_*ski (307 votes)
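A quick and "dirty" solution (it removes everything between < and >) that works on iOS >= 3.2:
-(NSString *) stringByStrippingHTML {
    NSRange r;
    // Manual retain/release; under ARC write: NSString *s = [self copy];
    NSString *s = [[self copy] autorelease];
    // Repeatedly find the first "<...>" run and delete it until none is left
    while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
        s = [s stringByReplacingCharactersInRange:r withString:@""];
    return s;
}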
I declare this as an NSString category.
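The loop above searches again from the start of the string and allocates a new string after every deletion, so in the worst case it is quadratic in the input length. Because `<[^>]+>` always ends at the first `>` after an opening `<` that is followed by at least one non-`>` character, the same result can be produced in a single left-to-right pass. A minimal sketch in plain C (also valid Objective-C), working on NUL-terminated UTF-8 bytes rather than NSString; the name strip_tags_single_pass is hypothetical.

```c
#include <string.h>

/* Removes every substring matching <[^>]+> from `in`, writing the result to
   `out` (room for strlen(in) + 1 bytes). One left-to-right pass that gives
   the same result as repeatedly deleting the leftmost regex match. */
void strip_tags_single_pass(const char *in, char *out) {
    char *o = out;
    while (*in) {
        if (*in == '<' && in[1] != '\0' && in[1] != '>') {
            /* [^>]+ runs to the first '>' after the '<'. */
            const char *close = strchr(in + 1, '>');
            if (close) {
                in = close + 1; /* drop the whole "<...>" run */
                continue;
            }
        }
        *o++ = *in++; /* ordinary character, or a '<' that starts no match */
    }
    *o = '\0';
}
```

It keeps exactly the same blind spot as the regex: `<IMG SRC = "foo.gif" ALT = "A > B">` leaves ` B">` behind, just as described in the question.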
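Answer by Lei*_*och (29 votes)
This NSString category uses NSXMLParser to accurately remove any HTML tags from an NSString. It's a single .m and .h file that can easily be included in your project.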
https://gist.github.com/leighmcculloch/1202238
Then strip the HTML as follows:
Import the header:
#import "NSString_stripHtml.h"
Then call stripHtml:
NSString* mystring = @"<b>Hello</b> World!!";
NSString* stripped = [mystring stripHtml];
// stripped will be = Hello World!!
This also works with malformed HTML that technically isn't XML.
Answer by MAN*_*ORE (10 votes)
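UITextView *textview = [[UITextView alloc] initWithFrame:CGRectMake(10, 130, 250, 170)];
NSString *str = @"This is <font color='red'>simple</font>";
// "contentToHTMLString" is not a documented UITextView key, so this relies on private behavior
[textview setValue:str forKey:@"contentToHTMLString"];
textview.textAlignment = NSTextAlignmentLeft;
textview.editable = NO;
textview.font = [UIFont fontWithName:@"Verdana" size:20.0];
// Add to a view instance (here, a view controller's view), not to the UIView class
[self.view addSubview:textview];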
Works fine for me. Note that this renders the HTML inside the text view rather than producing a tag-free NSString.
Use this:
NSString *myregex = @"<[^>]*>"; //regex to remove any html tag
NSString *htmlString = @"<html>bla bla</html>";
NSString *stringWithoutHTML = [htmlString stringByReplacingOccurrencesOfRegex:myregex withString:@""];
Don't forget to include this in your code: #import "RegexKitLite.h". You can download this API here: http://regexkit.sourceforge.net/#Downloads
You can use it as follows:
-(void)myMethod
{
    NSString* htmlStr = @"<some>html</string>";
    NSString* strWithoutFormatting = [self stringByStrippingHTML:htmlStr];
}

-(NSString *)stringByStrippingHTML:(NSString*)str
{
    NSRange r;
    while ((r = [str rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
    {
        str = [str stringByReplacingCharactersInRange:r withString:@""];
    }
    return str;
}
Here is a more efficient solution than the accepted answer:
- (NSString*)hp_stringByRemovingTags
{
    static NSRegularExpression *regex = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        regex = [NSRegularExpression regularExpressionWithPattern:@"<[^>]+>" options:kNilOptions error:nil];
    });

    // Use reverse enumerator to delete characters without affecting indexes
    NSArray *matches = [regex matchesInString:self options:kNilOptions range:NSMakeRange(0, self.length)];
    NSEnumerator *enumerator = matches.reverseObjectEnumerator;

    NSTextCheckingResult *match = nil;
    NSMutableString *modifiedString = self.mutableCopy;
    while ((match = [enumerator nextObject]))
    {
        [modifiedString deleteCharactersInRange:match.range];
    }
    return modifiedString;
}
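The NSString category above uses a regular expression to find all matching tags, makes a copy of the original string, and then removes all the tags in place by iterating over the matches in reverse order. It is more efficient because the regular expression is compiled only once (via dispatch_once) and all deletions happen on a single mutable copy, instead of allocating a new string and searching again from the start for every tag.
This performed well for me, but a solution using NSScanner might be more efficient.
Like the accepted answer, this solution doesn't handle all the edge cases @lfalin asked about. Those would require more expensive parsing, which the average use case most likely doesn't need.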
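To make the "collect matches, then delete in reverse" idea concrete outside Foundation, here is a minimal sketch in plain C that swaps NSRegularExpression for POSIX <regex.h> (regcomp/regexec). It compiles the pattern once, records every match range against the original string, and then deletes the ranges from last to first so earlier offsets stay valid. The names remove_tags_reverse and Range are hypothetical, and the compile-once flag is not thread-safe the way dispatch_once is.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

typedef struct { size_t start, len; } Range;

/* Returns a newly allocated copy of `s` with every "<[^>]+>" match removed,
   or NULL on failure. The caller frees the result. */
char *remove_tags_reverse(const char *s) {
    static regex_t re;
    static int compiled = 0; /* compile once; unlike dispatch_once, not thread-safe */
    if (!compiled) {
        if (regcomp(&re, "<[^>]+>", REG_EXTENDED) != 0) return NULL;
        compiled = 1;
    }
    size_t n = strlen(s);
    /* Matches never overlap and are at least 3 bytes long. */
    Range *ranges = malloc((n / 3 + 1) * sizeof *ranges);
    char *out = malloc(n + 1);
    if (!ranges || !out) { free(ranges); free(out); return NULL; }
    memcpy(out, s, n + 1);

    /* Pass 1: record every match range on the original string. */
    size_t count = 0, offset = 0;
    regmatch_t m;
    while (offset < n && regexec(&re, s + offset, 1, &m, 0) == 0) {
        ranges[count].start = offset + (size_t)m.rm_so;
        ranges[count].len = (size_t)(m.rm_eo - m.rm_so);
        count++;
        offset += (size_t)m.rm_eo;
    }

    /* Pass 2: delete from last to first, so bytes before each range never move. */
    size_t len = n;
    for (size_t i = count; i-- > 0; ) {
        Range r = ranges[i];
        /* Shift the tail left over the range; the +1 also moves the NUL. */
        memmove(out + r.start, out + r.start + r.len, len - (r.start + r.len) + 1);
        len -= r.len;
    }
    free(ranges);
    return out;
}
```

Deleting back to front is what lets one mutable buffer be reused without recomputing any offsets, which is the same reason the category above walks reverseObjectEnumerator.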
No loop (at least not on our side):
- (NSString *)removeHTML {
    static NSRegularExpression *regexp;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        regexp = [NSRegularExpression regularExpressionWithPattern:@"<[^>]+>" options:kNilOptions error:nil];
    });

    return [regexp stringByReplacingMatchesInString:self
                                            options:kNilOptions
                                              range:NSMakeRange(0, self.length)
                                       withTemplate:@""];
}
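Answer by an anonymous user (5 votes)
// trimmedString holds the HTML source. Requires iOS 7 or later; Apple's documentation
// says the HTML importer should not be called from a background thread.
NSAttributedString *str = [[NSAttributedString alloc] initWithData:[trimmedString dataUsingEncoding:NSUTF8StringEncoding]
                                                           options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                                                                     NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)}
                                                documentAttributes:nil
                                                             error:nil];
NSString *plainText = str.string; // the rendered text, without tags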