Objective-C: using NSScanner to get <body> from HTML

Har*_*vey 3 html objective-c ios nsscanner

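I am trying to create an iOS app that simply extracts a section of a web page.

I have working code that connects to the URL and stores the HTML in an NSString.

I tried this, but I only get an empty string as my result: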

    NSScanner* newScanner = [NSScanner scannerWithString:htmlData];
    // Create a new scanner and give it the html data to parse.

    while (![newScanner isAtEnd])
    {
        [newScanner scanUpToString:@"<body>" intoString:NULL];
        // Scan until the <body> tag is found

        [newScanner scanUpToString:@"</body>" intoString:&bodyText];
        // Everything up to the end tag will get placed into the memory address of the result string

    }

I also tried another approach:

    NSScanner* newScanner = [NSScanner scannerWithString:htmlData];
    // Create a new scanner and give it the html data to parse.

    while (![newScanner isAtEnd])
    {
        [newScanner scanUpToString:@"<body" intoString:NULL];
        // Scan until the start of the opening <body tag is found

        [newScanner scanUpToString:@">" intoString:NULL];
        // Go to end of opening <body> tag

        [newScanner scanUpToString:@"</body>" intoString:&bodyText];
        // Everything up to the end tag will get placed into the memory address of the result string

    }

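The second way returns a string that starts with >< script...etc

To be honest, I don't have a great URL to test this with, and I think it would be easier with some help removing the tags inside the body as well (such as <p></p>).

Any help would be very welcome.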

rde*_*mar 5

I don't know why your first approach doesn't work. I assume you defined bodyText before that snippet. This code works fine for me:

    - (void)viewDidLoad {
        [super viewDidLoad];
        NSString *htmlData = @"This is some stuff before <body> this is the body </body> with some more stuff";
        NSScanner* newScanner = [NSScanner scannerWithString:htmlData];
        NSString *bodyText;
        while (![newScanner isAtEnd]) {
            [newScanner scanUpToString:@"<body>" intoString:NULL];
            [newScanner scanString:@"<body>" intoString:NULL];
            [newScanner scanUpToString:@"</body>" intoString:&bodyText];
        }
        NSLog(@"%@",bodyText); // 2015-01-28 15:58:00.360 ScanningOfHTMLProblem[1373:661934] this is the body 
    }

Note that I added a call to scanString:intoString: to move past the first "<body>".
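One possible reason the first attempt comes back empty on a real page (this is a guess, since the actual HTML isn't shown) is that the opening tag carries attributes, e.g. `<body class="...">`, so the literal `<body>` never matches and the scanner runs to the end. The same "scan up to, then scan past" idea can be applied to the second attempt as well. Below is a rough sketch, not a definitive implementation: the helper names `BodyContentFromHTML` and `StripTags` are made up for illustration, and it assumes no attribute value contains a `>` character. For anything beyond simple pages a real HTML parser would be safer.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text between the opening <body ...> tag
// and </body>, or nil if either tag cannot be found.
static NSString *BodyContentFromHTML(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    scanner.charactersToBeSkipped = nil; // keep whitespace exactly as it appears

    NSString *body = nil;

    // Move to the opening tag, whether or not it has attributes, and step past "<body".
    [scanner scanUpToString:@"<body" intoString:NULL];
    if (![scanner scanString:@"<body" intoString:NULL]) return nil;

    // Skip any attributes, then step past the ">" that ends the opening tag.
    [scanner scanUpToString:@">" intoString:NULL];
    if (![scanner scanString:@">" intoString:NULL]) return nil;

    // Capture everything up to the closing tag.
    [scanner scanUpToString:@"</body>" intoString:&body];
    if (![scanner scanString:@"</body>" intoString:NULL]) return nil;

    return body ?: @""; // an empty body leaves the out-parameter untouched
}

// Hypothetical helper: drops anything that looks like <...> and keeps the text between.
static NSString *StripTags(NSString *fragment) {
    NSScanner *scanner = [NSScanner scannerWithString:fragment];
    scanner.charactersToBeSkipped = nil;

    NSMutableString *result = [NSMutableString string];
    while (!scanner.isAtEnd) {
        NSString *text = nil;
        // Collect plain text up to the next tag.
        if ([scanner scanUpToString:@"<" intoString:&text]) {
            [result appendString:text];
        }
        // Skip over the tag itself, including its closing ">".
        if ([scanner scanString:@"<" intoString:NULL]) {
            [scanner scanUpToString:@">" intoString:NULL];
            [scanner scanString:@">" intoString:NULL];
        }
    }
    return result;
}
```

With these, `StripTags(BodyContentFromHTML(htmlData))` should give the body text without markup. Setting `charactersToBeSkipped` to nil is a design choice here so that leading whitespace in the body is preserved; with the default setting the scanner skips whitespace before each scan.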