Streaming NSXMLParser with NSInputStream

Rob*_*Rob 9 nsxmlparser nsinputstream ios

Update:

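When you use NSXMLParser's initWithContentsOfURL: initializer, it does not parse the XML feed as it downloads. Instead, it appears to load the entire XML file into memory and only then start parsing. This is a problem when the feed is large: it uses an excessive amount of RAM, and it is inherently inefficient because parsing starts only after the download has finished rather than running in parallel with it.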

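Has anyone found a way to make NSXMLParser parse a feed while it is still being streamed to the device? Yes, you can use LibXML2 (as discussed below), but it seems like this should be possible with NSXMLParser. So far it is eluding me.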

Original question:

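I have been wrestling with using NSXMLParser to read XML from a web stream. The interface of initWithContentsOfURL: might lead you to infer that it streams the XML from the web, but it does not seem to do so. Instead, it appears to load the entire XML file before attempting any parsing. That is fine for modestly sized XML files, but problematic for very large ones.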

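I have seen discussions of using NSXMLParser's initWithStream: together with some customized NSInputStream that streams from the web. For example, some answers suggest using something like the CFStreamCreateBoundPair referred to in a Cocoa Builder post, along with the discussion of setting up socket streams in Apple's Stream Programming Guide, but I have not gotten that to work. I even tried writing my own NSInputStream subclass that used NSURLConnection (which is itself pretty good at streaming), but I could not get it to work with NSXMLParser.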

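In the end, I decided to use LibXML2 rather than NSXMLParser, as shown in Apple's XMLPerformance sample, but I was wondering whether anyone has had any luck getting NSXMLParser to stream from a web source. I have seen plenty of "theoretically you could do X" answers, suggesting everything from CFStreamCreateBoundPair to grabbing the HTTPBodyStream of an NSURLRequest, but I have yet to come across a working demonstration of streaming with NSXMLParser.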

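The Ray Wenderlich article "How To Choose The Best XML Parser for Your iPhone Project" seems to confirm that NSXMLParser is not well suited to large XML files. Still, given all the posts about possible NSXMLParser-based workarounds for streaming very large XML files, I am surprised that I have yet to find a working demonstration. Does anyone know of a functioning implementation of NSXMLParser that streams from the web? Clearly, I can stick with LibXML2 or some other equivalent XML parser, but streaming with NSXMLParser seems tantalizingly close.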

bda*_*ash 5

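-[NSXMLParser initWithStream:] is currently the only NSXMLParser interface that performs a streaming parse of the data. Hooking it up to an asynchronous NSURLConnection that delivers data incrementally is unwieldy, because NSXMLParser takes a blocking, "pull"-based approach to reading from the NSInputStream. That is, -[NSXMLParser parse] does something like the following when it is working with an NSInputStream: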

while (1) {
    NSInteger length = [stream read:buffer maxLength:maxLength];
    if (length <= 0)  // 0 means end of stream; a negative value signals an error
        break;

    // Parse data …
}

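To feed data to this parser incrementally, you need a custom NSInputStream subclass that funnels the data received by the NSURLConnectionDelegate callbacks, which run on a background queue or run loop, over to the -read:maxLength: call that NSXMLParser is blocked on.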
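Stripped of Foundation, the hand-off described above is a blocking producer/consumer queue. The following is a minimal sketch in plain C with POSIX threads, not the answer's implementation: the names pull_bridge_t, bridge_push, bridge_finish and bridge_read are hypothetical, and it uses a mutex and condition variable in place of a dispatch semaphore. The push side stands in for a didReceiveData-style callback, and the read side stands in for -read:maxLength:.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this only illustrates the push-to-pull bridge. */
typedef struct chunk {
    struct chunk *next;
    size_t length;
    size_t offset;            /* bytes already handed to the reader */
    unsigned char bytes[];    /* copy of the received data */
} chunk_t;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t changed;   /* signalled on new data or on finish */
    chunk_t *head, *tail;     /* FIFO of pending chunks */
    bool finished;
} pull_bridge_t;

void bridge_init(pull_bridge_t *b) {
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->changed, NULL);
    b->head = b->tail = NULL;
    b->finished = false;
}

/* Producer side: enqueue a copy of the data and wake a blocked reader. */
int bridge_push(pull_bridge_t *b, const void *data, size_t length) {
    if (length == 0) return 0;            /* an empty chunk would look like EOF */
    chunk_t *c = malloc(sizeof *c + length);
    if (!c) return -1;
    c->next = NULL;
    c->length = length;
    c->offset = 0;
    memcpy(c->bytes, data, length);
    pthread_mutex_lock(&b->lock);
    if (b->tail) b->tail->next = c; else b->head = c;
    b->tail = c;
    pthread_cond_signal(&b->changed);
    pthread_mutex_unlock(&b->lock);
    return 0;
}

/* Producer side: no more data will arrive (download finished or failed). */
void bridge_finish(pull_bridge_t *b) {
    pthread_mutex_lock(&b->lock);
    b->finished = true;
    pthread_cond_broadcast(&b->changed);
    pthread_mutex_unlock(&b->lock);
}

/* Consumer side: block until data or finish; returns 0 only at end of data. */
size_t bridge_read(pull_bridge_t *b, void *buffer, size_t max_length) {
    size_t copied = 0;
    pthread_mutex_lock(&b->lock);
    while (!b->head && !b->finished)
        pthread_cond_wait(&b->changed, &b->lock);
    chunk_t *c = b->head;
    if (c) {
        size_t remaining = c->length - c->offset;
        copied = remaining < max_length ? remaining : max_length;
        memcpy(buffer, c->bytes + c->offset, copied);
        c->offset += copied;
        if (c->offset == c->length) {     /* chunk fully consumed */
            b->head = c->next;
            if (!b->head) b->tail = NULL;
            free(c);
        }
    }
    pthread_mutex_unlock(&b->lock);
    return copied;
}
```

In this sketch the network thread would call bridge_push for each received chunk and bridge_finish once, while the parsing thread calls bridge_read in a loop until it returns 0. Re-checking the condition in a while loop after every wake-up means a spurious or stale wake-up cannot be mistaken for new data.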

A proof-of-concept implementation follows:

#import <Foundation/Foundation.h>

@interface ReceivedDataStream : NSInputStream <NSURLConnectionDataDelegate>
@property (retain) NSURLConnection *connection;
@property (retain) NSMutableArray *bufferedData;
@property (assign, getter=isFinished) BOOL finished;
@property (retain) dispatch_semaphore_t semaphore;
@end

@implementation ReceivedDataStream

- (id)initWithContentsOfURL:(NSURL *)url
{
    if (!(self = [super init]))
        return nil;

    NSURLRequest *request = [NSURLRequest requestWithURL:url];
    self.connection = [[[NSURLConnection alloc] initWithRequest:request delegate:self startImmediately:NO] autorelease];
    // NSURLConnection exposes -setDelegateQueue: as a method, not a declared property.
    [self.connection setDelegateQueue:[[[NSOperationQueue alloc] init] autorelease]];
    self.bufferedData = [NSMutableArray array];
    self.semaphore = dispatch_semaphore_create(0);

    return self;
}

- (void)dealloc
{
    self.connection = nil;
    self.bufferedData = nil;
    self.semaphore = nil;

    [super dealloc];
}

- (BOOL)hasBufferedData
{
    @synchronized (self) { return self.bufferedData.count > 0; }
}

#pragma mark - NSInputStream overrides

- (void)open
{
    NSLog(@"open");
    [self.connection start];
}

- (void)close
{
    NSLog(@"close");
    [self.connection cancel];
}

- (NSInteger)read:(uint8_t *)buffer maxLength:(NSUInteger)maxLength
{
    NSLog(@"read:%p maxLength:%lu", buffer, (unsigned long)maxLength);

    // Block until data arrives or the connection finishes. Looping drains any
    // stale semaphore signals left over from chunks that were read without waiting.
    while (!self.isFinished && !self.hasBufferedData)
        dispatch_semaphore_wait(self.semaphore, DISPATCH_TIME_FOREVER);

    // End of stream: nothing is buffered and no more data will arrive.
    if (!self.hasBufferedData)
        return 0;

    NSData *data = nil;
    @synchronized (self) {
        data = [[self.bufferedData[0] retain] autorelease];
        [self.bufferedData removeObjectAtIndex:0];
        if (data.length > maxLength) {
            // Cast before offsetting: arithmetic on a void pointer is not standard C.
            NSData *remainingData = [NSData dataWithBytes:(const uint8_t *)data.bytes + maxLength length:data.length - maxLength];
            [self.bufferedData insertObject:remainingData atIndex:0];
        }
    }

    NSUInteger copiedLength = MIN([data length], maxLength);
    memcpy(buffer, [data bytes], copiedLength);
    return copiedLength;
}


#pragma mark - NSURLConnectionDelegate methods

- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data
{
    NSLog(@"connection:%@ didReceiveData:…", connection);
    @synchronized (self) {
        [self.bufferedData addObject:data];
    }
    dispatch_semaphore_signal(self.semaphore);
}

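- (void)connectionDidFinishLoading:(NSURLConnection *)connection
{
    NSLog(@"connectionDidFinishLoading:%@", connection);
    self.finished = YES;
    dispatch_semaphore_signal(self.semaphore);
}

- (void)connection:(NSURLConnection *)connection didFailWithError:(NSError *)error
{
    // Treat a failed download as end of stream so a blocked -read:maxLength: wakes up.
    NSLog(@"connection:%@ didFailWithError:%@", connection, error);
    self.finished = YES;
    dispatch_semaphore_signal(self.semaphore);
}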

@end

@interface ParserDelegate : NSObject <NSXMLParserDelegate>
@end

@implementation ParserDelegate

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict
{
    NSLog(@"parser:%@ didStartElement:%@ namespaceURI:%@ qualifiedName:%@ attributes:%@", parser, elementName, namespaceURI, qualifiedName, attributeDict);
}

- (void)parserDidEndDocument:(NSXMLParser *)parser
{
    NSLog(@"parserDidEndDocument:%@", parser);
    CFRunLoopStop(CFRunLoopGetCurrent());
}

@end


int main(int argc, char **argv)
{
    @autoreleasepool {

        NSURL *url = [NSURL URLWithString:@"http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xml"];
        ReceivedDataStream *stream = [[ReceivedDataStream alloc] initWithContentsOfURL:url];
        NSXMLParser *parser = [[NSXMLParser alloc] initWithStream:stream];
        parser.delegate = [[[ParserDelegate alloc] init] autorelease];

        [parser performSelector:@selector(parse) withObject:nil afterDelay:0.0];

        CFRunLoopRun();

    }
    return 0;
}