Best way to download all artifacts listed in an HTML5 cache.manifest file?

roc*_*wse 7 python html5 parsing

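I'm trying to see how an HTML5 application works, but every attempt to save the page in a WebKit browser (Chrome, Safari) captures some, but not all, of the cache.manifest resources. Is there a library or a piece of code that will parse the cache.manifest file and download all of the resources (images, scripts, CSS)?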

(Original code moved to an answer... noob mistake >.<)

roc*_*wse 0

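I originally posted this as part of the question... (not that any new Stack Overflow poster has ever done that ;)

Since there are no answers at all, here you go:

I was able to come up with the following Python script to do this, but any input would be appreciated =) (This is my first attempt at Python code, so there may well be a better way.)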

import os
import urllib2
import urllib

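# base URL of the directory that holds cache.manifest; it must end with '/'
# because file names are appended to it directly below
cmServerURL = 'http://<serverURL>:<port>/<path-to-cache.manifest>'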

# download file code taken from stackoverflow
# http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python
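def loadURL(url, dirToSave):
        # note: dirToSave is the full local path of the file to write
        file_name = url.split('/')[-1]
        u = urllib2.urlopen(url)
        f = open(dirToSave, 'wb')
        meta = u.info()
        # Content-Length can be missing (e.g. chunked responses), so don't assume it
        lengths = meta.getheaders("Content-Length")
        file_size = int(lengths[0]) if lengths else 0
        print "Downloading: %s Bytes: %s" % (file_name, file_size or 'unknown')

        file_size_dl = 0
        block_sz = 8192
        while True:
                buffer = u.read(block_sz)
                if not buffer:
                        break

                file_size_dl += len(buffer)
                f.write(buffer)
                if file_size:
                        status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
                else:
                        status = r"%10d" % file_size_dl
                # backspaces move the cursor back so the next status overwrites this one
                status = status + chr(8)*(len(status)+1)
                print status,

        f.close()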

# download the cache.manifest file
# since this request doesn't include the Content-Length header we will use a different api =P
urllib.urlretrieve(cmServerURL + 'cache.manifest', './cache.manifest')

# open the cache.manifest and go through line-by-line checking for the existence of files
f = open('cache.manifest', 'r')
for line in f:
        filepath = line.split('/')
        if len(filepath) > 1:
                fileName = line.strip()
                # if the file doesn't exist, lets download it
                if not os.path.exists(fileName):
                        print 'NOT FOUND: ' + line
                        dirName = os.path.dirname(fileName)
                        print 'checking directory: ' + dirName
                        if not os.path.exists(dirName):
                                os.makedirs(dirName)
                        else:
                                print 'directory exists'
                        print 'downloading file: ' + cmServerURL + line,
                        loadURL(cmServerURL + fileName, fileName)
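For anyone on Python 3 (where urllib2 no longer exists), here is a rough sketch of the same idea. It is not a tested drop-in replacement for the script above. The function names (parse_manifest, local_path, download_all) are made up for illustration, and the section handling reflects my understanding of the manifest format: a first line of CACHE MANIFEST, # comments, and CACHE:, NETWORK:, FALLBACK: and SETTINGS: headers. It also resolves entries against the manifest URL with urljoin, so entries without a '/' and entries starting with '/' are handled too.

```python
import os
import urllib.request
from urllib.parse import urljoin, urlparse


def parse_manifest(text):
    """Return the entries a browser would cache: CACHE entries and FALLBACK targets."""
    lines = text.lstrip("\ufeff").splitlines()
    if not lines or not lines[0].strip().startswith("CACHE MANIFEST"):
        raise ValueError("first line is not 'CACHE MANIFEST'")
    section = "CACHE"  # entries before any section header count as CACHE entries
    resources = []
    for raw in lines[1:]:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line in ("CACHE:", "NETWORK:", "FALLBACK:", "SETTINGS:"):
            section = line[:-1]
            continue
        if section == "CACHE":
            resources.append(line)
        elif section == "FALLBACK":
            parts = line.split()
            if len(parts) == 2:
                resources.append(parts[1])  # second token is the fallback page
        # NETWORK and SETTINGS entries are not downloadable resources
    return resources


def local_path(resource_url, out_dir):
    """Map a resource URL to a file path under out_dir that mirrors the URL path."""
    path = urlparse(resource_url).path
    # drop empty, '.' and '..' segments so nothing is written outside out_dir
    parts = [p for p in path.split("/") if p not in ("", ".", "..")]
    if not parts or path.endswith("/"):
        parts.append("index.html")  # assumption: save directory URLs as index.html
    return os.path.join(out_dir, *parts)


def download_all(manifest_url, out_dir="."):
    """Fetch the manifest, then fetch every listed resource not already on disk."""
    with urllib.request.urlopen(manifest_url) as resp:
        text = resp.read().decode("utf-8")
    for entry in parse_manifest(text):
        url = urljoin(manifest_url, entry)  # resolve relative to the manifest
        target = local_path(url, out_dir)
        if os.path.exists(target):
            continue
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        print("downloading", url, "->", target)
        urllib.request.urlretrieve(url, target)


# Usage (placeholder URL, fill in your own server and path):
# download_all('http://<serverURL>:<port>/<path-to-cache.manifest>/cache.manifest', 'mirror')
```

Parsing is kept separate from downloading so the manifest handling can be checked without touching the network.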