Displaying Unicode in HTML with Python

psy*_*cat 5 python unicode json

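I am writing a script to export my links and their titles from Chrome to HTML.
Chrome bookmarks are stored as JSON, UTF-8 encoded.
Some titles are in Russian, so they are stored like this:
"name":"\u0425\u0430\u0431\u0440\..."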

import codecs
f = codecs.open("chrome.json","r", "utf-8")
data = f.readlines()

urls = [] # for links
names = [] # for link titles

ind = 0

for i in data:
    if i.find('"url":') != -1:
        urls.append(i.split('"')[3])
        names.append(data[ind-2].split('"')[3])
    ind += 1

fw = codecs.open("chrome.html","w","utf-8")
fw.write("<html><body>\n")
for n in names:
    fw.write(n + '<br>')
    # print type(n) # this will return <type 'unicode'> for each url!
fw.write("</body></html>")

Now, in chrome.html, those titles show up as \u0425\u0430\u0431...
How can I turn them back into Russian?
I am using Python 2.5.

**Edit: solved!**

s = '\u041f\u0440\u0438\u0432\u0435\u0442 world!'
type(s)
<type 'str'>

print s.decode('raw-unicode-escape').encode('utf-8')
Привет world!

This is what I needed: converting a str that contains \u041f... escapes into unicode.

f = open("chrome.json", "r")
data = f.readlines()
f.close()

urls = [] # for links
names = [] # for link titles

ind = 0

for i in data:
    if i.find('"url":') != -1:
        urls.append(i.split('"')[3])
        names.append(data[ind-2].split('"')[3])
    ind += 1

fw = open("chrome.html","w")
fw.write("<html><body>\n")
for n in names:
    fw.write(n.decode('raw-unicode-escape').encode('utf-8') + '<br>')
fw.write("</body></html>")
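The conversion used above can be tried in isolation. The following is a minimal sketch, written so that it should also run on Python 3 (an assumption; the post itself uses Python 2.5, where the `b'...'` literal is not available). The helper name `decode_u_escapes` is made up for illustration. Note that the `raw_unicode_escape` codec only interprets `\uXXXX` sequences, so other JSON escapes such as `\"` are left as they are.

```python
def decode_u_escapes(raw_bytes):
    """Turn literal \\uXXXX sequences in an ASCII byte string into text.

    Hypothetical helper, not part of the original script.
    """
    return raw_bytes.decode('raw_unicode_escape')


# The same sample string as in the interactive session above.
escaped = b'\\u041f\\u0440\\u0438\\u0432\\u0435\\u0442 world!'
text = decode_u_escapes(escaped)

# These are the bytes that end up in the HTML file.
utf8_bytes = text.encode('utf-8')

# Only \uXXXX is understood; the escaped quotes keep their backslashes.
quoted = decode_u_escapes(b'The \\"Fubar\\" website')
```

Because escaped quotes and similar sequences pass through unchanged, this approach is enough for titles that contain only `\uXXXX` escapes, but it is not a general JSON string decoder.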

Joh*_*hin 1

By the way, it's not just Russian: non-ASCII characters are common in page names. For example:

name=u'Python Programming Language \u2013 Official Website'
url=u'http://www.python.org/'

As an alternative to fragile code like

urls.append(i.split('"')[3])
names.append(data[ind-2].split('"')[3])
# (1) relies on name being 2 lines before url
# (2) fails if there is a `"` in the name
# example: "name": "The \"Fubar\" website",

you can process the input file with the json module. For Python 2.5, you can get simplejson.
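To make the second failure mode concrete, here is a small sketch that compares the quote-splitting approach with a real JSON parser on a single line. The sample line and the variable names are hypothetical, and the snippet assumes a Python version that ships the `json` module (2.6 or later).

```python
import json

# One line as it might appear in the bookmarks file (hypothetical sample).
line = '            "name": "The \\"Fubar\\" website",'

# Splitting on '"' stops at the first escaped quote inside the title.
by_split = line.split('"')[3]

# Wrapping the same key/value pair in braces turns it into a tiny JSON
# document, which the parser unescapes correctly.
by_json = json.loads('{' + line.strip().rstrip(',') + '}')['name']

# The parser also decodes \uXXXX escapes, so no separate decode step is needed.
title = json.loads('"\\u0425\\u0430\\u0431\\u0440"')
```

Here `by_split` is cut off after `The \`, while `by_json` holds the full title with plain quotes.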

Here is a script that emulates yours:

try:
    import json
except ImportError: 
    import simplejson as json
import sys

def convert_file(infname, outfname):

    def explore(folder_name, folder_info):
        for child_dict in folder_info['children']:
            ctype = child_dict.get('type')
            name = child_dict.get('name')
            if ctype == 'url':
                url = child_dict.get('url')
                # print "name=%r url=%r" % (name, url)
                fw.write(name.encode('utf-8') + '<br>\n')
            elif ctype == 'folder':
                explore(name, child_dict)
            else:
                print "*** Unexpected ctype=%r ***" % ctype

    f = open(infname, 'rb')
    bmarks = json.load(f)
    f.close()
    fw = open(outfname, 'w')
    fw.write("<html><body>\n")
    for folder_name, folder_info in bmarks['roots'].iteritems():
        explore(folder_name, folder_info)
    fw.write("</body></html>")
    fw.close()    

if __name__ == "__main__":
    convert_file(sys.argv[1], sys.argv[2])

Tested with Python 2.5.4 on Windows 7 Pro.
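For readers on a current interpreter, the same recursive walk might look roughly like the sketch below. It is not the tested script above: it assumes Python 3, the function names `iter_bookmarks` and `render_html` are invented here, and it adds three things the original does not do, namely writing the link itself, HTML-escaping names and URLs, and declaring the page encoding with a `<meta charset>` tag so the browser does not have to guess.

```python
import io
import json
from html import escape  # Python 3.2+


def iter_bookmarks(node):
    """Yield (name, url) for every 'url' entry below a folder dict."""
    for child in node.get('children', []):
        ctype = child.get('type')
        if ctype == 'url':
            yield child.get('name', ''), child.get('url', '')
        elif ctype == 'folder':
            for pair in iter_bookmarks(child):
                yield pair


def render_html(bmarks):
    """Build the page as text; the caller decides how to encode it."""
    lines = ['<html><head><meta charset="utf-8"></head><body>']
    for root in bmarks['roots'].values():
        if not isinstance(root, dict):  # skip anything that is not a folder
            continue
        for name, url in iter_bookmarks(root):
            lines.append('<a href="%s">%s</a><br>'
                         % (escape(url, quote=True), escape(name)))
    lines.append('</body></html>')
    return '\n'.join(lines)


def convert_file(infname, outfname):
    # json.load already turns \uXXXX escapes into real characters.
    with io.open(infname, 'r', encoding='utf-8') as f:
        bmarks = json.load(f)
    with io.open(outfname, 'w', encoding='utf-8') as fw:
        fw.write(render_html(bmarks))
```

With the file names from the question, a call would be `convert_file('chrome.json', 'chrome.html')`.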