Google App Engine Python 2.7 + lxml = Unicode ParserError

zzz*_*zzz 6 unicode google-app-engine lxml python-2.7

I am trying to use BeautifulSoup v4 to parse a document. I call BeautifulSoup on note.content, which is a string returned by Evernote's API:

soup = BeautifulSoup(note.content)

I have enabled lxml in my app.yaml file:

libraries:
- name: lxml
  version: "2.3"

Note that this works on my local development server. However, when deployed to Google's cloud, I get the following error:

Error trace:

Unicode parsing is not supported on this platform
Traceback (most recent call last):
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~ever-blog/1.356951374446096208/controller/blog.py", line 101, in get
    soup = BeautifulSoup(note.content)
  File "/base/data/home/apps/s~ever-blog/1.356951374446096208/lib/bs4/__init__.py", line 168, in __init__
    self._feed()
  File "/base/data/home/apps/s~ever-blog/1.356951374446096208/lib/bs4/__init__.py", line 181, in _feed
    self.builder.feed(self.markup)
  File "/base/data/home/apps/s~ever-blog/1.356951374446096208/lib/bs4/builder/_lxml.py", line 62, in feed
    self.parser.feed(markup)
  File "parser.pxi", line 1077, in lxml.etree._FeedParser.feed (third_party/apphosting/python/lxml/src/lxml/lxml.etree.c:76196)
ParserError: Unicode parsing is not supported on this platform

Update:

I checked parser.pxi and found that these lines of code produce the error:

elif python.PyUnicode_Check(data):
    if _UNICODE_ENCODING is NULL:
        raise ParserError, \
            u"Unicode parsing is not supported on this platform"

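I think something in GAE's deployment environment must be causing this error, but I am not sure what.

Update 2:

Because BeautifulSoup automatically falls back to other parsers, I ended up removing lxml from my application entirely. Doing so resolved the problem.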
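Rather than relying on the automatic fallback, the parser can also be named explicitly, which takes lxml out of the picture even if it is still installed. The following is only a sketch: the helper name `parse_without_lxml` and its `soup_factory` parameter are made up for illustration, and it assumes bs4 is importable as in the traceback above.

```python
# Name of Python's built-in HTML parser as understood by BeautifulSoup 4.
PARSER_NAME = "html.parser"


def parse_without_lxml(markup, soup_factory=None):
    """Parse markup with the built-in parser instead of lxml.

    soup_factory is a hypothetical hook for swapping in another
    constructor; by default bs4.BeautifulSoup is used.
    """
    if soup_factory is None:
        # Imported lazily so the module loads even where bs4 is absent.
        from bs4 import BeautifulSoup as soup_factory
    return soup_factory(markup, PARSER_NAME)


# Usage in the handler (note.content comes from the Evernote API):
#     soup = parse_without_lxml(note.content)
```

The built-in parser is generally slower and somewhat less lenient than lxml, so this trades speed for not depending on the platform's lxml build.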

小智 1

Try parsing a UTF-8 encoded byte string instead of a unicode object.
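A minimal sketch of that suggestion, assuming note.content arrives as a unicode object. The helper name `to_utf8_bytes` is hypothetical, and whether this avoids the error on GAE has not been verified here.

```python
def to_utf8_bytes(markup):
    """Return markup as a UTF-8 encoded byte string.

    Byte strings pass through unchanged; unicode text is encoded.
    Works on Python 2.7 (where bytes is str) and on Python 3.
    """
    if isinstance(markup, bytes):
        return markup
    return markup.encode("utf-8")


# Usage in the handler (requires bs4; from_encoding tells BeautifulSoup
# which encoding the bytes are in, so it does not have to guess):
#     soup = BeautifulSoup(to_utf8_bytes(note.content), "lxml",
#                          from_encoding="utf-8")
```

Handing lxml bytes means the `PyUnicode_Check(data)` branch shown in the question is not taken, which is presumably why the answer suggests it.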