When I request a URL in the scrapy shell, I get the following output:
In [6]: sel.xpath("//div[@class='my_class']").extract()
[u'<div class="my_class"><ul><li class="parent">\n<a href="/category/tractors-ride-on-mowers/">\n\u0422\u0420\u0410\u041a\u0422\u041e\u0420\u042b \u0438 \u0420\u0410\u0419\u0414\u0415\u0420\u042b</a>\n<div class="sub1"><div class="str"></div><ul><li><a href="/category/lawn-tractors/" class="">\u0421\u0430\u0434\u043e\u0432\u044b\u0435 \u0442\u0440\u0430\u043a\u0442\u043e\u0440\u04....
How can I convert this into a readable string?
Once you print it (or write it to a file), it will be readable:
>>> u = u'<div class="my_class"><ul><li class="parent">\n<a href="/category/tractors-ride-on-mowers/">\n\u0422\u0420\u0410\u041a\u0422\u041e\u0420\u042b \u0438 \u0420\u0410\u0419\u0414\u0415\u0420\u042b</a>\n<div class="sub1"><div class="str"></div><ul><li><a href="/category/lawn-tractors/" class="">\u0421\u0430\u0434\u043e\u0432\u044b\u0435 \u0442\u0440\u0430\u043a\u0442\u043e\u0440'
>>> print (u)
<div class="my_class"><ul><li class="parent">
<a href="/category/tractors-ride-on-mowers/">
ТРАКТОРЫ и РАЙДЕРЫ</a>
<div class="sub1"><div class="str"></div><ul><li><a href="/category/lawn-tractors/" class="">Садовые трактор
>>>
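For the "write it to a file" case, a minimal sketch might look like the following. It assumes a shortened stand-in for one element of the list returned by `.extract()`; the names `fragment`, `out_path`, and the file name `my_class.html` are made up for illustration. The escapes in the question appear because the shell displays the `repr()` of a list of unicode strings (Python 2 behaviour); the data itself is already correct text.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical stand-in for one item of sel.xpath(...).extract()
fragment = (u'<a href="/category/tractors-ride-on-mowers/">\n'
            u'\u0422\u0420\u0410\u041a\u0422\u041e\u0420\u042b \u0438 '
            u'\u0420\u0410\u0419\u0414\u0415\u0420\u042b</a>')

# Write to a scratch directory; the file name is an arbitrary choice
out_path = os.path.join(tempfile.mkdtemp(), "my_class.html")

# io.open accepts an explicit encoding on both Python 2 and Python 3,
# so the Cyrillic text is stored as UTF-8 rather than as \uXXXX escapes
with io.open(out_path, "w", encoding="utf-8") as f:
    f.write(fragment)

# Read the file back with the same encoding to confirm nothing was lost
with io.open(out_path, "r", encoding="utf-8") as f:
    round_trip = f.read()
```

Opening the resulting file in an editor that understands UTF-8 should show the Cyrillic link text directly. Whether `print` shows it correctly in a terminal depends on the terminal's own encoding.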