Posts by dom*_*omi

Encoding error while parsing RSS with lxml

I want to parse a downloaded RSS feed with lxml, but I don't know how to handle the UnicodeDecodeError.

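import urllib2
import chardet
from lxml import etree

request = urllib2.Request('http://wiadomosci.onet.pl/kraj/rss.xml')
response = urllib2.urlopen(request)
response = response.read()  # raw bytes of the feed
encd = chardet.detect(response)['encoding']  # guess the encoding from the bytes
parser = etree.XMLParser(ns_clean=True, recover=True, encoding=encd)
tree = etree.parse(response, parser)  # this call raises the UnicodeDecodeError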

But I get an error:

tree   = etree.parse(response, parser)
File "lxml.etree.pyx", line 2692, in lxml.etree.parse (src/lxml/lxml.etree.c:49594)
  File "parser.pxi", line 1500, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71364)
  File "parser.pxi", line 1529, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:71647)
  File "parser.pxi", line 1429, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:70742)
  File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:67740)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63824)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64745)
  File "parser.pxi", line 559, …
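
The traceback passes through `_parseDocumentFromURL` and `_parseDocFromFile`, which suggests that `etree.parse()` is treating the downloaded content as a file name or URL rather than as the document itself. A minimal sketch of one likely fix, under some assumptions: it is written for Python 3, the `raw` bytes are a made-up stand-in for `response.read()`, and the feed is assumed to declare its encoding in the XML declaration so that no `chardet` guess is needed. The import falls back to the standard library's ElementTree (which offers the same two calls) only so the sketch runs where lxml is not installed.

```python
import io

try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib ElementTree so the sketch runs without lxml.
    import xml.etree.ElementTree as etree

# Hypothetical stand-in for the bytes returned by response.read().
raw = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<rss version="2.0"><channel>'
    '<title>Wiadomo\u015bci</title>'
    '<item><title>Przyk\u0142ad</title></item>'
    '</channel></rss>'
).encode("utf-8")

# fromstring() parses document content that is already in memory.
# The parser reads the encoding from the XML declaration in the bytes.
root = etree.fromstring(raw)

# parse() expects a file name, URL, or file-like object,
# so the bytes have to be wrapped in a file-like object first.
tree = etree.parse(io.BytesIO(raw))

# Collect the text of every <title> element in document order.
titles = [element.text for element in root.iter("title")]
```

With lxml, a parser such as the `etree.XMLParser(ns_clean=True, recover=True)` from the question can still be passed as the second argument to either call.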

python rss lxml chardet scraperwiki

9 votes, 2 answers, 7598 views

Remove styles from a selected range/text in a div

Consider the following snippet:

<div>
     <span style="color:red;">a</span>
     <span style="color:blue;">a</span>
     <span style="color:white;">a</span>
</div>

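How can I remove the styles from the text the user has selected?


Edit: adding the OP's clarification:

Thanks for the answers! I should have been more precise, sorry. By "user text selection" I mean text that is selected/highlighted with the mouse. I have many divs with inner spans (like the ones below; the spans have no extra ids or classes :/).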

[...]
<div>
 <span style="color:red;">a</span>
 <span style="color:blue;">b</span>
 <span style="color:white;">c</span>
</div>
<div>
 <span style="color:red;">d</span>
 <span style="color:blue;">a</span>
 <span style="color:white;">a</span>
</div>
[...]

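What I want to achieve: the user selects "ab" with the mouse, clicks a button (input type=button), and the styles are removed from the selected span(s). This is similar to how TinyMCE behaves.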

javascript jquery

3 votes, 1 answer, 8865 views

Tag statistics

chardet ×1

javascript ×1

jquery ×1

lxml ×1

python ×1

rss ×1

scraperwiki ×1