Nokogiri: finding text within paragraphs

Question


I want to replace the inner_text of every paragraph in my XHTML document.

I know I can get all the text with Nokogiri like this:

doc.xpath("//text()")


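But I only want to operate on the text inside paragraphs. How can I select all the text in paragraphs without affecting the anchor text of any links that may be inside them?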

#For example : <p>some text <a href="/">This should not be changed</a> another one</p>


Answer 1

jee*_*eem 6

For text that is a direct child of a paragraph, use //p/text():

irb> h = '<p>some text <a href="/">This should not be changed</a> another one</p>'
=> ...
irb> doc = Nokogiri::HTML(h)
=> ...
irb> doc.xpath '//p/text()'
=> [#<Nokogiri::XML::Text:0x80ac2e04 "some text ">, #<Nokogiri::XML::Text:0x80ac26c0 " another one">]


For text that is a descendant of a paragraph (direct or not), use //p//text(). To exclude the text nodes whose parent is an anchor, you can subtract them:

irb> doc.xpath('//p//text()') - doc.xpath('//p//a/text()')
=> [#<Nokogiri::XML::Text:0x80ac2e04 "some text ">, #<Nokogiri::XML::Text:0x80ac26c0 " another one">]


There may be a way to do this in a single call, but my XPath knowledge doesn't go that deep.
