Get the title and content from a link in Rails

Question

gko*_*lan 3 ruby parsing ruby-on-rails web-scraping ruby-on-rails-3

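I've just started learning Rails. Could you help me understand how to parse a link? Pointers to good tutorials would also help.

Problem:

When you submit a link on Digg, Facebook, and similar sites, the site parses the link once you attach it and pulls out the title, content, and images for that URL. How can I implement something similar in Rails?

I've looked at feed parsers such as feedzirra, but they seem to fetch a site's entire feed rather than just the single link I'm interested in. Or am I misunderstanding something?

Thanks a lot in advance.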

Answer 1

oot*_*vak 6

It looks like you may be looking for something like Pismo: https://github.com/peterc/pismo

require 'pismo'

# Load a Web page (you could pass an IO object or a string with existing HTML data along, as you prefer)
doc = Pismo::Document.new('http://www.rubyinside.com/cramp-asychronous-event-driven-ruby-web-app-framework-2928.html')

doc.title     # => "Cramp: Asychronous Event-Driven Ruby Web App Framework"
doc.author    # => "Peter Cooper"
doc.lede      # => "Cramp (GitHub repo) is a new, asynchronous evented Web app framework by Pratik Naik of 37signals (and the Rails core team). It's built around Ruby's EventMachine library and was designed to use event-driven I/O throughout - making it ideal for situations where you need to handle a large number of open connections (such as Comet systems or streaming APIs.)"
doc.keywords  # => [["cramp", 7], ["controllers", 3], ["app", 3], ["basic", 2], ..., ... ]

The caveat regarding images is:

Image extraction only handles images with absolute URLs
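
If the pages you care about use relative image paths, one possible workaround is to collect the `src` values yourself and resolve them against the page URL. The sketch below uses only Ruby's standard `uri` library. `absolutize_image_urls` is a hypothetical helper name, not part of Pismo, and the example.com URLs are placeholders.

```ruby
require 'uri'

# Hypothetical helper (not part of Pismo): resolve image paths found on a page
# against the URL of the page they came from.
def absolutize_image_urls(page_url, image_srcs)
  image_srcs.map do |src|
    begin
      # URI.join leaves absolute URLs untouched and resolves relative ones
      # against page_url.
      URI.join(page_url, src).to_s
    rescue URI::Error
      nil # skip values that are not valid URI references
    end
  end.compact
end

page_url = 'http://www.example.com/articles/post.html'
srcs = ['/images/logo.png', 'thumb.jpg', 'http://cdn.example.com/a.gif']
puts absolutize_image_urls(page_url, srcs)
```

How you gather the `src` values (for example with an HTML parser) is left open here. The helper only covers the URL resolution step.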
