I have a URL, for example:
http://www.relevantmagazine.com/life/relationship/blog/23317-pursuing-singleness
and I want to extract "relevantmagazine" from it.
Currently I have:
@urlroot = URI.parse(@link.url).host
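But that returns www.relevantmagazine.com. Can anyone help me?
Using a gem might be overkill, but anyway: there is a handy gem called domainatrix that extracts the site name for you and handles things like two-part top-level domains (for example co.uk).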
require "domainatrix"
url = Domainatrix.parse("http://www.pauldix.net")
url.url # => "http://www.pauldix.net" (the original url)
url.public_suffix # => "net"
url.domain # => "pauldix"
url.canonical # => "net.pauldix"
url = Domainatrix.parse("http://foo.bar.pauldix.co.uk/asdf.html?q=arg")
url.public_suffix # => "co.uk"
url.domain # => "pauldix"
url.subdomain # => "foo.bar"
url.path # => "/asdf.html?q=arg"
url.canonical # => "uk.co.pauldix.bar.foo/asdf.html?q=arg"
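If pulling in a gem feels like too much, a standard-library-only approach may be enough for simple hosts. The sketch below is not from the answer above: `site_name` is a hypothetical helper, and it assumes the host ends in a single-label suffix such as "com" or "net". For hosts like "pauldix.co.uk" it would return "co", which is exactly the case a public-suffix-aware gem handles.

```ruby
require "uri"

# Hypothetical helper, not part of URI or any gem.
# Returns the label just before the last dot-separated label of the host.
# Assumption: the public suffix is a single label ("com", "net", ...).
def site_name(url)
  host = URI.parse(url).host
  return nil if host.nil?            # e.g. a relative URL with no host part

  labels = host.split(".")
  labels.length >= 2 ? labels[-2] : labels.first
end

puts site_name("http://www.relevantmagazine.com/life/relationship/blog/23317-pursuing-singleness")
# prints "relevantmagazine"
```

For the URL in the question this takes the second-to-last label of "www.relevantmagazine.com", so the "www" prefix never needs special handling.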