Rails: dynamic robots.txt with ERB

imd*_*rek 15 ruby-on-rails

I'm trying to render a dynamic text file (robots.txt) in my Rails (3.0.10) app, but it keeps being rendered as HTML (according to the console).

match 'robots.txt' => 'sites#robots'

Controller:

class SitesController < ApplicationController

  respond_to :html, :js, :xml, :css, :txt

  def robots
    @site = Site.find_by_subdomain # blah blah
  end

end

app/views/sites/robots.txt.erb:

Sitemap: <%= @site.url %>/sitemap.xml

But when I visit http://www.example.com/robots.txt I get a blank page/source, and the log says:

Started GET "/robots.txt" for 127.0.0.1 at 2011-11-21 11:22:13 -0500
  Processing by SitesController#robots as HTML
  Site Load (0.4ms)  SELECT `sites`.* FROM `sites` WHERE (`sites`.`subdomain` = 'blah') ORDER BY created_at DESC LIMIT 1
Completed 406 Not Acceptable in 828ms

Any idea what I'm doing wrong?

Note: I added this to config/initializers/mime_types because Rails was complaining that it didn't know what the .txt MIME type was:

Mime::Type.register_alias "text/plain", :txt

Note 2: I did remove the stock robots.txt from the public directory.

Tho*_*emm 11

Note: this is a repost from coderwall.

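Following some advice from similar answers on Stack Overflow, I currently use the following solution to render a dynamic robots.txt based on the request's host parameter.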

Routing

# config/routes.rb
#
# Dynamic robots.txt
get 'robots.:format' => 'robots#index'

Controller

# app/controllers/robots_controller.rb
class RobotsController < ApplicationController
  # No layout
  layout false

  # Render a robots.txt file based on whether the request
  # is performed against a canonical url or not
  # Prevent robots from indexing content served via a CDN twice
  def index
    if canonical_host?
      render 'allow'
    else
      render 'disallow'
    end
  end

  private

  def canonical_host?
    request.host =~ /plugingeek\.com/
  end
end
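One caveat with canonical_host? above: the pattern is unanchored, so it would also match a host such as plugingeek.com.example.org. A stricter check could look like the minimal sketch below, written in plain Ruby outside Rails. The CANONICAL_HOST constant, the optional www. prefix, and the sample hostnames are assumptions for illustration, not part of the original setup.

```ruby
# Hypothetical stricter variant of canonical_host?, shown outside Rails.
# Assumption: only plugingeek.com and www.plugingeek.com count as canonical.
CANONICAL_HOST = /\A(?:www\.)?plugingeek\.com\z/i

def canonical_host?(host)
  # match? returns a plain boolean; to_s guards against a nil host
  CANONICAL_HOST.match?(host.to_s)
end

puts canonical_host?("www.plugingeek.com")         # true
puts canonical_host?("plugingeek.com.example.org") # false
puts canonical_host?("cdn.example.net")            # false
```

Inside the controller, the same pattern would be applied to request.host.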

Views

Based on request.host, we render one of two different .text.erb view files.

Allowing robots

# app/views/robots/allow.text.erb # Note the .text extension

# Allow robots to index the entire site except some specified routes
# rendered when site is visited with the default hostname
# http://www.robotstxt.org/

# ALLOW ROBOTS
User-agent: *
Disallow:

Disallowing spiders

# app/views/robots/disallow.text.erb # Note the .text extension

# Disallow robots to index any page on the site
# rendered when robot is visiting the site
# via the Cloudfront CDN URL
# to prevent duplicate indexing
# and search results referencing the Cloudfront URL

# DISALLOW ROBOTS
User-agent: *
Disallow: /
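The only functional difference between the two views is the value of Disallow: an empty value blocks nothing, while / blocks every path. The rough sketch below, in plain Ruby, shows how a crawler might read those rules. disallowed_paths is a hypothetical helper that ignores most of the robots.txt format (for example Allow lines and per-agent groups).

```ruby
# Hypothetical helper: collect the non-empty Disallow values from a
# robots.txt body. Simplified: ignores User-agent grouping and Allow lines.
def disallowed_paths(body)
  paths = []
  body.each_line do |line|
    rule = line.sub(/#.*/, "").strip # drop comments and surrounding whitespace
    match = rule.match(/\ADisallow:\s*(.*)\z/i)
    # An empty Disallow value means nothing is disallowed, so skip it
    paths << match[1] if match && !match[1].empty?
  end
  paths
end

allow_body    = "# ALLOW ROBOTS\nUser-agent: *\nDisallow:\n"
disallow_body = "# DISALLOW ROBOTS\nUser-agent: *\nDisallow: /\n"

p disallowed_paths(allow_body)    # => []
p disallowed_paths(disallow_body) # => ["/"]
```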

Specs

Testing the setup with RSpec and Capybara is also quite easy.

# spec/features/robots_spec.rb
require 'spec_helper'

feature "Robots" do
  context "canonical host" do
    scenario "allow robots to index the site" do
      Capybara.app_host = 'http://www.plugingeek.com'
      visit '/robots.txt'
      Capybara.app_host = nil

      expect(page).to have_content('# ALLOW ROBOTS')
      expect(page).to have_content('User-agent: *')
      expect(page).to have_content('Disallow:')
      expect(page).to have_no_content('Disallow: /')
    end
  end

  context "non-canonical host" do
    scenario "deny robots to index the site" do
      visit '/robots.txt'

      expect(page).to have_content('# DISALLOW ROBOTS')
      expect(page).to have_content('User-agent: *')
      expect(page).to have_content('Disallow: /')
    end
  end
end

# This would be the resulting docs
# Robots
#   canonical host
#      allow robots to index the site
#   non-canonical host
#      deny robots to index the site
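In the first scenario, Capybara.app_host is reset only if visit returns normally; if it raises, the custom host would leak into later examples. One way to guard against that is an ensure block. The sketch below shows the pattern in plain Ruby: with_app_host is a hypothetical helper, and FakeConfig merely stands in for the Capybara module so the snippet runs without the gem.

```ruby
# Stand-in for the Capybara module, so the sketch runs without the gem.
FakeConfig = Struct.new(:app_host)

# Hypothetical helper: set app_host for the duration of the block and
# restore the previous value even if the block raises.
def with_app_host(config, host)
  previous = config.app_host
  config.app_host = host
  yield
ensure
  config.app_host = previous
end

config = FakeConfig.new(nil)

with_app_host(config, "http://www.plugingeek.com") do
  puts config.app_host # http://www.plugingeek.com
end
puts config.app_host.inspect # nil

begin
  with_app_host(config, "http://www.plugingeek.com") { raise "visit failed" }
rescue RuntimeError
  puts config.app_host.inspect # nil, restored despite the error
end
```

In a real spec the call would presumably look like with_app_host(Capybara, 'http://www.plugingeek.com') { visit '/robots.txt' }, since the helper only relies on the app_host reader and writer.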

As a last step, you may need to remove the static public/robots.txt file from the public folder, if it is still present.

I hope you find this useful. Feel free to comment and help improve this technique further.


Nat*_*han 7

One solution that works in Rails 3.2.3 (not sure about 3.0.10) is as follows:

1) Name your template file robots.text.erb (note the emphasis on text rather than txt).

2) Set up your route like this: match '/robots.:format' => 'sites#robots'

3) Leave your action as is (you can remove the respond_with in the controller):

def robots
  @site = Site.find_by_subdomain # blah blah
end

This solution also eliminates the need to explicitly specify txt.erb in the render call, as mentioned in the accepted answer.
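The reason text works where txt needed extra setup is, as far as I know, that Rails registers text/plain under the symbol :text out of the box and accepts txt as an extension synonym, so a request for /robots.txt resolves to the :text format and therefore to robots.text.erb. The toy lookup below only models that idea in plain Ruby; FORMATS, format_for_extension and template_name are hypothetical names, not Rails APIs.

```ruby
# Toy model of a MIME registry (not Rails code). Assumption: text/plain is
# registered as :text, with "txt" accepted as an extension synonym.
FORMATS = {
  text: { mime: "text/plain", extensions: %w[text txt] },
  html: { mime: "text/html",  extensions: %w[html] }
}.freeze

# Map a URL extension such as "txt" to the format symbol, or nil if unknown.
def format_for_extension(ext)
  found = FORMATS.find { |_symbol, info| info[:extensions].include?(ext) }
  found && found.first
end

# Build the template file name that the resolved format would select.
def template_name(action, ext, handler = "erb")
  format = format_for_extension(ext)
  format && "#{action}.#{format}.#{handler}"
end

puts template_name("robots", "txt")  # robots.text.erb
puts template_name("robots", "text") # robots.text.erb
```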


Ale*_*tie 1

I think the problem is that if you define respond_to in your controller, you have to use respond_with in the action:

def robots
  @site = Site.find_by_subdomain # blah blah
  respond_with @site
end

Alternatively, try explicitly specifying the .erb file to render:

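def robots
  @site = Site.find_by_subdomain # blah blah
  # Render the template explicitly. Do not call respond_with afterwards:
  # it would attempt a second render in the same action.
  render 'sites/robots.txt.erb'
end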