Get robots.txt to block access to URLs on a site after the "?" character, but index the page itself

Question

I have a small Magento site with page URLs such as:

http://www.mysite.com/contact-us.html
http://www.mysite.com/customer/account/login/

But I also have pages that contain filters (for example price and color); one such example is:

http://www.mysite.com/products.html?price=1%2C1000

or

http://www.mysite.com/products/chairs.html?price=1%2C1000

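The problem is that when Googlebot and other search engine bots crawl the site, they basically grind to a halt because they get stuck in all the "filter links".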

So how should the robots.txt file be configured, for example:

User-agent: *
Allow:
Disallow:

so that all pages such as:

http://www.mysite.com/contact-us.html
http://www.mysite.com/customer/account/login/

get indexed, but in the case of http://www.mysite.com/products.html?price=1%2C1000, products.html is indexed while everything after the "?" is ignored. The same goes for
http://www.mysite.com/products/chairs.html?price=1%2C1000

I also don't want to list every page individually; I just want a single rule that ignores everything after the "?" in a page's URL.

Answer 1

Jim*_*hel 9

I think this will do it:

User-Agent: *
Disallow: /*?

This disallows any URL that contains a question mark.

If you only want to disallow URLs that contain ?price, you would write:

Disallow: /*?price
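
To check which of the question's URLs each rule would catch, here is a minimal Python sketch of Google-style wildcard matching: "*" matches any run of characters, a trailing "$" anchors the end, and the rule only has to match a prefix of the path plus query string. It is hand-rolled because Python's standard urllib.robotparser does plain prefix matching and does not interpret "*" inside paths. The function name rule_matches and the ?color=red&price=... URL are hypothetical, added only for illustration.

```python
import re
from urllib.parse import urlsplit


def rule_matches(rule: str, url: str) -> bool:
    """Return True if a robots.txt path rule matches the URL (sketch).

    Assumes Google-style semantics: '*' matches any run of characters,
    a trailing '$' anchors the end, everything else is literal, and the
    rule only needs to match a prefix of path + query string.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' wherever '*' appeared.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    if anchored:
        pattern += "$"
    parts = urlsplit(url)
    target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    # re.match anchors at the start only, which gives prefix matching.
    return re.match(pattern, target) is not None


if __name__ == "__main__":
    urls = [
        "http://www.mysite.com/contact-us.html",
        "http://www.mysite.com/customer/account/login/",
        "http://www.mysite.com/products.html",
        "http://www.mysite.com/products.html?price=1%2C1000",
        "http://www.mysite.com/products/chairs.html?price=1%2C1000",
        # Hypothetical URL where price is not the first parameter.
        "http://www.mysite.com/products.html?color=red&price=1%2C1000",
    ]
    for rule in ("/*?", "/*?price"):
        for url in urls:
            verdict = "blocked" if rule_matches(rule, url) else "allowed"
            print(f"{rule:10} {verdict:8} {url}")
```

Under these assumptions "/*?" blocks every filtered URL while leaving products.html itself crawlable. "/*?price" needs the literal text "?price", so it only catches URLs where price is the first query parameter; the hypothetical ?color=red&price=... URL stays allowed.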

See the related questions (list on the right), for example:

Restrict robot access for (specific) query string (parameter) values?

How to disallow search pages in robots.txt

Additional explanation:

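The syntax Disallow: /*? says, "disallow anything that has a question mark in the URL." The / is the start of the path and query part of the URL. So if your URL is http://mysite.com/products/chairs.html?manufacturer=128&usage=165, the path and query part is /products/chairs.html?manufacturer=128&usage=165. The * says "match any characters". So Disallow: /*? will match /<anything>?<more stuff>, that is, anything that has a question mark.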
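
The walk-through above can be reproduced in a few lines of Python. This is a sketch assuming Google-style prefix matching, where /*? behaves like the regular expression /.*\? anchored at the start; the variable names are illustrative only.

```python
import re
from urllib.parse import urlsplit

url = "http://mysite.com/products/chairs.html?manufacturer=128&usage=165"

# The rule is matched against the path plus the query string, which starts with '/'.
parts = urlsplit(url)
path_and_query = parts.path + "?" + parts.query
print(path_and_query)  # /products/chairs.html?manufacturer=128&usage=165

# '/*?' reads as: a '/', then any characters ('*'), then a literal '?'.
# Anchored at the start (re.match), that is the regular expression /.*\?
rule_as_regex = re.compile(r"/.*\?")
print(bool(rule_as_regex.match(path_and_query)))  # True
print(bool(rule_as_regex.match("/products/chairs.html")))  # False
```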

Archived: 14 years, 5 months ago
Views: 4430
Last activity: 11 years, 7 months ago