How to prevent search engines from indexing all URLs that start with origin.domainname.com

Question

How to prevent search engines from indexing all URLs that start with origin.domainname.com

Lov*_*aur 5 .htaccess robots.txt url-rewriting

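I have www.domainname.com and origin.domainname.com pointing to the same code base. Is there a way to prevent all URLs under the host name origin.domainname.com from being indexed?

Is there a rule in robots.txt that can do this? Both URLs point to the same folder. I also tried redirecting origin.domainname.com to www.domainname.com in the .htaccess file, but it doesn't seem to work.

I would appreciate help from anyone who has had a similar problem.

Thanks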

Answer 1

Lek*_*eyn 13

You can rewrite robots.txt to another file (let's call it 'robots_no.txt') containing:

User-Agent: *
Disallow: /


(Source: http://www.robotstxt.org/robotstxt.html)
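To sanity-check what those two lines mean to a crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed` and the test URL are only illustrative, and real crawlers may interpret robots.txt slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Contents of the hypothetical robots_no.txt from above
ROBOTS_NO_TXT = """\
User-Agent: *
Disallow: /
"""


def is_allowed(robots_body: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # With "Disallow: /" every path on the host is off limits
    print(is_allowed(ROBOTS_NO_TXT, "http://origin.domainname.com/page.html"))
```

Keep in mind that robots.txt only asks well-behaved crawlers not to fetch the pages; it does not enforce anything.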

The .htaccess file would look like this:

RewriteEngine On
# Any host other than www.example.com gets the blocking robots file
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^robots\.txt$ robots_no.txt [L]


To use a custom robots.txt for each (sub)domain:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^sub\.example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.example\.org$ [OR]
RewriteCond %{HTTP_HOST} ^example\.org$
# Rewrites the above (sub)domains <domain> to robots_<domain>.txt
# example.org -> robots_example.org.txt
RewriteRule ^robots\.txt$ robots_%{HTTP_HOST}.txt [L]
# in all other cases, use default 'robots.txt'
RewriteRule ^robots\.txt$ - [L]

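If the mapping is hard to picture, the following Python sketch mimics which file each host ends up with under the per-(sub)domain rules above. It is only a model of the rewrite logic, not something Apache runs; the name `robots_file_for` is made up for illustration, and it assumes the Host header matches exactly (same case, no port), as the anchored patterns require.

```python
# Hosts listed in the RewriteCond lines; each has its own robots_<host>.txt
CUSTOM_HOSTS = {
    "www.example.com",
    "sub.example.com",
    "example.com",
    "www.example.org",
    "example.org",
}


def robots_file_for(host: str) -> str:
    """Return the file served for /robots.txt on the given host."""
    if host in CUSTOM_HOSTS:
        # Mirrors the substitution robots_<host>.txt
        return f"robots_{host}.txt"
    # Mirrors the fallback rule: leave robots.txt untouched
    return "robots.txt"


if __name__ == "__main__":
    for host in ("example.org", "sub.example.com", "origin.domainname.com"):
        print(host, "->", robots_file_for(host))
```

Each listed host needs a matching file on disk (for example robots_example.org.txt); otherwise the rewritten request will result in a 404.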

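Instead of asking search engines to block all pages on hosts other than www.example.com, you can also use <link rel="canonical">.

If http://example.com/page.html and http://example.org/~example/page.html both point to http://www.example.com/page.html, put the following tag in the <head>: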

<link rel="canonical" href="http://www.example.com/page.html">

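If the pages are generated by a script, the tag can be built from the requested URL. Below is a rough Python sketch, assuming every alias host serves the same paths as the canonical host (as in the question, where both hosts share one code base); `CANONICAL_HOST` and `canonical_link` are illustrative names, not part of any framework.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumed preferred host; adjust to your own site
CANONICAL_HOST = "www.example.com"


def canonical_link(url: str) -> str:
    """Build a <link rel="canonical"> tag pointing at the same path on CANONICAL_HOST."""
    parts = urlsplit(url)
    # Keep scheme, path and query; swap the host; drop any fragment
    canonical = urlunsplit((parts.scheme, CANONICAL_HOST, parts.path, parts.query, ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'


if __name__ == "__main__":
    print(canonical_link("http://example.com/page.html"))
```

This does not cover the case where the alias uses a different path (such as /~example/page.html); there you would need an explicit mapping.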

See also Google's article on rel="canonical".
