Placing Robots.txt
The "/robots.txt" file is a text file, with one or more records. The robots.txt file must be reside in the root of the domain and must be exactly named "robots.txt". A robots.txt file located in a subdirectory is not a valid, as bots only check for this file in the root of the domain.
For instance, http://www.example.com/robots.txt is a valid location, but http://www.example.com/mysite/robots.txt is not.
Example of a robots.txt file:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~name/
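To sanity-check rules like these before deploying them, you can parse them with Python's standard urllib.robotparser module. The sketch below tests the example file above against two hypothetical URLs:

from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /tmp/",
    "Disallow: /~name/",
]
rp = RobotFileParser()
rp.parse(rules)

# Pages outside the disallowed directories remain crawlable.
print(rp.can_fetch("AnyBot", "http://www.example.com/index.html"))      # True
# Pages under /tmp/ are blocked for every user-agent.
print(rp.can_fetch("AnyBot", "http://www.example.com/tmp/cache.html"))  # False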
Block or remove your entire website using a robots.txt file
To remove your site from search engines and prevent all robots from crawling it in the future, place the following robots.txt file in your server root:
User-agent: *
Disallow: /
To remove your site from Google only and prevent just Googlebot from crawling your site in the future, place the following robots.txt file in your server root:
User-agent: Googlebot
Disallow: /
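As a quick check that this record blocks only Googlebot, you can again use urllib.robotparser (a minimal sketch with a hypothetical URL):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: Googlebot", "Disallow: /"])

# Googlebot is blocked everywhere...
print(rp.can_fetch("Googlebot", "http://www.example.com/page.html"))  # False
# ...but a robot not named in any record is unaffected.
print(rp.can_fetch("OtherBot", "http://www.example.com/page.html"))   # True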
Each protocol and port must have its own robots.txt file. In particular, if you serve content via both http and https, you'll need a separate robots.txt file for each protocol. For example, to allow Googlebot to index all http pages but no https pages, you'd use the robots.txt files below.
For the http protocol (http://yourserver.com/robots.txt):
User-agent: *
Allow: /
For the https protocol (https://yourserver.com/robots.txt):
User-agent: *
Disallow: /
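Because each protocol serves its own file, a crawler has to fetch and evaluate them separately. The sketch below does exactly that with urllib.robotparser; yourserver.com is a placeholder host, so the code is illustrative rather than runnable against a real server:

from urllib.robotparser import RobotFileParser

for scheme in ("http", "https"):
    rp = RobotFileParser()
    rp.set_url(scheme + "://yourserver.com/robots.txt")
    rp.read()  # fetches the robots.txt served over this protocol
    allowed = rp.can_fetch("Googlebot", scheme + "://yourserver.com/page.html")
    print(scheme, "allowed:", allowed)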
Allow all robots complete access
User-agent: *
Disallow:
(Alternative solution: just create an empty "/robots.txt" file, or don't use one at all.)
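Both forms behave the same way to a parser: an empty Disallow value (or no robots.txt at all) means everything may be fetched. A minimal check with urllib.robotparser, using a hypothetical URL:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow:"])  # empty value = no restrictions
print(rp.can_fetch("AnyBot", "http://www.example.com/any/page.html"))  # True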
Block or remove pages using a robots.txt file
You can use a robots.txt file to block Googlebot from crawling pages on your site.
For example, if you're creating a robots.txt file manually and want to block Googlebot from crawling all pages under a particular directory (for example, private), you'd use the following robots.txt entry:
User-agent: Googlebot
Disallow: /private
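Note that Disallow matches by prefix, so Disallow: /private blocks /private itself as well as /private/... and even /private.html. A short urllib.robotparser check with hypothetical URLs:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: Googlebot", "Disallow: /private"])

print(rp.can_fetch("Googlebot", "http://www.example.com/private/notes.html"))  # False
print(rp.can_fetch("Googlebot", "http://www.example.com/private.html"))        # False
print(rp.can_fetch("Googlebot", "http://www.example.com/public/index.html"))   # True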
To block Googlebot from crawling all files of a specific file type (for example, .gif), you'd use the following robots.txt entry:
User-agent: Googlebot
Disallow: /*.gif$
To block Googlebot from crawling any URL that includes a ? (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string), you'd use the following robots.txt entry:
User-agent: Googlebot
Disallow: /*?
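The * and $ wildcards in these two rules are Googlebot extensions to the original robots.txt standard; Python's urllib.robotparser, for instance, only does plain prefix matching and would treat them literally. The sketch below shows one way such patterns can be interpreted, by translating them into regular expressions (an illustration of the matching semantics, not Google's actual implementation):

import re

def wildcard_rule_to_regex(pattern):
    # '*' matches any sequence of characters; a trailing '$' anchors
    # the pattern to the end of the URL path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.compile("^" + body + ("$" if anchored else ""))

gif_rule = wildcard_rule_to_regex("/*.gif$")
print(bool(gif_rule.match("/images/photo.gif")))      # True: ends in .gif
print(bool(gif_rule.match("/images/photo.gif?v=2")))  # False: '$' anchors the match

query_rule = wildcard_rule_to_regex("/*?")
print(bool(query_rule.match("/search?q=robots")))     # True: contains a '?'
print(bool(query_rule.match("/search/results")))      # False: no '?'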
While Google won't crawl or index the content of pages blocked by robots.txt, it may still index the URLs if it finds them on other pages on the web. As a result, the URL of the page, or other publicly available information such as anchor text in links to the site, can appear in Google search results. However, no content from your pages will be crawled, indexed, or displayed.
As part of its webmaster tools, Google provides a robots.txt analysis tool. The tool reads your robots.txt file the same way Googlebot does and reports results for Google's user-agents. We strongly suggest using it. Before creating a robots.txt file, you should think about how much information you want to share with people and how much to keep private. Remember that search engines are a good way to make your content more accessible to the public. By using robots.txt properly, visitors can still find your website through search engines while you prevent your private information from being exposed.