
Learning: ASP, SQL, HTML, XML, CSS, JAVA, Perl code study

robots.txt >
disallow

Title:

Disallow: /
Disallows robots from reading and searching every file after the first "/", in other words everything after your domain or your IP address.

Disallow:
If nothing follows "Disallow:", it means allow.

---------- attention -----------
not all the r


Blocks search engine robots from reading all pages, or specific directories, after the first "/" of the site.
Description:

Robots.txt must be placed at the site root and nowhere else, for example: http://----.com/robots.txt

User-Agent:*
Disallow: /
(disallows all robots from searching all files)

------------- attention -----------
If you want to allow all search engines, there are three options:
Way 1: Delete robots.txt, because you do not need it. Without a robots.txt, however, errors are created when bots come.
Way 2: Keep robots.txt blank (with nothing in it), so that no errors fill the error log when bots come.
Way 3: Have no robots.txt, but create a customized error page.

Using

User-Agent:*
Disallow:

to allow all robots to read your files is not recommended, because not all robots follow the same rules.

Big tip: read robots.txt line by line each time.


Robots.txt is a standard file whose function is to keep Googlebot from downloading content from your web server.
It also applies to some other major search engine robots.
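The three cases described above (block everything, an empty "Disallow:", and a blank robots.txt) can be checked with a small sketch. It uses Python's standard urllib.robotparser module; the helper name can_fetch, the robot name "AnyBot" and the URL http://example.com/page.html are assumptions made for illustration only. It shows how one well-behaved parser reads these files, not how every robot behaves.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The three variants discussed in the description.
BLOCK_ALL = "User-Agent:*\nDisallow: /\n"   # disallow everything
EMPTY_DISALLOW = "User-Agent:*\nDisallow:\n"  # empty value means allow
BLANK_FILE = ""                               # robots.txt with nothing in it

URL = "http://example.com/page.html"  # hypothetical URL for illustration

for name, text in [
    ("block all", BLOCK_ALL),
    ("empty Disallow", EMPTY_DISALLOW),
    ("blank file", BLANK_FILE),
]:
    print(name, "->", can_fetch(text, "AnyBot", URL))
```

In this parser the empty "Disallow:" and the blank file both allow the fetch, while "Disallow: /" blocks it. Other robots may interpret the file differently, which is the caution given above.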
Example Code:

http://----.com/robots.txt

----------- disallow all (*) after (http://----.com/)
User-Agent:*
Disallow: /

----------- disallow all (*) after (http://----.com/folder/)
User-Agent:*
Disallow: /folder/

-------------------------------------
http://www.w3.org/TR/html4/appendix/notes.html#h-B.4.1.1
example: Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html.

------------------------------------------
User-agent: *
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /hotnews/not4u.htm

------------------------ Allowing Googlebot
Block all bots but allow Googlebot:
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:

------------------------
Block Googlebot entirely but allow Googlebot-Mobile to crawl pages.
PS: the "Allow:" syntax may work for Googlebot only.
User-agent: Googlebot
Disallow: /

User-agent: Googlebot-Mobile
Allow:

or

User-agent: Googlebot-Mobile
Disallow:

------------------------
Block a folder but allow one file inside it:
User-Agent: Googlebot
Disallow: /folder1/
Allow: /folder1/myfile.html

------------------------
Block every URL that begins with /folder:
User-agent: Googlebot
Disallow: /folder

------------------------
Block every URL that ends with .gif:
User-agent: Googlebot
Disallow: /*.gif$

------------------------
Block every URL that contains a "?":
User-agent: Googlebot
Disallow: /*?

PS: the "*" and "$" patterns are Googlebot extensions. The original standard quoted further down does not support them.

------ more how to block googlebot ------
http://www.google.com/support/webmasters/bin/answer.py?answer=40364&topic=8846

------ more control msnbot ------
http://search.msn.com/docs/siteowner.aspx?t=SEARCH_WEBMASTER_REF_RestrictAccessToSite.htm

------ more yahoo bot info -----
http://help.yahoo.com/help/us/ysearch/slurp/

---------- some standard -------
http://www.robotstxt.org/wc/exclusion-admin.html

What to put into the robots.txt file

The "/robots.txt" file usually contains a record looking like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/

In this example, three directories are excluded.

Note that you need a separate "Disallow" line for every URL prefix you want to exclude -- you cannot say "Disallow: /cgi-bin/ /tmp/". Also, you may not have blank lines in a record, as they are used to delimit multiple records.

Note also that regular expressions are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "Disallow: /tmp/*" or "Disallow: *.gif".

What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve. Here follow some examples:

To exclude all robots from the entire server
User-agent: *
Disallow: /

To allow all robots complete access
User-agent: *
Disallow:

Or create an empty "/robots.txt" file.

To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private/

To exclude a single robot
User-agent: BadBot
Disallow: /

To allow a single robot
User-agent: WebCrawler
Disallow:

User-agent: *
Disallow: /

To exclude all files except one
This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "docs", and leave the one file in the level above this directory:
User-agent: *
Disallow: /~joe/docs/

Alternatively you can explicitly disallow all disallowed pages:
User-agent: *
Disallow: /~joe/private.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html


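Blocks search engine robots from reading a folder
(all pages under a given URL)
-------- serious warning --------
Not all search engine robots are equally well behaved
or follow the same rules.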
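The prefix rule (/help versus /help/) and the "allow a single robot" record from the examples can be tried out with a short sketch. It again relies on Python's standard urllib.robotparser; the helper name load, the base address http://example.com and the robot name "AnyBot" are assumptions for illustration. The sketch covers plain prefix matching only and does not exercise the Googlebot wildcard patterns.

```python
from urllib.robotparser import RobotFileParser


def load(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


BASE = "http://example.com"  # hypothetical site for illustration

# "Disallow: /help" is a prefix: it covers /help.html and /help/index.html.
no_slash = load("User-agent: *\nDisallow: /help\n")
# "Disallow: /help/" only covers URLs inside the /help/ folder.
with_slash = load("User-agent: *\nDisallow: /help/\n")

# Two records separated by a blank line: WebCrawler is allowed, others are not.
only_webcrawler = load(
    "User-agent: WebCrawler\n"
    "Disallow:\n"
    "\n"
    "User-agent: *\n"
    "Disallow: /\n"
)

for path in ("/help.html", "/help/index.html"):
    print(
        path,
        "| /help:", no_slash.can_fetch("AnyBot", BASE + path),
        "| /help/:", with_slash.can_fetch("AnyBot", BASE + path),
    )

for agent in ("WebCrawler", "BadBot"):
    print(agent, "->", only_webcrawler.can_fetch(agent, BASE + "/"))
```

The blank line between the two records matters here, since blank lines are what separate one record from the next.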
Example Result:

User-Agent:*
Disallow: /


If you allow everyone to visit your whole site, it is better not to keep a robots.txt like this:

User-Agent:*
Disallow:

Sometimes it gets confused with

User-Agent:*
Disallow: /

and sometimes it may cause strange results for http://----.com versus http://----.com/
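To see how a given parser treats the address with and without the trailing "/", a quick comparison can help. This is a minimal sketch using Python's standard urllib.robotparser; the helper name allowed, the robot name "AnyBot" and the address http://example.com are assumptions for illustration. It only reports what this one parser does, so other robots may still give different results.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "AnyBot") -> bool:
    """Report whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


ALLOW_ALL = "User-Agent:*\nDisallow:\n"
BLOCK_ALL = "User-Agent:*\nDisallow: /\n"

for label, rules in (("Disallow:", ALLOW_ALL), ("Disallow: /", BLOCK_ALL)):
    bare = allowed(rules, "http://example.com")    # no trailing slash
    slash = allowed(rules, "http://example.com/")  # with trailing slash
    print(label, "| bare:", bare, "| with slash:", slash)
```

This parser treats an address without a path as "/", so both forms get the same answer under each rule set.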



[ 7/29/2010 ]
