If you plan to use a Stanford Google Custom Search box on your Stanford web site, please subscribe to search-partners@lists.stanford.edu for notifications of service changes, updates, etc.
How to...
Put a search box on your site
You can add a search box to your web site to help visitors find content on your site. You can restrict the search to a specified directory, or search the entire Stanford web site.
The visitor enters a search term and clicks the Search button; the browser then leaves your site and displays the search results on a formatted page.
- Insert the search box HTML into your web page where you want the search box to appear.
- Customize the following required parameters:
- name="q" size="31"
Sets the width (in number of characters) of the search box. You can change the size to suit your site's layout.
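As an illustration of the size setting, the field below narrows the box to 20 characters, which might suit a sidebar. The type="text" attribute and the value 20 are assumptions made for this sketch; only name="q" and the size attribute come from the description above.

```html
<!-- Search field narrowed from 31 to 20 characters (20 is an arbitrary example value) -->
<input type="text" name="q" size="20">
```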
- Customize the following optional parameters:
If you want to restrict your search feature to one specific directory (and its subdirectories), include the following two parameters (as_dt and as_sitesearch). Restriction to multiple site URLs is not supported.
If you want the search feature on your site to search the entire Stanford collection, remove these two parameters from your HTML.
- name="as_dt" value="i"
This setting determines whether your search includes or excludes the directory specified in "as_sitesearch". Values can be:
- "i" (include only results in the web directory specified by as_sitesearch)
- "e" (exclude all results in the web directory specified by as_sitesearch)
- name="as_sitesearch" value="<yoururl>"
Pages in the specified directory will be included in or excluded from your search (according to the value of "as_dt").
e.g.: name="as_sitesearch" value="web.stanford.edu/dept/classics"
- You must specify the name of the host server followed by the path of the directory.
e.g.:
- web.stanford.edu/dept/classics not www.stanford.edu/dept/classics
- *.stanford.edu/dept/anthropology (for sites hosted on AFS that have been previously indexed at www.stanford.edu)
- If the web directory path ends with a slash ("/"), only files directly within that directory are searched; files in sub-directories are not considered.
e.g.:
- web.stanford.edu/dept/classics to include sub-directories
- web.stanford.edu/dept/classics/ to exclude sub-directories
- as_sitesearch allows you to specify one directory (and all its sub-directories) as the domain to be searched; you cannot specify multiple disparate directories using this option.
- If you want the search feature on your site to search the entire Stanford web site, delete this parameter.
If you need to search more than one directory or Stanford subdomain, we recommend that you create your own Google Custom Search Engine. This is a free service.
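The parameters described above might be combined in a form like the following sketch. The action URL is a placeholder, because the Stanford search endpoint is not given in this text; method="get", the hidden input type, and the submit button markup are also assumptions. Only the parameter names and values (q, as_dt, as_sitesearch) come from the description above.

```html
<!-- Sketch only: the action URL below is a placeholder, not the real Stanford search endpoint -->
<form method="get" action="https://example.invalid/search">
  <!-- Required: the search term field; size sets its width in characters -->
  <input type="text" name="q" size="31">
  <!-- Optional pair: include only results from one directory and its sub-directories -->
  <input type="hidden" name="as_dt" value="i">
  <input type="hidden" name="as_sitesearch" value="web.stanford.edu/dept/classics">
  <input type="submit" value="Search">
</form>
```

Changing as_dt to "e" would exclude that directory instead, and removing both hidden inputs would search the entire Stanford collection.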
Get pages into the index
Google Custom Search uses the Google index. All you need to do to get your web pages into the Stanford/Google index is:
- put the pages up in a web space
- make sure your pages don't contain meta tags that prevent the robot from indexing your page
- submit your page for indexing by Google's crawler
- and/or have your page linked from other pages already in Google's index, such as the Stanford A-Z index
The Google crawler picks up changed, new, and removed pages automatically when it visits Stanford web sites. How often content is crawled depends on how important Google's algorithm judges it to be. For instance, pages that Google considers important and quickly changing are crawled frequently, while others are crawled less frequently (up to two weeks before being revisited).
If a page is not in the index, search for all pages that link to it. The syntax for this search is "link:yourdomain.com". For example, to see which pages link to your personal page at Stanford, enter "link:http://web.stanford.edu/~mypage" into the search box at http://www.stanford.edu. The results list all pages that link to your page.
If you would like your page(s) to be listed in the Stanford index, visit http://www.stanford.edu/atoz/ and click on the "suggestions" link.
Note that if no other pages in the Stanford search collection link to your web pages, the Google crawler won't pick them up.
Keep pages out of the index
If you don't want a page to be indexed, insert this <meta> tag within your page's <head> tag:
<head>
<meta name="robots" content="noindex, nofollow">
</head>
This will prevent crawlers (robots) from indexing the page, and from following any links from the page. If the page has already been indexed, it will be removed from the index the next time Google crawls the page.
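To show where the tag sits in a complete page, here is a minimal sketch that uses the standard robots meta tag (name="robots"). The title and body text are placeholders, not content from any Stanford page.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Draft page (placeholder title)</title>
  <!-- Keep this page out of the index and do not follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
<body>
  <p>Placeholder content.</p>
</body>
</html>
```

If you want a page kept out of the index while still letting crawlers follow its links, the standard robots value content="noindex" (without nofollow) is one option; that variation goes beyond the guidance given here.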
You can prevent the pages in a directory from being indexed by restricting access to the directory with WebAuth.
Stanford's configuration of Google custom search
Search domains
Stanford's search collection includes all the web pages in these domains:
- http://www.stanford.edu
- http://web.stanford.edu
- http://*.stanford.edu (including most virtual URLs such as medicine.stanford.edu)
- http://www.stanfordalumni.org
- http://www.stanfordmag.org
- http://gostanford.com
...that are not specifically excluded by:
- the search administrator
- a noindex <meta> tag in the page's HTML
- password (including WebAuth) protection
- restricted-access files and/or directories
Web pages excluded by the search administrator
Web pages in the following directories (and their subdirectories) are excluded from the Stanford search collection:
- URLs being phased out of use
e.g.: http://www-leland.stanford.edu
- WebAuth-protected (or otherwise restricted-access) pages and directories
- specific pages kept out of the index at the request of their owners
These pages have been excluded for a variety of system performance, copyright, license, and University policy reasons.
Additional directories or pages not listed here may have been excluded by the search administrator. If you think your page may have been excluded and don't want it to be, submit a Help ticket.
Crawling schedule
Google crawls Stanford web sites at different paces depending on how its algorithm handles different factors like relevancy, quality, type, frequency of update, and what other pages link to the content. Please read the Web Search FAQ (linked from the right sidebar) for more information about getting your page or web site indexed.