Search Information for Webmasters
Web Communications has created a Google Custom Search Engine account for use across the university. Any content on Northwestern subdomains that Google can crawl is included automatically. Content can be added and removed using Google's Search Console. Google also honors robots.txt rules and meta tags. For more information about this option, please email Webcomm Support.
SearchBlox replaced a Google Search Appliance, which was retired in July 2018.
For specific or technical questions not covered in this document, please refer to the official SearchBlox documentation.
Crawling is the process of discovering new URLs by following the links in pages. The SearchBlox crawler begins at the Root URLs specified in collections maintained by Global Marketing and Communications and by partnering departments and schools. For some subdomains, only the site homepage is indexed. To request a full index of your site, please contact Web Communications.
Indexing is the process of analyzing crawled content so that it can be searched. The indexing algorithm uses various page components to determine how relevant a page is to a keyword search.
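To make the idea of link-following concrete, here is a minimal Python sketch of breadth-first URL discovery from a single root URL. It is an illustration only, not a description of how SearchBlox works internally. The example.edu URLs, the PAGES dictionary (an in-memory stand-in for fetching pages over the network), and the crawl function are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(root_url, pages):
    """Return URLs reachable from root_url, in breadth-first order.

    `pages` is a hypothetical dict mapping URL -> HTML. A real crawler
    would download each URL instead of looking it up in memory.
    """
    seen = {root_url}
    queue = deque([root_url])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        parser = LinkExtractor()
        parser.feed(pages.get(url, ""))
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute in pages and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return order


# Hypothetical site used only for this illustration.
PAGES = {
    "https://example.edu/": '<a href="/about/">About</a> <a href="news.html">News</a>',
    "https://example.edu/about/": '<a href="/">Home</a> <a href="staff.html#top">Staff</a>',
    "https://example.edu/news.html": "<p>No links here.</p>",
    "https://example.edu/about/staff.html": '<a href="/about/">Back</a>',
    "https://example.edu/orphan.html": "<p>Nothing links to this page.</p>",
}

discovered = crawl("https://example.edu/", PAGES)
```

In this sketch the orphan page is never discovered because no crawled page links to it. In general, a crawler that only follows links cannot discover a page unless something it has already crawled links to that page.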
View the FAQ on crawling and indexing for answers to the following questions:
- How often does the crawler run?
- How can I submit my page for crawling?
- What content is indexed?
- How does meta-tagging affect indexing?
You can exclude either entire websites or specific pages from the search engine through a simple robots.txt exclusion. Learn more about this technique and other ways to exclude content, such as:
- Meta directives
- Web Server Headers
- Request to Web Communications
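As a rough illustration of a robots.txt exclusion, the Python sketch below uses the standard library's urllib.robotparser to check which URLs a set of rules would block. The rules, paths, example.edu URLs, and the "ExampleBot" user agent are hypothetical, and the exact matching behavior of any particular crawler may differ from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block one directory and one page
# for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Disallow: /internal/report.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, user_agent="ExampleBot"):
    """Return True if the rules above allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


allowed_about = is_crawlable("https://example.edu/about/")
allowed_draft = is_crawlable("https://example.edu/drafts/post.html")
allowed_report = is_crawlable("https://example.edu/internal/report.html")
```

Under these assumed rules, the page under /about/ remains crawlable, while everything under /drafts/ and the single report page are excluded.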