Search Information for Webmasters

Google CSE

Web Communications has created a Google Custom Search Engine account for use across the university. Any content on Northwestern subdomains that Google can crawl is included automatically. Content can be added and removed using Google's Search Console. Google also respects robots.txt rules and robots meta tags. For more information about this option, please email Webcomm Support.
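As an illustration of the meta-tag approach mentioned above, the following Python sketch reads the directives from a page's robots meta tag, for example to confirm that a page you want hidden really carries noindex. The class name, function name, and sample markup are hypothetical, and this only inspects the HTML; it does not reproduce how Google itself evaluates the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalize to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical sample page used only for illustration.
SAMPLE_PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print("noindex" in robots_directives(SAMPLE_PAGE))
```

A page with no robots meta tag yields an empty set, which search engines generally treat as permission to index.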

SearchBlox

SearchBlox replaced a Google Search Appliance, which was retired in July 2018.

For specific or technical questions not covered in this document, please refer to the official SearchBlox documentation.

Crawl and Index

Crawling is the process by which new URLs are discovered by following links in pages. The SearchBlox crawler begins at the Root URLs specified in collections maintained by Global Marketing and Communications and partnering departments and schools. For some subdomains, only the site homepage is indexed. To request a full index of your site, please contact Web Communications.
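The link-following idea can be sketched in a few lines of Python. This is a simplified illustration, not SearchBlox's actual crawler: the example.edu URLs, the in-memory PAGES dictionary, and the function names are all assumptions made for the example, and a real crawler would fetch pages over HTTP.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "site" mapping URL -> HTML, standing in for HTTP fetches.
PAGES = {
    "https://example.edu/": '<a href="/about.html">About</a> <a href="/news/">News</a>',
    "https://example.edu/about.html": '<a href="/">Home</a>',
    "https://example.edu/news/": '<a href="/news/story.html">Story</a>',
    "https://example.edu/news/story.html": "<p>No links here.</p>",
    "https://example.edu/orphan.html": "<p>Nothing links to this page.</p>",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(root_urls, pages):
    """Breadth-first discovery of URLs reachable from the root URLs."""
    discovered = []
    seen = set(root_urls)
    queue = deque(root_urls)
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue  # Link points to a page we cannot fetch.
        discovered.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            absolute = urljoin(url, href)  # Resolve relative links.
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return discovered


found = crawl(["https://example.edu/"], PAGES)
print(found)
```

In this sketch, orphan.html is never discovered because no crawled page links to it, which is why pages without inbound links may need to be listed as Root URLs or linked from elsewhere on the site.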

Indexing is the process by which content is processed for search. The indexing algorithm uses various page components to determine the relevance of a page for keyword searches.

View the FAQ on crawling and indexing for more detail.

Content Inclusion

To add your subdomain to SearchBlox, please contact Web Communications. Learn more about how you can include your content.

Content Exclusion

You can exclude entire websites or specific pages from the search engine with a simple robots.txt exclusion. Learn more about this technique and other ways to exclude content.
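As a rough illustration of a robots.txt exclusion, the Python sketch below parses a sample rule set with the standard library's urllib.robotparser and checks which URLs a compliant crawler may fetch. The rules, paths, and example.edu URLs are hypothetical, and the wildcard user agent is an assumption; check the SearchBlox documentation for the user agent its crawler actually sends.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: hide one directory and one specific page from all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Disallow: /internal/report.html
"""

# Hypothetical robots.txt that excludes an entire site.
ROBOTS_TXT_WHOLE_SITE = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in (
    "https://example.edu/about.html",
    "https://example.edu/drafts/page.html",
    "https://example.edu/internal/report.html",
):
    print(url, is_crawlable(ROBOTS_TXT, url))
```

Running a check like this against your own rules before publishing them is a quick way to confirm that the exclusion covers the pages you intend and nothing more.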