
Web Crawler

Crawl the pages you need, harvest the data you want. And nothing else!

Forward Search includes a powerful crawler as part of the core system. Both the file-system crawler and the web-page crawler provide a large set of configurable options to tailor crawling behavior to specific needs, including filtering by include and exclude patterns, appropriate handling of anchor tags and excessive URL slashes, timeout settings, and more.
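To illustrate the idea behind include/exclude pattern filtering and URL-slash normalization, here is a minimal sketch in Python. The function names, pattern syntax (regular expressions), and behavior are assumptions for illustration; the actual Forward Search configuration format may differ.

```python
import re

def normalize_url(url):
    """Collapse excessive slashes in the path portion of a URL
    (illustrative; keeps the '//' after the scheme intact)."""
    return re.sub(r"(?<!:)/{2,}", "/", url)

def should_crawl(url, include_patterns, exclude_patterns):
    """Return True if url matches at least one include pattern
    and no exclude pattern. Hypothetical helper, not the
    Forward Search API."""
    if not any(re.search(p, url) for p in include_patterns):
        return False
    return not any(re.search(p, url) for p in exclude_patterns)

# Example configuration: only crawl the docs section, skip PDFs
# and anything under /archive/.
includes = [r"^https://example\.com/docs/"]
excludes = [r"\.pdf$", r"/archive/"]
```

With this setup, `should_crawl("https://example.com/docs/intro.html", includes, excludes)` is True, while a PDF or a page outside the docs section is filtered out.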

The Forward Search Web Crawler respects the common crawler-exclusion standards, both the robots.txt file and the 'nofollow' and 'noindex' HTML directives, to control precisely what is indexed and what is not. Forward Search also supports uninterrupted crawling of protected pages: it can follow special login procedures, handle SSL certificate prompts, and crawl through a proxy server.
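The exclusion mechanisms mentioned above are standard and easy to demonstrate. The sketch below uses Python's standard library to check a robots.txt rule and to detect a robots meta tag; it shows the general technique only, not Forward Search internals.

```python
from urllib.robotparser import RobotFileParser
from html.parser import HTMLParser

# Check a URL against robots.txt rules (parsed from raw lines here,
# rather than fetched over the network).
rp = RobotFileParser()
rp.parse("""User-agent: *
Disallow: /private/""".splitlines())

class RobotsMetaParser(HTMLParser):
    """Detect <meta name="robots" content="noindex, nofollow"> tags."""
    def __init__(self):
        super().__init__()
        self.noindex = False
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            content = a.get("content", "").lower()
            self.noindex = "noindex" in content
            self.nofollow = "nofollow" in content
```

A crawler consults `rp.can_fetch(agent, url)` before requesting a page, and skips indexing or link-following when the page's meta tag sets `noindex` or `nofollow`.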

The Forward Search Crawler logs the crawling process, providing statistical information for monitoring website performance, including broken links and slowly responding pages ('black sheep'). Both are easily accessed from the Forward Search Administration client.
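As a rough illustration of how such crawl statistics can be derived from a log, the sketch below flags broken links and slow pages. The 'status seconds url' log format and the threshold are hypothetical; the real Forward Search log layout is not documented here.

```python
def analyze_crawl_log(lines, slow_threshold=5.0):
    """Split crawl-log entries into broken links (HTTP status >= 400)
    and slowly responding pages ('black sheep'). Assumes a hypothetical
    'status seconds url' line format."""
    broken, slow = [], []
    for line in lines:
        status, seconds, url = line.split(maxsplit=2)
        if int(status) >= 400:
            broken.append(url)
        elif float(seconds) > slow_threshold:
            slow.append(url)
    return broken, slow

log = [
    "200 0.4 https://example.com/",
    "404 0.1 https://example.com/missing",
    "200 9.2 https://example.com/slow",
]
broken, slow = analyze_crawl_log(log)
```

Here `broken` contains the 404 page and `slow` the page that took 9.2 seconds, the kind of summary an administration client can surface.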