Improvement · June 18, 2026 · 2 min read

More reliable website import

We rebuilt the website crawler with automatic failover and smarter retries, so importing a site recovers from temporary hiccups and captures more pages.

Importing a website into a knowledge base sounds simple, but the open web is messy: pages time out, servers rate-limit, and a single transient error used to be enough to drop a page or stall an import partway through. We've rebuilt website import so it recovers from those moments on its own.

Built to recover

The crawler now runs with automatic failover. If the primary path to fetch a page has trouble, the work shifts to a backup path instead of failing, and transient errors are retried with sensible backoff rather than abandoned. The practical effect is fewer half-finished imports and more of a site's pages actually making it into your knowledge base.
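To make the two mechanisms concrete, here is a minimal Python sketch of retry-with-backoff combined with failover. It is an illustration under stated assumptions, not the crawler's actual code: the `routes` list (fetch functions tried in order, primary first), the `TransientError` class, and the attempt count and delays are all hypothetical. Each route gets a few attempts with exponentially growing waits, and only when a route keeps failing does the work move to the next one.

```python
import time
from typing import Callable, Optional, Sequence


class TransientError(Exception):
    """Hypothetical: a temporary failure (timeout, brief rate limit) worth retrying."""


def fetch_with_failover(
    url: str,
    routes: Sequence[Callable[[str], str]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Fetch `url`, retrying transient errors with backoff and failing over between routes.

    Hypothetical sketch: `routes` are fetch functions tried in order (primary first).
    Errors other than TransientError are not retried and propagate immediately.
    """
    last_error: Optional[Exception] = None
    for route in routes:
        for attempt in range(max_attempts):
            try:
                return route(url)
            except TransientError as exc:
                last_error = exc
                # Exponential backoff: wait 0.5s, then 1s, ... between attempts on this route.
                if attempt < max_attempts - 1:
                    sleep(base_delay * 2 ** attempt)
        # This route exhausted its attempts; fail over to the next (backup) route.
    raise RuntimeError(f"all routes failed for {url}") from last_error
```

Passing `sleep` in as a parameter keeps the sketch easy to test. Production retry loops also commonly add random jitter to each delay so that many clients don't retry in lockstep.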

Screenshot: the knowledge base list, with each base marked Ready and its crawl, parse, store, and enrich steps completed end to end.

What changed under the hood

Automatic failover: if one route to crawl a page degrades, requests reroute to a healthy one, so a temporary outage on a single path no longer takes imports down with it.
Smarter retries: pages that hit a momentary error (a timeout, a brief rate limit) are retried instead of dropped, so transient blips don't quietly shrink your knowledge base.
Steadier indexing: the step that stores and indexes crawled content tolerates brief interruptions, so pages aren't lost between being fetched and being saved.
Honest status: an import reports success only when content was genuinely captured, so a "ready" knowledge base really does have your pages in it.
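The last two points, steadier indexing and honest status, can be illustrated together with a similar sketch. Again, this is hypothetical: `import_site`, `ImportResult`, the `"ready"`/`"failed"` status strings, and the injected `fetch` and `store` callables are illustrative names, not the product's API. Fetched content is held while the store step is retried, and the final status is `"ready"` only if at least one page was actually saved. Pages that were dropped are listed rather than silently omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class ImportResult:
    status: str  # hypothetical status values: "ready" or "failed"
    stored: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def import_site(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    store: Callable[[str, str], None],
    store_attempts: int = 3,
) -> ImportResult:
    """Hypothetical sketch: fetch each page, save it, and report an honest status."""
    result = ImportResult(status="failed")
    for url in urls:
        try:
            content = fetch(url)
        except Exception:
            # Any fetch failure (after the fetcher's own retries) is recorded, not hidden.
            result.failed.append(url)
            continue
        # Keep the fetched content and retry the store step, so a brief
        # interruption between fetching and saving doesn't lose the page.
        for attempt in range(store_attempts):
            try:
                store(url, content)
                result.stored.append(url)
                break
            except ConnectionError:
                if attempt == store_attempts - 1:
                    result.failed.append(url)
    # Only claim "ready" if content was genuinely saved.
    result.status = "ready" if result.stored else "failed"
    return result
```

In this sketch a page counts as captured only after `store` succeeds, so the status reflects what actually landed in the knowledge base rather than what was requested.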

What you'll notice

Most of this is invisible by design — the goal is that you simply don't think about crawl reliability anymore. When you import a site, more of its pages come through, large sites complete more often on the first try, and the occasional bad moment on the wider internet no longer means starting over. If a site genuinely can't be crawled, you still get a clear status rather than a silent partial result.

Why it matters

A knowledge base built from half a website teaches the AI half the story — and you'd never know which half was missing. Reliable import is the foundation everything else rests on: the more completely and dependably your sources come in, the more accurate and specific the outreach written from them. By making the crawler resilient to the normal turbulence of the web, we've made the knowledge your campaigns rely on more complete and more trustworthy, without asking anything extra of you.