Starts an asynchronous website crawl that discovers the sub-URLs under the provided root URL.
You can check the status of the created crawl with the Get Crawl Status endpoint or stop it with the Stop Crawl endpoint.
Crawling does not index the discovered URLs. To index them, add them as a data source by passing them to the Create Data Source endpoint.
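The start, poll, and hand-off workflow described above can be sketched as follows. This is a minimal sketch, not a client for this API: `get_status` is a hypothetical callable you would write around the Get Crawl Status endpoint (its path and authentication are not covered here), and treating any status other than `RUNNING` as finished is an assumption, since this page only shows the `RUNNING` value.

```python
import time
from typing import Callable, Dict, List


def wait_for_crawl(
    get_status: Callable[[int], Dict],  # hypothetical wrapper around Get Crawl Status
    crawl_id: int,
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> List[str]:
    """Poll a crawl until it leaves the RUNNING state and return its discovered URLs."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(crawl_id)
        # Assumption: any status other than "RUNNING" means the crawl has ended.
        if status["status"] != "RUNNING":
            return status["discovered_urls"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"crawl {crawl_id} still running after {timeout}s")
        time.sleep(poll_interval)
```

The returned list is what you would then pass to the Create Data Source endpoint; nothing is indexed until you do.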
Crawls are limited to one concurrent operation per Guru type: a new crawl request fails while another crawl for the same Guru type is still running.
Skipped Paths: The crawler automatically skips non-content paths like /feed/, /rss/, /static/, /assets/, /media/, /wp-admin/, /wp-json/, /_static/, /_sources/, and common file extensions (images, PDFs, CSS, JS, etc.).
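To check ahead of time whether a page is likely to be skipped, the rule above can be approximated like this. It is a rough sketch: the path segments come from the list above, but the extension list is an illustrative subset chosen here (the page only says "images, PDFs, CSS, JS, etc."), and simple substring matching is an assumption about how the crawler compares paths.

```python
from urllib.parse import urlparse

# Path segments listed as skipped in the documentation.
SKIPPED_SEGMENTS = (
    "/feed/", "/rss/", "/static/", "/assets/", "/media/",
    "/wp-admin/", "/wp-json/", "/_static/", "/_sources/",
)
# Illustrative subset only; the crawler's full extension list is not documented.
SKIPPED_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".pdf", ".css", ".js")


def looks_skipped(url: str) -> bool:
    """Approximate whether the crawler would skip this URL."""
    path = urlparse(url).path.lower()
    if any(segment in path for segment in SKIPPED_SEGMENTS):
        return True
    # str.endswith accepts a tuple of suffixes.
    return path.endswith(SKIPPED_EXTENSIONS)
```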
Path Parameters
The slug of the Guru type to associate the crawled content with
Body Parameters
The root URL to start crawling from. Must include http:// or https:// protocol.
Important: The crawler only discovers URLs that start with this path. For example:
https://example.com/ → crawls entire site
https://example.com/docs/ → only crawls URLs under /docs/
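The scoping rule above amounts to a prefix check on the root URL. The sketch below models it as a plain `str.startswith` comparison, which is an assumption about the crawler's exact matching; `validate_root` and `in_scope` are hypothetical helper names.

```python
def validate_root(root_url: str) -> None:
    """Reject root URLs that lack an http:// or https:// protocol."""
    if not root_url.startswith(("http://", "https://")):
        raise ValueError("root URL must include http:// or https://")


def in_scope(root_url: str, candidate: str) -> bool:
    """Return True if the candidate URL falls under the root URL's path."""
    # Plain prefix match: "https://example.com/docs/" admits
    # "https://example.com/docs/api/" but not "https://example.com/blog/".
    return candidate.startswith(root_url)
```

Under plain prefix matching the trailing slash matters: a root of `https://example.com/docs` would also admit `https://example.com/docs-archive/`, so ending the root with `/` keeps the scope to the intended directory.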
When true, query parameters are stripped from discovered URLs (e.g., ?utm_source=..., ?id=123).
Set to false if you need to crawl paginated content like ?page=1, ?page=2; otherwise those pages collapse into a single URL once the query is stripped.
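The effect of this flag can be illustrated with the standard library. This sketch shows why paginated URLs collapse when query parameters are stripped; it keeps URL fragments untouched, which is a choice made here rather than documented crawler behavior, and `strip_query` and `normalize` are hypothetical names.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def strip_query(url: str) -> str:
    """Remove the query string (everything after '?') from a URL."""
    return urlsplit(url)._replace(query="").geturl()


def normalize(urls: Iterable[str], ignore_query_params: bool = True) -> List[str]:
    """Optionally strip queries, then de-duplicate while preserving order."""
    cleaned = (strip_query(u) if ignore_query_params else u for u in urls)
    return list(dict.fromkeys(cleaned))
```

With stripping on, `https://example.com/blog?page=1` and `https://example.com/blog?page=2` both become `https://example.com/blog`, so only one of the two pages survives; with stripping off, both remain distinct.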
Response
Unique identifier for the crawl operation
Current status of the crawl operation
The Guru type that the crawl was initiated for
List of URLs discovered during crawling
Timestamp when the crawl started (ISO 8601 format)
Timestamp when the crawl ended (ISO 8601 format)
Whether query parameters are being stripped from discovered URLs
{
"id": 211,
"url": "https://getanteon.com/",
"status": "RUNNING",
"guru_type": "anteon",
"discovered_urls": [],
"start_time": "2025-02-21T10:25:22.710211Z",
"end_time": null,
"link_limit": 1500,
"ignore_query_params": true
}
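A response like the one above can be loaded into a typed object for further processing. This is a minimal sketch: the field names are taken from the example response, `link_limit` is carried through without interpretation because this page does not describe it, and the `finished` property assumes `end_time` stays `null` until the crawl ends, which the example suggests but does not state.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


def _parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp; a trailing 'Z' needs Python 3.11+ otherwise."""
    if value is None:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class CrawlStatus:
    id: int
    url: str
    status: str
    guru_type: str
    discovered_urls: List[str]
    start_time: Optional[datetime]
    end_time: Optional[datetime]
    link_limit: int  # passed through as-is; meaning not documented here
    ignore_query_params: bool

    @classmethod
    def from_json(cls, raw: str) -> "CrawlStatus":
        d = json.loads(raw)
        return cls(
            id=d["id"],
            url=d["url"],
            status=d["status"],
            guru_type=d["guru_type"],
            discovered_urls=d["discovered_urls"],
            start_time=_parse_ts(d["start_time"]),
            end_time=_parse_ts(d["end_time"]),
            link_limit=d["link_limit"],
            ignore_query_params=d["ignore_query_params"],
        )

    @property
    def finished(self) -> bool:
        # Assumption: end_time is null while the crawl is still running.
        return self.end_time is not None
```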