Skip to main content
POST
https://api.gurubase.io/api/v1/
/
{guru_slug}
/
crawl
/
start
Start Website Crawl
curl --request POST \
  --url https://api.gurubase.io/api/v1/{guru_slug}/crawl/start/ \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "url": "<string>",
  "ignore_query_params": true
}
'
{
    "id": 211,
    "url": "https://getanteon.com/",
    "status": "RUNNING",
    "guru_type": "anteon",
    "discovered_urls": [],
    "start_time": "2025-02-21T10:25:22.710211Z",
    "end_time": null,
    "link_limit": 1500,
    "ignore_query_params": true
}
Starts an asynchronous website crawl operation to find all the sub URLs of the provided root URL. You can check the status of created crawl operation using the Get Crawl Status endpoint or stop it using the Stop Crawl endpoint. Crawling does not mean that the discovered URLs are indexed immediately. You need to manually add the discovered URLs as a data source by passing them to Create Data Source endpoint.
Crawls are rate limited to 1 concurrent operation per Guru type. Subsequent requests will fail if a crawl is already running.
Skipped Paths: The crawler automatically skips non-content paths like /feed/, /rss/, /static/, /assets/, /media/, /wp-admin/, /wp-json/, /_static/, /_sources/, and common file extensions (images, PDFs, CSS, JS, etc.).

Path Parameters

guru_slug
string
required
The slug of the Guru type to associate the crawled content with

Headers

x-api-key
string
required
Your API key for authentication. You can obtain your API key from the Gurubase dashboard.

Body Parameters

url
string
required
The root URL to start crawling from. Must include http:// or https:// protocol.Important: The crawler only discovers URLs that start with this path. For example:
  • https://example.com/ → crawls entire site
  • https://example.com/docs/ → only crawls URLs under /docs/
ignore_query_params
boolean
default:"true"
When true, query parameters are stripped from discovered URLs (e.g., ?utm_source=..., ?id=123). Set to false if you need to crawl paginated content like ?page=1, ?page=2.

Response

id
string
Unique identifier for the crawl operation
url
string
The root URL to be crawled. The crawler will start from this URL and follow all links that begin with it. For example, if the URL is https://example.com/a/b/c, the crawler will extract all links that start with https://example.com/a/b/c.
status
string
Current status of the crawl operation
guru_type
string
The Guru type that the crawl was initiated for
discovered_urls
list
List of URLs discovered during crawling
start_time
string
Timestamp when crawl started (ISO 8601 format)
end_time
string
Timestamp when crawl ended (ISO 8601 format)
ignore_query_params
boolean
Whether query parameters are being stripped from discovered URLs
{
    "id": 211,
    "url": "https://getanteon.com/",
    "status": "RUNNING",
    "guru_type": "anteon",
    "discovered_urls": [],
    "start_time": "2025-02-21T10:25:22.710211Z",
    "end_time": null,
    "link_limit": 1500,
    "ignore_query_params": true
}