At a Glance

| Source | What it indexes |
|---|---|
| Website | Crawl or sitemap import |
| PDF | Documents & manuals |
| Excel | Spreadsheets & data |
| GitHub | Repositories & code |
| YouTube | Video transcripts |
| Text | Custom snippets |
| Zendesk | Tickets & articles |
| Confluence | Wiki & docs |
| Jira | Issues & projects |
| Slack | Conversations |
| Google Drive | Docs & files |
| Salesforce | Knowledge base |

File-Based Sources
PDF Documents
Upload PDF files to index text and images. Perfect for documentation, research papers, and manuals.

| Feature | Description |
|---|---|
| Text extraction | Full text content from all pages |
| Image indexing | Images within PDFs are also indexed |
| Batch upload | Upload multiple PDFs at once |
Excel Files
Upload spreadsheets (.xls, .xlsx) to index tabular data with header relationships.

| Feature | Description |
|---|---|
| Cell extraction | Text content from all cells |
| Structure preservation | Maintains table relationships |
| Batch upload | Upload multiple files at once |
Excel Extraction Best Practices
Learn how to prepare Excel files for optimal extraction
Custom Text
Add custom text directly for FAQs, instructions, or any content that doesn’t fit other categories.

| Option | Description |
|---|---|
| Subtype | Choose “Text” for plain content or “Mermaid” for flowcharts/diagrams |
| Labels | Add labels to organize and filter text sources |
Zendesk Playbooks: To create visual playbooks for the Zendesk app, select “Mermaid” as the subtype and add the “playbooks” label. These will appear in the Playbooks tab with interactive flowcharts.
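
For readers new to the “Mermaid” subtype, the sketch below generates the kind of flowchart text that could be pasted into a custom text source. It is only an illustration: the helper name `linear_flowchart` and the step labels are hypothetical, and nothing here is part of Gurubase itself. The output uses standard Mermaid `flowchart TD` syntax.

```python
def linear_flowchart(steps):
    """Render a list of step labels as a top-down Mermaid flowchart.

    Hypothetical helper for illustration. Labels must not contain
    double quotes, since each label is wrapped in quotes.
    """
    lines = ["flowchart TD"]
    # Declare one node per step: S0["..."], S1["..."], ...
    for i, label in enumerate(steps):
        lines.append(f'    S{i}["{label}"]')
    # Connect each node to the next one in order.
    for i in range(len(steps) - 1):
        lines.append(f"    S{i} --> S{i + 1}")
    return "\n".join(lines)


# Example playbook steps (made up for this sketch).
playbook = linear_flowchart(
    ["Ticket received", "Check order status", "Reply to customer"]
)
print(playbook)
```

The printed text is what would go into the text source body, with “Mermaid” selected as the subtype and the “playbooks” label added.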
Web Sources
Website Indexing
Index entire websites or specific pages. Gurubase extracts text, headings, and structured content.

| Method | Description |
|---|---|
| Sitemap Import | Import all URLs from a sitemap |
| Crawl Website | Automatically discover and crawl all internal pages |
| Manual URLs | Add specific page URLs |
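
To preview which URLs a sitemap import would pick up, a sitemap can be read locally before it is submitted. The sketch below is a minimal example that assumes a standard sitemaps.org `urlset` file; the function name `sitemap_urls` and the sample XML are hypothetical, and Gurubase's own importer may handle more cases (for example, sitemap index files).

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sample sitemap for this sketch.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/</loc></url>
  <url><loc>https://example.com/docs/install</loc></url>
  <url><loc> https://example.com/blog/hello </loc></url>
</urlset>
"""


def sitemap_urls(xml_text):
    """Return every <loc> value found under <url> entries, in file order."""
    root = ET.fromstring(xml_text.strip())
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text  # skip empty <loc/> elements
    ]


urls = sitemap_urls(SAMPLE_SITEMAP)
print(urls)
```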
Crawl Options
| Option | Description |
|---|---|
| URL Scope | The crawler only discovers URLs that start with the provided path. Use https://example.com/ for the entire site, or https://example.com/docs/ to crawl only the /docs/ section. |
| Skip Query Params | Enabled by default. Strips query parameters from URLs (e.g., ?utm_source=...). Disable for paginated content like ?page=1, ?page=2. |
| Sort URLs | Click the sort button (A-Z icon) to alphabetically sort discovered URLs. |
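
The URL Scope and Skip Query Params options can be pictured with the following sketch. It is an approximation under stated assumptions, not the crawler's actual code: the function names are hypothetical, scope is modeled as a plain prefix check, and query stripping removes the whole query string.

```python
from urllib.parse import urlsplit, urlunsplit


def in_scope(url, scope):
    """URL Scope: keep only URLs that start with the provided path."""
    return url.startswith(scope)


def strip_query(url):
    """Skip Query Params: drop everything after '?' but keep the rest."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))


def discovered_urls(urls, scope, skip_query_params=True):
    """Filter by scope, optionally strip queries, and de-duplicate in order."""
    seen = set()
    result = []
    for url in urls:
        if not in_scope(url, scope):
            continue
        if skip_query_params:
            url = strip_query(url)
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


# Made-up links for this sketch.
links = [
    "https://example.com/docs/intro?utm_source=newsletter",
    "https://example.com/docs/intro",
    "https://example.com/blog/post",
    "https://example.com/docs/list?page=2",
]

print(discovered_urls(links, "https://example.com/docs/"))
print(discovered_urls(links, "https://example.com/docs/", skip_query_params=False))
```

With the option enabled, the tracking variant of `/docs/intro` collapses into the plain URL, and `?page=2` collapses into `/docs/list`, which is why the option should be disabled for paginated content.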
Skipped Paths: The crawler automatically skips non-content paths like /feed/, /rss/, /static/, /assets/, /media/, /wp-admin/, /wp-json/, /_static/, /_sources/, and common file extensions (images, PDFs, CSS, JS, etc.).
YouTube Videos
Import video transcripts and metadata from YouTube channels, playlists, or individual videos.

| Method | Description |
|---|---|
| Channel Import | All videos from a YouTube channel |
| Playlist Import | All videos from a specific playlist |
| Manual URLs | Specific video URLs |
Code Repositories
GitHub
Index repositories to make code, documentation, and README files searchable. Supports public and private repositories.

| Feature | Description |
|---|---|
| Glob patterns | Control which files are indexed |
| Private repos | Use GitHub tokens for access |
| Code + docs | Index code files and documentation |

| Token Type | Permissions Required |
|---|---|
| Classic | repo scope |
| Fine-grained | “Contents” (read) + “Metadata” (read) |
| Public repos | No token needed |
Classic Token Setup
- Go to GitHub Tokens (classic) (GitHub Settings → Developer settings → Personal access tokens → Tokens (classic))
- Click Generate new token (classic)
- Under Select scopes, check the repo checkbox (this grants full control of private repositories)

Fine-grained Token Setup
- Go to GitHub Fine-grained tokens (GitHub Settings → Developer settings → Personal access tokens → Fine-grained tokens)
- Click Generate new token
- Enter a Token name (e.g., “Gurubase Token”)
- Set Expiration as needed
- Under Repository access, select one of:
  - All repositories - Access all current and future repositories
  - Only select repositories - Choose specific repositories (max 50)
- Under Permissions → Repository permissions, add:
  - Contents: Read-only
  - Metadata: Read-only (required)
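
Before adding a token to Gurubase, it can help to confirm that the token can read the repository's contents. The sketch below builds a request against GitHub's REST endpoint `GET /repos/{owner}/{repo}/contents` using only the standard library. The function name and the `your-org` / `your-repo` placeholders are hypothetical, and the request is only constructed here, not sent.

```python
import urllib.request


def build_contents_request(owner, repo, token=None):
    """Build a request that lists the root directory of a repository.

    Pass token=None for public repositories, which need no token.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/contents"
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:
        # Works for both classic and fine-grained personal access tokens.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


# Placeholder values; replace with a real owner, repository, and token.
request = build_contents_request("your-org", "your-repo", token="YOUR_TOKEN")
print(request.full_url)

# To actually send it (requires network access):
# with urllib.request.urlopen(request) as response:
#     print(response.status)  # 200 means the token can read contents
```

A 404 response for a repository that is known to exist usually means the token lacks access to it, since GitHub does not reveal private repositories to unauthorized callers.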

| Pattern | Matches |
|---|---|
| `**/*.py` | All Python files (recursive) |
| `**/*.{js,ts}` | All JavaScript and TypeScript files |
| `packages/**` | Everything inside packages folder |
| `!test_*.py` | Exclude test files |
Test your patterns with globster.xyz before adding.
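
To get a feel for how the patterns in the table combine, the sketch below implements a small glob matcher and applies the four patterns to a made-up file list. It is an approximation, not Gurubase's matcher: it assumes patterns are compared against the full repository-relative path, that a file is indexed when it matches any include pattern and no `!` pattern, and it supports only `**`, `*`, `?`, and `{a,b}`. Under these assumptions `!test_*.py` excludes only root-level test files; the real matcher may treat nested files differently.

```python
import re


def glob_to_regex(pattern):
    """Translate a small glob subset (**, *, ?, {a,b}) into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if pattern.startswith("**/", i):
            out.append(r"(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")  # anything, including "/"
            i += 2
        elif ch == "*":
            out.append(r"[^/]*")  # anything within one path segment
            i += 1
        elif ch == "?":
            out.append(r"[^/]")  # exactly one non-slash character
            i += 1
        elif ch == "{":
            end = pattern.index("}", i)  # raises ValueError if unbalanced
            options = pattern[i + 1:end].split(",")
            out.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
            i = end + 1
        else:
            out.append(re.escape(ch))
            i += 1
    return re.compile("".join(out) + r"\Z")


def select_files(paths, patterns):
    """Keep paths matching any include pattern and no '!' exclude pattern."""
    includes = [glob_to_regex(p) for p in patterns if not p.startswith("!")]
    excludes = [glob_to_regex(p[1:]) for p in patterns if p.startswith("!")]
    return [
        path
        for path in paths
        if any(rx.match(path) for rx in includes)
        and not any(rx.match(path) for rx in excludes)
    ]


# Made-up repository listing for this sketch.
files = [
    "main.py",
    "src/app/util.py",
    "test_api.py",
    "web/index.ts",
    "README.md",
    "packages/core/data.json",
]
patterns = ["**/*.py", "**/*.{js,ts}", "packages/**", "!test_*.py"]

selected = select_files(files, patterns)
print(selected)
# ['main.py', 'src/app/util.py', 'web/index.ts', 'packages/core/data.json']
```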
Platform Integrations
Platform integrations provide automated syncing and deeper integration with enterprise tools.
Zendesk
Index support tickets and help center articles. Set up backfill jobs for automated syncing.
Zendesk Integration
Import tickets, articles, comments, and attachments
Confluence
Index Atlassian Confluence spaces and pages. Supports CQL queries for advanced filtering.
Confluence Integration
Sync team documentation and wiki pages
Jira
Index project issues, tickets, and related documentation.
Jira Integration
Import issues and project data
Slack
Index Slack conversations and threads. Configure trusted users for targeted content.
Slack Integration
Capture team knowledge from conversations
Google Drive
Connect Google Drive to index documents, spreadsheets, and files.
Google Drive Integration
Sync Google Docs, Sheets, and files
Salesforce
Index Salesforce Knowledge Base articles. Supports SOQL queries for filtering.
Salesforce Integration
Import knowledge base articles
Best Practices
Content Organization
- Mix source types - Combine documents, websites, and integrations for comprehensive coverage
- Use labels - Organize text sources with labels (e.g., playbooks, faq)
- Platform integrations - Use for content that changes frequently
Data Quality
- Review indexed content - Check what’s actually indexed in your Guru’s sources
- Use filtering - Glob patterns for GitHub, CQL for Confluence, SOQL for Salesforce
- Test your Guru - Ask sample questions to verify content quality
Keeping Content Fresh
- Backfill jobs - Set up automated syncing for Zendesk, Confluence, Jira, Slack
- Reindex regularly - Use the reindex option for websites and documents
- Monitor status - Check integration status in your Guru’s settings