Overview
Gurubase supports a wide variety of data sources to help you build comprehensive knowledge bases. You can index content from websites, documents, platforms, and even custom text to create powerful AI assistants that understand your specific domain.
Supported Data Source Types
PDF Documents
Upload PDF files directly to Gurubase to index their content. This is perfect for documentation, research papers, manuals, and any text-based PDF content.
Features:
- Text extraction from PDF content
- Image indexing - Gurubase can also index images within PDF files
- Multiple file support - Upload multiple PDFs at once
How to add:
- Go to your Guru’s settings
- Click “Add” under Sources
- Select “PDF” and upload your files
- Gurubase will automatically extract and index the text content and images
Website Indexing
Index entire websites or specific web pages by providing URLs. Gurubase will crawl and extract content from web pages, including text, headings, and structured content.
Features:
- Sitemap import - Import URLs from a website’s sitemap using the “Import from Sitemap” button
- Website crawling - Automatically discover and crawl all internal URLs belonging to a given domain using the “Crawl Website” button
- Image indexing - Index images from websites (requires plan feature)
- Content extraction - Extract text, headings, and structured content from web pages
How to add:
- Use the website URL option in your Guru’s sources
- Option 1: Import from sitemap - provide the sitemap URL
- Option 2: Crawl website - provide the base URL to discover all internal pages
- Option 3: Manual URLs - provide specific page URLs
- Gurubase will crawl and index the content automatically
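To preview what a sitemap import (Option 1) might pick up, the sketch below lists the URLs a sitemap contains. It uses only Python's standard library and the namespace defined by the sitemaps.org protocol. The function name list_sitemap_urls, the prefix filter, and the example.com sample are assumptions made for illustration; this is not Gurubase's importer.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(sitemap_xml, prefix=""):
    """Return the <loc> URLs in a sitemap document, optionally filtered by prefix."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]
    return [url for url in urls if url.startswith(prefix)]


# Hypothetical sitemap content; example.com is a placeholder domain.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/docs/setup</loc></url>
  <url><loc>https://example.com/blog/news</loc></url>
</urlset>
"""

if __name__ == "__main__":
    # Print only the documentation pages from the sample sitemap.
    for url in list_sitemap_urls(SAMPLE_SITEMAP, prefix="https://example.com/docs/"):
        print(url)
```

Filtering by prefix before importing is one way to keep only the part of a site that is relevant to your Guru.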
YouTube Content
Import YouTube channels or playlists to index video transcripts and metadata. This is ideal for educational content, tutorials, and video-based knowledge.
Features:
- Channel import - Import all videos from a YouTube channel using the “Import Channel” button
- Playlist import - Import all videos from a specific playlist using the “Import Playlist” button
- Transcript extraction - Automatically extract video transcripts and metadata
- Video metadata - Index titles, descriptions, and other video information
How to add:
- Use the YouTube integration in your Guru’s sources
- Option 1: Import channel - provide YouTube channel URL
- Option 2: Import playlist - provide YouTube playlist URL
- Option 3: Manual URLs - provide specific video URLs
- Gurubase will extract transcripts and video information automatically
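If you are sorting a mixed list of links before choosing between the three options above, a small helper can guess which option each URL belongs to. This is a minimal sketch based on common YouTube URL shapes; the function name classify_youtube_url and the sample IDs are hypothetical, and Gurubase may accept or reject URL forms differently.

```python
from urllib.parse import parse_qs, urlparse

# Hosts treated as YouTube in this sketch (an assumption, not an exhaustive list).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def classify_youtube_url(url):
    """Guess whether a URL is a 'playlist', 'video', 'channel', or 'unknown'."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    query = parse_qs(parsed.query)
    # Short links such as https://youtu.be/<id> point at a single video.
    if host == "youtu.be" and parsed.path.strip("/"):
        return "video"
    if host not in YOUTUBE_HOSTS:
        return "unknown"
    if parsed.path == "/playlist" and "list" in query:
        return "playlist"
    if parsed.path == "/watch" and "v" in query:
        return "video"
    if parsed.path.startswith(("/channel/", "/@", "/c/", "/user/")):
        return "channel"
    return "unknown"


if __name__ == "__main__":
    # Placeholder URLs; the IDs are made up for illustration.
    for link in [
        "https://www.youtube.com/playlist?list=PL123",
        "https://www.youtube.com/watch?v=abc123",
        "https://www.youtube.com/@example",
    ]:
        print(link, "->", classify_youtube_url(link))
```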
GitHub Repositories
Index GitHub repositories to make your code documentation, README files, and project information searchable. Perfect for technical documentation and developer resources.
Features:
- Flexible file filtering - Use glob patterns to control which files are indexed
- Include/exclude patterns - Specify exactly which files to include or exclude
- Code documentation - Index README files, documentation, and other text-based content
How to add:
- Provide the GitHub repository URL
- Configure glob patterns to control file indexing:
  - Include pattern: **/*.py (index all Python files)
  - Exclude pattern: test_*.py (skip test files)
- Gurubase will index the files that match your patterns
Example patterns:
- **/*.py - All Python files (recursive)
- *.py - Only top-level Python files
- **/*.{js,ts} - All JavaScript and TypeScript files
- test_*.py - Exclude test files
- packages/** - Everything inside the packages folder
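One way to try include and exclude patterns before configuring a repository is to run them against a small local directory tree. The sketch below uses Python's pathlib and fnmatch; the helper names, the sample file layout, and the choice to compare the exclude pattern against the file name only are assumptions for illustration. Python's matcher is not necessarily identical to the one Gurubase uses; for example, pathlib does not expand brace patterns such as {js,ts}.

```python
import fnmatch
import tempfile
from pathlib import Path


def preview_matches(root, include, exclude=None):
    """Return sorted relative paths of files under root matching include but not exclude."""
    root = Path(root)
    selected = []
    for path in root.glob(include):
        if not path.is_file():
            continue
        # Assumption: the exclude pattern is compared against the file name only.
        if exclude and fnmatch.fnmatch(path.name, exclude):
            continue
        selected.append(path.relative_to(root).as_posix())
    return sorted(selected)


def make_sample_repo(root):
    """Create a tiny hypothetical repository layout for trying out patterns."""
    for rel in ["main.py", "test_main.py", "src/app.py", "src/test_app.py", "docs/guide.md"]:
        target = Path(root) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_sample_repo(tmp)
        # Recursive include with test files excluded.
        print(preview_matches(tmp, "**/*.py", exclude="test_*.py"))
        # Top-level Python files only.
        print(preview_matches(tmp, "*.py"))
```

Pointing preview_matches at a local clone of your repository gives a quick estimate of how many files a pattern would select.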
To check glob pattern syntax, test your patterns with tools like globster.xyz or with Python’s glob.glob() locally.
Excel Files
Upload Excel files (.xls, .xlsx) to index spreadsheet data. Gurubase can extract text content alongside header relationships from cells, making your data searchable and queryable.
Features:
- Text extraction from spreadsheet cells
- Multiple file support - Upload multiple Excel files at once
- Structured data - Maintains table structure and relationships
How to add:
- Upload Excel files directly through the file upload option
- Gurubase will extract and index the text content from your spreadsheets
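To picture why header relationships matter, the sketch below pairs each cell with its column header so that a row stays meaningful as plain text. It illustrates the general idea only and is not Gurubase's extraction logic; the function name rows_to_records and the sample data are hypothetical, and the first row is assumed to be the header row.

```python
def rows_to_records(rows):
    """Pair each data cell with its column header, skipping empty cells."""
    header, *data = rows  # Assumption: the first row holds the column headers.
    records = []
    for row in data:
        pairs = [
            f"{name}: {value}"
            for name, value in zip(header, row)
            if value not in (None, "")
        ]
        if pairs:
            records.append("; ".join(pairs))
    return records


# Made-up spreadsheet content represented as a list of rows.
SAMPLE_ROWS = [
    ["Product", "Region", "Units"],
    ["Widget", "EMEA", 120],
    ["Gadget", None, 75],
]

if __name__ == "__main__":
    for record in rows_to_records(SAMPLE_ROWS):
        print(record)
```

A sheet with a single, clearly labeled header row keeps this kind of pairing unambiguous.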
Excel Extraction Best Practices
Learn how to prepare Excel files for optimal extraction and analysis
Custom Text
Add custom text content directly to your Guru. This is useful for specific instructions, context, or any text-based information that doesn’t fit other categories.
How to add:
- Use the text input option in your Guru’s sources
- Paste or type your custom content
- This content will be indexed alongside your other sources
Platform Integrations
For more advanced integrations with specific platforms, Gurubase offers dedicated connectors that provide deeper integration and automated syncing capabilities.
Confluence
Connect with Atlassian Confluence to index your team’s documentation and knowledge base articles. Supports both individual page imports and automated backfill jobs using CQL queries.
Learn More
Set up Confluence integration with automated syncing
Jira
Integrate with Jira to index your project issues, tickets, and related documentation. Perfect for technical support and project management knowledge.
Learn More
Connect Jira and index your project issues
Salesforce Knowledge Base
Connect with Salesforce to index your Knowledge Base articles and customer support documentation. Supports SOQL queries for advanced filtering.
Learn More
Set up Salesforce Knowledge Base integration
Slack Threads
Index your Slack conversations and threads to capture team knowledge and discussions. Configure trusted users and filtering options for targeted content.
Learn More
Index Slack conversations and team knowledge
Zendesk
Connect with Zendesk to index your support tickets and help center articles. Supports both individual imports and automated backfill jobs.
Learn More
Integrate Zendesk tickets and articles
Best Practices
Content Organization
- Mix different source types to create comprehensive knowledge bases
- Use platform integrations for ongoing content that changes frequently
- Upload static documents for reference materials and documentation
Data Quality
- Review indexed content regularly to ensure accuracy
- Use filtering options in platform integrations to focus on relevant content
- Test your Guru with sample questions to verify content quality
Maintenance
- Set up automated syncing for dynamic content sources
- Monitor integration status to ensure continuous data flow
- Update sources as your knowledge base evolves