Data Uploads
The Data Uploads page is where you add content to your AI chatbot's knowledge base. Access it from Admin > Uploads in your admin panel.
Overview
Four methods are available for adding content:
| Method | Best For |
|---|---|
| Website Crawler | Automatically importing your website content |
| Documents | PDFs, Word docs, and text files |
| Products | Product catalogs via CSV or Excel |
| API Sync | Fetching products from external APIs |
Note: API Sync is available only upon request.
Website Crawler
The crawler automatically visits your website and imports content into the knowledge base.
How It Works
The crawler runs a 3-phase pipeline:
| Phase | Description | Progress |
|---|---|---|
| 1. Crawl | Visits pages and saves HTML | 0-50% |
| 2. Analyze | Detects repeating content (headers, footers, menus) | 50-60% |
| 3. Embed | Generates searchable AI embeddings | 60-100% |
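To illustrate how the per-phase ranges in the table combine into a single progress figure, here is a minimal Python sketch. The function name `overall_progress` and the idea of reporting a per-phase completion fraction are assumptions made for this example, not the product's actual implementation.

```python
# Overall progress range (in percent) for each phase, taken from the table above.
PHASE_RANGES = {
    "crawl": (0, 50),
    "analyze": (50, 60),
    "embed": (60, 100),
}


def overall_progress(phase: str, fraction_done: float) -> float:
    """Map a phase and its local completion (0.0 to 1.0) to an overall percent."""
    start, end = PHASE_RANGES[phase]
    fraction_done = min(max(fraction_done, 0.0), 1.0)  # clamp out-of-range input
    return start + (end - start) * fraction_done


print(overall_progress("crawl", 0.5))    # 25.0
print(overall_progress("analyze", 1.0))  # 60.0
print(overall_progress("embed", 0.25))   # 70.0
```

Because the Crawl phase covers half of the bar, a crawl that is halfway through its pages shows as roughly 25% overall.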
Crawler Settings
| Setting | Description |
|---|---|
| Home Page URL | Starting URL for the crawl |
| Max Pages | Maximum pages to crawl (default: 100) |
| Max Depth | How many links deep to follow (default: 3) |
| Fresh Start | Delete all existing content before crawling |
| Use Browser | Use headless browser for JavaScript-rendered sites |
| Custom Cookies | Add cookies for authentication or session management |
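The interaction between Max Pages and Max Depth can be sketched as a breadth-first traversal with two limits. The sketch below is an illustration only: it walks an in-memory link map instead of a real site, the function name `crawl_order` is hypothetical, and treating the Home Page URL as depth 0 is an assumption about how depth is counted.

```python
from collections import deque


def crawl_order(links, start_url, max_pages=100, max_depth=3):
    """Return URLs in breadth-first visiting order, honoring both limits.

    `links` maps each URL to the URLs it links to (a stand-in for real pages).
    """
    visited = [start_url]
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth); the start page is depth 0
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target in seen:
                continue
            if len(visited) >= max_pages:
                return visited  # page budget exhausted
            seen.add(target)
            visited.append(target)
            queue.append((target, depth + 1))
    return visited


site = {"/": ["/a", "/b"], "/a": ["/a/1"], "/a/1": ["/a/1/x"], "/b": []}
print(crawl_order(site, "/", max_depth=2))  # ['/', '/a', '/b', '/a/1']
print(crawl_order(site, "/", max_pages=2))  # ['/', '/a']
```

In this model a low Max Depth stops the crawl from reaching deeply nested pages, while a low Max Pages cuts it off early even if more links are within reach.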
Use Browser Option
Enable Use Browser when crawling:
- Single-page applications (SPAs) built with React, Vue, Angular
- Sites that load content dynamically with JavaScript
- Pages requiring client-side rendering
When enabled, the crawler uses Puppeteer (headless Chrome) instead of simple HTTP requests. This is slower but handles JavaScript-rendered content.
Custom Cookies
Add custom cookies when crawling:
- Sites requiring authentication
- Pages behind login walls
- Sessions with specific preferences
To add cookies:
- Enter the Cookie name (e.g., session_id, auth_token)
- Enter the Cookie value
- Click the + button to add
- Repeat for additional cookies
Cookies persist across all requests during the crawl, including redirects.
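As a rough picture of what "persist across all requests" means, the sketch below joins the configured name/value pairs into one Cookie header and attaches it to every request it builds. The helper names and cookie values are hypothetical, and the crawler's real mechanism is not documented here; this only shows the general HTTP convention.

```python
import urllib.request


def cookie_header(cookies: dict[str, str]) -> str:
    """Join name/value pairs into the value of a single Cookie header."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def build_request(url: str, cookies: dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) a request that carries the configured cookies."""
    return urllib.request.Request(url, headers={"Cookie": cookie_header(cookies)})


# Hypothetical cookie values, for illustration only.
cookies = {"session_id": "abc123", "auth_token": "xyz"}
for url in ["https://example.com/", "https://example.com/account"]:
    request = build_request(url, cookies)
    print(url, "->", request.get_header("Cookie"))
```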
Running a Crawl
- Enter or verify your Home Page URL
- Adjust Max Pages and Max Depth as needed
- Enable Fresh Start only if replacing all content
- Enable Use Browser for JavaScript-heavy sites
- Add Custom Cookies if authentication is required
- Click Start Crawl
- Monitor progress through all 3 phases
- Review results when complete
The crawl runs in the background, so you can close the page and return later to check progress.
Failed URLs
Some URLs may fail during crawling:
- Pages blocked by robots.txt
- Login-required pages
- Dead links (404 errors)
Failed URLs are displayed during and after the crawl for review.
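The failure categories above can be sketched as a small classifier. This is an illustrative sketch, not the crawler's own logic: the function name `failure_reason` is hypothetical, and mapping HTTP 401/403 to "login required" is an assumption. The robots.txt check uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser


def failure_reason(url, status_code, robots):
    """Return a short failure label for a URL, or None if it looks crawlable."""
    if not robots.can_fetch("*", url):
        return "blocked by robots.txt"
    if status_code in (401, 403):  # assumption: treated as login-required
        return "login required"
    if status_code == 404:
        return "dead link"
    return None


# Parse an example robots.txt from memory instead of fetching one.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])

print(failure_reason("https://example.com/private/report", 200, robots))
print(failure_reason("https://example.com/account", 401, robots))
print(failure_reason("https://example.com/old-page", 404, robots))
print(failure_reason("https://example.com/about", 200, robots))
```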
Document Upload
Upload individual documents to expand your knowledge base.
Supported Formats
- PDF - Portable Document Format
- DOC/DOCX - Microsoft Word documents
- HTML/HTM - Web pages
- TXT - Plain text files
- MD - Markdown files
Maximum file size: 10 MB per file
Upload Options
| Option | Description |
|---|---|
| Hide URL | Don't show file URL in chat citations |
| Chunk Size | How to split the document for search |
Chunk Size Options
| Preset | Best For |
|---|---|
| Standard | Balanced retrieval (default) |
| Large | Long-form content, manuals |
| Small | Precise Q&A, FAQs |
Smaller chunks provide more precise search results. Larger chunks provide better context.
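The trade-off between the presets can be seen in a simple word-based splitter. The sketch below is only an illustration: the words-per-chunk numbers are hypothetical (the actual preset sizes are not documented here), and real chunking may split on sentences or tokens instead of words.

```python
# Hypothetical words-per-chunk values; the product's real preset sizes may differ.
CHUNK_WORDS = {"small": 100, "standard": 250, "large": 500}


def split_into_chunks(text: str, preset: str = "standard") -> list[str]:
    """Split text into consecutive chunks of at most CHUNK_WORDS[preset] words."""
    size = CHUNK_WORDS[preset]
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


document = "word " * 600  # a 600-word stand-in document
for preset in ("small", "standard", "large"):
    print(preset, len(split_into_chunks(document, preset)))
```

The same document yields more, narrower chunks with the Small preset (each search hit is more specific) and fewer, wider chunks with Large (each hit carries more surrounding context).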
Optional Metadata
You can add metadata to improve searchability:
- Title - Custom display title
- Description - Document summary
- Keywords - Search terms
Duplicate Handling
When uploading a file with the same name as an existing document:
- Replace Existing - Delete old document, upload new
- Cancel - Keep existing document
Product Upload
Upload your product catalog via CSV or Excel file using the 3-step wizard.
How It Works
The product upload wizard guides you through three steps:
| Step | Description |
|---|---|
| 1. Select | Choose a CSV or Excel file; the wizard validates required columns |
| 2. Map | Configure media URL and numeric field mappings |
| 3. Upload | Products are imported and indexed for search |
Step 1: File Selection
- Drag and drop or browse for a CSV/Excel file
- The wizard validates that required columns exist
- Preview shows first 3 rows of data
- Optionally enable Fresh Start to delete existing products first
- Click Next: Configure Mapping
If required columns are missing, an error is displayed listing them.
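If you want to check a CSV file before uploading it, the validation can be approximated locally. The sketch below is an assumption about how such a check might work, using the required columns listed under Required Columns on this page; the function name `missing_required_columns` is hypothetical, and exact (case-sensitive) header matching is assumed.

```python
import csv
import io

# Required columns as listed in the Required Columns table on this page.
REQUIRED_COLUMNS = ["title", "url", "price", "category"]


def missing_required_columns(csv_text: str) -> list[str]:
    """Return the required columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file has no header at all
    present = {name.strip() for name in header}
    return [column for column in REQUIRED_COLUMNS if column not in present]


sample = "title,url,price\nOak Table,https://example.com/oak-table,199\n"
print(missing_required_columns(sample))  # ['category']
```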
Step 2: Field Mapping
Configure how optional fields map to your data:
| Field | Description |
|---|---|
| Media URL | Select column containing image/video/audio URLs |
| Numeric Field 1-3 | Select columns with numeric values for range queries |
| Also label as | Abbreviations and synonyms for the column name |
The dropdown only shows columns containing numeric values. If your CSV headers use the pattern column_name (numeric1), mappings are auto-detected.
What is "Also label as"?
When you map a column to a numeric field, the column name becomes the primary label automatically. The system also auto-generates common abbreviations for well-known terms such as "square feet", "bedrooms", and "mileage".
The Also label as field lets you add further abbreviations that users might type, which get merged with the auto-generated ones:
| Column Mapping | Auto-Generated | Also Label As | All Queries Supported |
|---|---|---|---|
| square_feet → Numeric 1 | sqft, sq ft | ft2 | "over 1000 square feet", "over 1000 sqft", "over 1000 ft2" |
| bedrooms → Numeric 2 | bedroom, bed, br | bdrm | "3 bedrooms", "3 bed", "3 bdrm" |
| mileage → Numeric 1 | miles, mi | odometer | "under 50000 mileage", "under 50000 miles", "under 50000 odometer" |
How it works:
- Column name (e.g., square_feet) → primary label (square feet)
- System auto-generates common abbreviations (e.g., sqft, sq ft)
- Your "Also label as" entries are merged in (e.g., ft2)
- All patterns are deduplicated and saved
You only need to add abbreviations that aren't auto-generated. Leave "Also label as" empty if the column name and its common abbreviations are sufficient.
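The merge-and-deduplicate behavior described above can be sketched in a few lines. This is an illustration under stated assumptions: the abbreviation table only contains the examples shown on this page (the full auto-generated list is not documented here), the function name `build_labels` is hypothetical, and comma-separated input for "Also label as" is assumed.

```python
# Abbreviations copied from the examples table above; the real list is larger.
AUTO_ABBREVIATIONS = {
    "square feet": ["sqft", "sq ft"],
    "bedrooms": ["bedroom", "bed", "br"],
    "mileage": ["miles", "mi"],
}


def build_labels(column_name: str, also_label_as: str = "") -> list[str]:
    """Combine primary label, auto-generated abbreviations, and user entries."""
    primary = column_name.replace("_", " ").strip().lower()
    extras = [part.strip().lower() for part in also_label_as.split(",") if part.strip()]
    labels = [primary, *AUTO_ABBREVIATIONS.get(primary, []), *extras]
    return list(dict.fromkeys(labels))  # drop duplicates, keep first-seen order


print(build_labels("square_feet", "ft2, sqft"))
# ['square feet', 'sqft', 'sq ft', 'ft2']
```

Here the user-supplied "sqft" is dropped as a duplicate of the auto-generated entry, which is why only new abbreviations need to be entered.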
Step 3: Upload Results
After clicking Upload Products, the wizard shows:
- Total rows processed
- Valid records inserted
- Any validation errors encountered
- Batch name for the upload
Required Columns
| Column | Description |
|---|---|
| title | Product name |
| url | Product page URL |
| price | Numeric price value |
| category | Product category |
Optional Columns
| Column | Description |
|---|---|
| media_url | Image, video, or audio URL |
| Any other columns | Automatically become searchable keywords |
Numeric Fields with Auto-Detection
Pre-configure numeric field mappings directly in your CSV headers:
column_name (numericN)
Examples:
- square_feet (numeric1) - Auto-maps to Numeric Field 1 with label "square_feet"
- bedrooms (numeric2) - Auto-maps to Numeric Field 2 with label "bedrooms"
- bathrooms (numeric3) - Auto-maps to Numeric Field 3 with label "bathrooms"
When detected, the wizard pre-selects the mapping and fills in the "Also label as" field automatically.
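The header convention is simple enough to recognize with a regular expression. The sketch below shows one way detection could work; the pattern and the function name `detect_numeric_mappings` are assumptions for illustration, not the wizard's actual parser.

```python
import re

# Matches "column_name (numericN)" where N is 1, 2, or 3.
HEADER_PATTERN = re.compile(r"^\s*(?P<name>.+?)\s*\(numeric(?P<slot>[1-3])\)\s*$")


def detect_numeric_mappings(headers: list[str]) -> dict[int, str]:
    """Map numeric field numbers to the column names found in labeled headers."""
    mappings = {}
    for header in headers:
        match = HEADER_PATTERN.match(header)
        if match:
            mappings[int(match.group("slot"))] = match.group("name")
    return mappings


headers = ["title", "url", "price", "category",
           "square_feet (numeric1)", "bedrooms (numeric2)"]
print(detect_numeric_mappings(headers))
# {1: 'square_feet', 2: 'bedrooms'}
```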
CSV Templates
Download example CSV templates from the dropdown:
| Template | Numeric Fields |
|---|---|
| Housing Rental | square_feet, bedrooms, bathrooms |
| Car Dealer | mileage, year, horsepower |
| Restaurant Menu | calories, prep_time, spice_level |
| Jewelry Store | carat, price_per_carat |
Select a template and click Download to get started.
Fresh Start Option
Enable Fresh Start to delete all existing uploaded products before uploading new ones. This:
- Removes all products from previous uploads
- Preserves crawled website content
- Useful for complete catalog replacements
Batch Management
Products are grouped by batch name (derived from filename):
- Re-uploading with the same filename replaces existing products
- Use different filenames to maintain separate product sets
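A small model makes the batch rules easier to reason about. In the sketch below, the in-memory `store` dictionary, the `upload` function, and deriving the batch name by dropping the file extension are all assumptions for illustration; the page only states that the batch name is derived from the filename.

```python
from pathlib import Path


def batch_name(filename: str) -> str:
    """Derive a batch name from a filename (assumption: name without extension)."""
    return Path(filename).stem


def upload(store: dict, filename: str, products: list, fresh_start: bool = False) -> dict:
    """Store products under their batch, replacing any batch with the same name."""
    if fresh_start:
        store.clear()  # Fresh Start: remove products from all previous uploads
    store[batch_name(filename)] = list(products)
    return store


store = {}
upload(store, "cars.csv", ["Sedan", "Coupe"])
upload(store, "homes.csv", ["Loft"])
upload(store, "cars.csv", ["Hatchback"])  # same filename: replaces the cars batch
print(store)  # {'cars': ['Hatchback'], 'homes': ['Loft']}
```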
API Sync
Fetch product data directly from external APIs.
Configuration Fields
| Field | Description |
|---|---|
| API URL | Endpoint to fetch from |
| HTTP Method | GET or POST |
| Headers | JSON headers (e.g., authentication) |
| Request Body | JSON body for POST requests |
| AI Extraction Hint | Guide what data to extract |
| Base URL | For resolving relative URLs |
| Max Products | Limit products to extract |
| Batch Name | Name for this sync |
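To show how these fields relate to an actual HTTP request, here is a minimal sketch that turns a configuration into a request object without sending it. Every value below (endpoint, token, body, key names) is hypothetical, and the functions are illustrations rather than the product's sync code. `resolve_url` shows the typical role of a Base URL: turning relative product links into absolute ones.

```python
import json
import urllib.request
from urllib.parse import urljoin

# Hypothetical configuration; key names mirror the fields in the table above.
config = {
    "api_url": "https://api.example.com/products",
    "http_method": "POST",
    "headers": {"Authorization": "Bearer YOUR_TOKEN"},
    "request_body": {"in_stock": True},
    "base_url": "https://shop.example.com/",
}


def build_request(config: dict) -> urllib.request.Request:
    """Build (but do not send) the HTTP request described by the configuration."""
    headers = dict(config.get("headers", {}))
    body = None
    if config["http_method"] == "POST" and config.get("request_body") is not None:
        body = json.dumps(config["request_body"]).encode("utf-8")
        headers.setdefault("Content-Type", "application/json")
    return urllib.request.Request(
        config["api_url"], data=body, headers=headers, method=config["http_method"]
    )


def resolve_url(config: dict, product_url: str) -> str:
    """Resolve a possibly relative product URL against the Base URL."""
    return urljoin(config["base_url"], product_url)


request = build_request(config)
print(request.get_method(), request.full_url)
print(resolve_url(config, "/products/42"))  # https://shop.example.com/products/42
```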
AI Extraction
The AI analyzes the API response and extracts:
- Product titles and URLs
- Prices and categories
- Keywords and attributes
Use the AI Extraction Hint to guide extraction:
- "Focus on available products only"
- "Extract size and color options"
- "Ignore discontinued items"
Re-syncing
Using the same batch name replaces products from the previous sync, allowing scheduled updates without duplicates.
Document List
At the top of the page, the Document List shows all uploaded documents:
- File name and title
- Upload date
- Processing status
- Delete action
Click to expand and manage existing documents.
Best Practices
- Start with the Crawler - Crawl your website first for broad content coverage
- Add key documents - Upload important PDFs and documents that aren't on your website
- Import products - Add your product catalog for e-commerce capabilities
- Use Fresh Start sparingly - Only when completely replacing content
- Name batches meaningfully - Makes managing and re-syncing easier
- Test after uploads - Use Data Management to verify content appears correctly
- Monitor failed URLs - Review and fix issues with pages that couldn't be crawled
- Use labeled CSV headers - Include (numericN) in column names (e.g., square_feet (numeric1)) for auto-detection in the mapping step
- Download templates first - Use the example CSV templates to see the expected format before creating your own
Related Pages
- Data Management - Browse and test your knowledge base
- Media Library - Manage images and media files
- App Config - Configure crawler and RAG settings