Data Uploads

The Data Uploads page is where you add content to your AI chatbot's knowledge base. Access it from Admin > Uploads in your admin panel.

Overview

Four methods are available for adding content:

| Method | Best For |
| --- | --- |
| Website Crawler | Automatically importing your website content |
| Documents | PDFs, Word docs, and text files |
| Products | Product catalogs via CSV or Excel |
| API Sync | Fetching products from external APIs |

API Sync is available only upon request.

Website Crawler

The crawler automatically visits your website and imports content into the knowledge base.

How It Works

The crawler runs a 3-phase pipeline:

| Phase | Description | Progress |
| --- | --- | --- |
| 1. Crawl | Visits pages and saves HTML | 0-50% |
| 2. Analyze | Detects repeating content (headers, footers, menus) | 50-60% |
| 3. Embed | Generates searchable AI embeddings | 60-100% |
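As a rough illustration of how the phase percentages in the table combine into one progress value, here is a minimal sketch. The phase boundaries come from the table; the function name and the idea of a per-phase completion fraction are assumptions made for this example, not the product's actual implementation.

```python
# Phase boundaries taken from the table above; everything else is illustrative.
PHASE_RANGES = {
    "crawl": (0, 50),
    "analyze": (50, 60),
    "embed": (60, 100),
}


def overall_progress(phase: str, fraction_done: float) -> float:
    """Map completion within one phase (0.0 to 1.0) to the overall percentage."""
    start, end = PHASE_RANGES[phase]
    fraction_done = min(max(fraction_done, 0.0), 1.0)  # clamp to [0, 1]
    return start + (end - start) * fraction_done


print(overall_progress("crawl", 0.5))   # halfway through the crawl phase
print(overall_progress("embed", 0.25))  # a quarter of the way through embedding
```

Because the Crawl phase covers half of the bar, a progress value that stays below 50% for a long time usually just means many pages are still being fetched.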

Crawler Settings

| Setting | Description |
| --- | --- |
| Home Page URL | Starting URL for the crawl |
| Max Pages | Maximum pages to crawl (default: 100) |
| Max Depth | How many links deep to follow (default: 3) |
| Fresh Start | Delete all existing content before crawling |
| Use Browser | Use headless browser for JavaScript-rendered sites |
| Custom Cookies | Add cookies for authentication or session management |
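To show how Max Pages and Max Depth interact, the sketch below walks a small in-memory link graph breadth-first. It is only a model: the `links` dictionary stands in for real page fetching, the home page is assumed to be depth 0, and the actual crawler's traversal order is not documented here.

```python
from collections import deque


def crawl_order(links, home_url, max_pages=100, max_depth=3):
    """Return URLs in the order a breadth-first crawl would visit them.

    `links` maps each URL to the URLs it links to (a stand-in for fetching).
    The home page counts as depth 0, which is an assumption of this sketch.
    """
    visited = [home_url]
    seen = {home_url}
    queue = deque([(home_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target in seen:
                continue
            if len(visited) >= max_pages:
                return visited  # page budget exhausted
            seen.add(target)
            visited.append(target)
            queue.append((target, depth + 1))
    return visited


site = {"/": ["/a", "/b"], "/a": ["/a/1"], "/a/1": ["/a/1/x"], "/b": ["/"]}
print(crawl_order(site, "/", max_depth=1))
print(crawl_order(site, "/", max_pages=2))
```

Whichever limit is reached first ends the crawl, so a low Max Depth can leave the page budget unused on deeply nested sites.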

Use Browser Option

Enable Use Browser when crawling:

  • Single-page applications (SPAs) built with React, Vue, Angular
  • Sites that load content dynamically with JavaScript
  • Pages requiring client-side rendering

When enabled, the crawler uses Puppeteer (headless Chrome) instead of simple HTTP requests. This is slower but handles JavaScript-rendered content.

Custom Cookies

Add custom cookies when crawling:

  • Sites requiring authentication
  • Pages behind login walls
  • Sessions with specific preferences

To add cookies:

  1. Enter the Cookie name (e.g., session_id, auth_token)
  2. Enter the Cookie value
  3. Click the + button to add
  4. Repeat for additional cookies

Cookies persist across all requests during the crawl, including redirects.
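The sketch below illustrates what "persisting across all requests" means in practice: the same name/value pairs are turned into one Cookie header and attached to every request. The `CrawlSession` class is hypothetical and exists only for this example; the cookie names are the ones suggested above with made-up values.

```python
class CrawlSession:
    """Hypothetical helper that attaches the same cookies to every request."""

    def __init__(self, cookies):
        self.cookies = dict(cookies)

    def request_headers(self):
        # A Cookie header joins name=value pairs with "; ".
        header = "; ".join(f"{name}={value}" for name, value in self.cookies.items())
        return {"Cookie": header} if header else {}


session = CrawlSession({"session_id": "abc123", "auth_token": "xyz"})
for url in ["https://example.com/", "https://example.com/account"]:
    # Every URL, including a redirect target, gets the same header.
    print(url, session.request_headers())
```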

Running a Crawl

  1. Enter or verify your Home Page URL
  2. Adjust Max Pages and Max Depth as needed
  3. Enable Fresh Start only if replacing all content
  4. Enable Use Browser for JavaScript-heavy sites
  5. Add Custom Cookies if authentication is required
  6. Click Start Crawl
  7. Monitor progress through all 3 phases
  8. Review results when complete

The crawl runs in the background, so you can close the page and return later to check progress.

Failed URLs

Some URLs may fail during crawling:

  • Pages blocked by robots.txt
  • Login-required pages
  • Dead links (404 errors)

Failed URLs are displayed during and after the crawl for review.
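If you want to check in advance why a URL might fail, the three causes listed above can be approximated with the Python standard library. This is a sketch under assumptions: the status-code mapping (401/403 for login-required pages) and the `failure_reason` helper are illustrative, and the crawler's own classification may differ.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text disallows fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


def failure_reason(url: str, robots_txt: str, status_code: int) -> Optional[str]:
    """Classify a failed URL into the categories listed above (illustrative)."""
    if blocked_by_robots(robots_txt, url):
        return "blocked by robots.txt"
    if status_code in (401, 403):
        return "login required"
    if status_code == 404:
        return "dead link (404)"
    return None


ROBOTS = "User-agent: *\nDisallow: /private/\n"
print(failure_reason("https://example.com/private/report", ROBOTS, 200))
print(failure_reason("https://example.com/old-page", ROBOTS, 404))
```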

Document Upload

Upload individual documents to expand your knowledge base.

Supported Formats

  • PDF - Portable Document Format
  • DOC/DOCX - Microsoft Word documents
  • HTML/HTM - Web pages
  • TXT - Plain text files
  • MD - Markdown files

Maximum file size: 10 MB per file

Upload Options

| Option | Description |
| --- | --- |
| Hide URL | Don't show file URL in chat citations |
| Chunk Size | How to split the document for search |

Chunk Size Options

| Preset | Best For |
| --- | --- |
| Standard | Balanced retrieval (default) |
| Large | Long-form content, manuals |
| Small | Precise Q&A, FAQs |

Smaller chunks provide more precise search results. Larger chunks provide better context.
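The trade-off can be seen with a simple word-based splitter. The word counts per preset below are invented for illustration; the actual sizes behind Standard, Large, and Small are not documented here, and the real splitter may work on tokens or sentences rather than words.

```python
# Words per chunk for each preset: illustrative numbers, not the product's values.
PRESET_WORDS = {"small": 50, "standard": 150, "large": 400}


def chunk_text(text: str, preset: str = "standard") -> list:
    """Split text into consecutive chunks of at most PRESET_WORDS[preset] words."""
    words = text.split()
    size = PRESET_WORDS[preset]
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


document = " ".join(f"word{i}" for i in range(400))
for preset in ("small", "standard", "large"):
    print(preset, len(chunk_text(document, preset)), "chunks")
```

The same 400-word document yields many short chunks with the small preset and a single chunk with the large one, which is why small suits FAQ-style lookups and large suits manuals.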

Optional Metadata

You can add metadata to improve searchability:

  • Title - Custom display title
  • Description - Document summary
  • Keywords - Search terms

Duplicate Handling

When uploading a file with the same name as an existing document:

  • Replace Existing - Delete old document, upload new
  • Cancel - Keep existing document

Product Upload

Upload your product catalog via CSV or Excel file using the 3-step wizard.

How It Works

The product upload wizard guides you through three steps:

| Step | Description |
| --- | --- |
| 1. Select | Choose a CSV or Excel file; the wizard validates required columns |
| 2. Map | Configure media URL and numeric field mappings |
| 3. Upload | Products are imported and indexed for search |

Step 1: File Selection

  1. Drag and drop or browse for a CSV/Excel file
  2. The wizard validates that required columns exist
  3. Preview shows first 3 rows of data
  4. Optionally enable Fresh Start to delete existing products first
  5. Click Next: Configure Mapping

If required columns are missing, an error is displayed listing them.

Step 2: Field Mapping

Configure how optional fields map to your data:

| Field | Description |
| --- | --- |
| Media URL | Select column containing image/video/audio URLs |
| Numeric Field 1-3 | Select columns with numeric values for range queries |
| Also label as | Abbreviations and synonyms for the column name |

The dropdown shows only columns that contain numeric values. If your CSV headers use the pattern column_name (numericN), where N is 1, 2, or 3, the mappings are auto-detected.
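To predict which columns will appear in the numeric dropdown, you can run a check like the one below on your CSV before uploading. It is a sketch: it assumes a column counts as numeric when every non-empty value parses as a number, which may not match the wizard's exact rule.

```python
import csv
import io


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def numeric_columns(csv_text: str) -> list:
    """Return the columns whose non-empty values all parse as numbers."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    result = []
    for column in columns:
        values = [row[column].strip() for row in rows if row[column] and row[column].strip()]
        if values and all(_is_number(v) for v in values):
            result.append(column)
    return result


SAMPLE = "title,price,bedrooms,category\nLoft,1200,2,rental\nStudio,850.50,1,rental\n"
print(numeric_columns(SAMPLE))
```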

What is "Also label as"?

When you map a column to a numeric field, the column name becomes the primary label automatically. The system also auto-generates common abbreviations for well-known terms such as "square feet", "bedrooms", and "mileage".

The Also label as field lets you add further abbreviations that users might type, which are merged with the auto-generated ones:

| Column Mapping | Auto-Generated | Also Label As | All Queries Supported |
| --- | --- | --- | --- |
| square_feet → Numeric 1 | sqft, sq ft | ft2 | "over 1000 square feet", "over 1000 sqft", "over 1000 ft2" |
| bedrooms → Numeric 2 | bedroom, bed, br | bdrm | "3 bedrooms", "3 bed", "3 bdrm" |
| mileage → Numeric 1 | miles, mi | odometer | "under 50000 mileage", "under 50000 miles", "under 50000 odometer" |

How it works:

  1. Column name (e.g., square_feet) → primary label (square feet)
  2. System auto-generates common abbreviations (e.g., sqft, sq ft)
  3. Your "Also label as" entries are merged in (e.g., ft2)
  4. All patterns are deduplicated and saved

You only need to add abbreviations that aren't auto-generated. Leave "Also label as" empty if the column name and its common abbreviations are sufficient.
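The four merge steps can be sketched as follows. The abbreviation table contains only the examples shown above, and the comma-separated input format for "Also label as" is an assumption of this example rather than documented behavior.

```python
# Auto-generated abbreviations, limited to the examples from the table above.
AUTO_ABBREVIATIONS = {
    "square feet": ["sqft", "sq ft"],
    "bedrooms": ["bedroom", "bed", "br"],
    "mileage": ["miles", "mi"],
}


def build_labels(column_name: str, also_label_as: str = "") -> list:
    """Merge primary label, auto-generated abbreviations, and user entries."""
    # Step 1: the column name becomes the primary label.
    primary = column_name.replace("_", " ").strip().lower()
    # Step 3 input: user entries, assumed here to be comma-separated.
    extra = [e.strip().lower() for e in also_label_as.split(",") if e.strip()]
    labels = []
    # Steps 2-4: append auto-generated and user labels, skipping duplicates.
    for label in [primary, *AUTO_ABBREVIATIONS.get(primary, []), *extra]:
        if label not in labels:
            labels.append(label)
    return labels


print(build_labels("square_feet", "ft2, sqft"))
print(build_labels("bedrooms"))
```

Entering "sqft" again has no effect because it is already auto-generated, which is why only the missing abbreviations need to be supplied.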

Step 3: Upload Results

After clicking Upload Products, the wizard shows:

  • Total rows processed
  • Valid records inserted
  • Any validation errors encountered
  • Batch name for the upload

Required Columns

| Column | Description |
| --- | --- |
| title | Product name |
| url | Product page URL |
| price | Numeric price value |
| category | Product category |
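A quick pre-upload check for the required columns might look like the sketch below. The column names come from the table above; matching headers case-insensitively and ignoring surrounding spaces is an assumption, so keep your headers lowercase to be safe.

```python
import csv
import io

REQUIRED_COLUMNS = ["title", "url", "price", "category"]


def missing_required_columns(csv_text: str) -> list:
    """Return the required columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    present = {name.strip().lower() for name in header}
    return [column for column in REQUIRED_COLUMNS if column not in present]


csv_text = "title,url,price\nLoft,https://example.com/loft,1200\n"
missing = missing_required_columns(csv_text)
if missing:
    print("Missing required columns:", ", ".join(missing))
```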

Optional Columns

| Column | Description |
| --- | --- |
| media_url | Image, video, or audio URL |
| Any other columns | Automatically become searchable keywords |

Numeric Fields with Auto-Detection

Pre-configure numeric field mappings directly in your CSV headers:

column_name (numericN)

Examples:

  • square_feet (numeric1) - Auto-maps to Numeric Field 1 with label "square_feet"
  • bedrooms (numeric2) - Auto-maps to Numeric Field 2 with label "bedrooms"
  • bathrooms (numeric3) - Auto-maps to Numeric Field 3 with label "bathrooms"

When detected, the wizard pre-selects the mapping and fills in the "Also label as" field automatically.
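The header pattern is simple enough to check with a regular expression. The sketch below is one possible reading of the column_name (numericN) convention, limited to slots 1 to 3; the wizard's actual matching rules (for example, case sensitivity or spacing) are assumptions here.

```python
import re

# Matches "name (numericN)" with N in 1-3; spacing and case handling are assumed.
HEADER_PATTERN = re.compile(
    r"^\s*(?P<name>.+?)\s*\(numeric(?P<slot>[1-3])\)\s*$", re.IGNORECASE
)


def parse_numeric_header(header: str):
    """Return (column_name, slot) for a labeled header, or None otherwise."""
    match = HEADER_PATTERN.match(header)
    if not match:
        return None
    return match.group("name"), int(match.group("slot"))


for header in ["square_feet (numeric1)", "bedrooms (numeric2)", "price"]:
    print(header, "->", parse_numeric_header(header))
```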

CSV Templates

Download example CSV templates from the dropdown:

| Template | Numeric Fields |
| --- | --- |
| Housing Rental | square_feet, bedrooms, bathrooms |
| Car Dealer | mileage, year, horsepower |
| Restaurant Menu | calories, prep_time, spice_level |
| Jewelry Store | carat, price_per_carat |

Select a template and click Download to get started.

Fresh Start Option

Enable Fresh Start to delete all existing uploaded products before uploading new ones. This option:

  • Removes all products from previous uploads
  • Preserves crawled website content
  • Is useful for complete catalog replacements

Batch Management

Products are grouped by batch name (derived from filename):

  • Re-uploading with the same filename replaces existing products
  • Use different filenames to maintain separate product sets
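The replace-by-batch behavior, together with Fresh Start, can be modeled with a small in-memory store. This is a sketch only: `ProductStore` is hypothetical, and deriving the batch name by dropping the file extension is an assumption, since the exact derivation rule is not specified.

```python
from pathlib import Path


def batch_name_from(filename: str) -> str:
    """Derive a batch name from a filename (assumed: the name without extension)."""
    return Path(filename).stem


class ProductStore:
    """Hypothetical in-memory model of batches of uploaded products."""

    def __init__(self):
        self.batches = {}

    def upload(self, filename, products, fresh_start=False):
        if fresh_start:
            self.batches.clear()  # Fresh Start removes all earlier uploads
        batch = batch_name_from(filename)
        self.batches[batch] = list(products)  # same batch name replaces old rows
        return batch

    def all_products(self):
        return [p for batch in self.batches.values() for p in batch]


store = ProductStore()
store.upload("rentals.csv", ["Loft", "Studio"])
store.upload("cars.csv", ["Sedan"])
store.upload("rentals.csv", ["Penthouse"])  # replaces Loft and Studio only
print(sorted(store.all_products()))
```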

API Sync

Fetch product data directly from external APIs.

Configuration Fields

| Field | Description |
| --- | --- |
| API URL | Endpoint to fetch from |
| HTTP Method | GET or POST |
| Headers | JSON headers (e.g., authentication) |
| Request Body | JSON body for POST requests |
| AI Extraction Hint | Guide what data to extract |
| Base URL | For resolving relative URLs |
| Max Products | Limit products to extract |
| Batch Name | Name for this sync |
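The following sketch shows how these fields might fit together when expressed as a configuration and turned into an HTTP request. The key names, URLs, and token are placeholders invented for this example; they are not the product's storage format or a real endpoint. No request is actually sent.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request

# Hypothetical configuration mirroring the fields in the table above.
sync_config = {
    "api_url": "https://api.example.com/products",
    "http_method": "POST",
    "headers": {"Authorization": "Bearer YOUR_TOKEN", "Content-Type": "application/json"},
    "request_body": {"in_stock": True},
    "ai_extraction_hint": "Focus on available products only",
    "base_url": "https://shop.example.com",
    "max_products": 500,
    "batch_name": "nightly-catalog",
}


def build_request(config: dict) -> Request:
    """Build (but do not send) the HTTP request described by the config."""
    body = None
    if config["http_method"] == "POST":
        body = json.dumps(config.get("request_body") or {}).encode("utf-8")
    return Request(
        config["api_url"],
        data=body,
        headers=config.get("headers", {}),
        method=config["http_method"],
    )


def resolve_product_url(config: dict, url: str) -> str:
    """Resolve a relative product URL against the configured Base URL."""
    return urljoin(config["base_url"], url)


request = build_request(sync_config)
print(request.get_method(), request.full_url)
print(resolve_product_url(sync_config, "/items/42"))
```

The Base URL matters when the API returns paths such as /items/42 instead of full links; absolute URLs in the response pass through unchanged.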

AI Extraction

The AI analyzes the API response and extracts:

  • Product titles and URLs
  • Prices and categories
  • Keywords and attributes

Use the AI Extraction Hint to guide extraction:

  • "Focus on available products only"
  • "Extract size and color options"
  • "Ignore discontinued items"

Re-syncing

Using the same batch name replaces products from the previous sync, allowing scheduled updates without duplicates.

Document List

At the top of the page, the Document List shows all uploaded documents:

  • File name and title
  • Upload date
  • Processing status
  • Delete action

Click the list to expand it and manage existing documents.

Best Practices

  1. Start with the Crawler - Crawl your website first for broad content coverage

  2. Add key documents - Upload important PDFs and documents that aren't on your website

  3. Import products - Add your product catalog for e-commerce capabilities

  4. Use Fresh Start sparingly - Only when completely replacing content

  5. Name batches meaningfully - Makes managing and re-syncing easier

  6. Test after uploads - Use Data Management to verify content appears correctly

  7. Monitor failed URLs - Review and fix issues with pages that couldn't be crawled

  8. Use labeled CSV headers - Include (numericN) in column names (e.g., square_feet (numeric1)) for auto-detection in the mapping step

  9. Download templates first - Use the example CSV templates to see the expected format before creating your own
