
What Can You Upload to Train Your AI Assistant? (Complete File Guide)


A comprehensive guide to file formats, best practices, and optimization tips for training your AI assistant's knowledge base.

Assisters Team | January 3, 2026 | 10 min read


Your AI assistant is only as good as its knowledge base. This guide covers which file formats you can upload, what makes good training content, and how to organize and optimize that content for retrieval.

Supported File Formats

Documents

**PDF Files (.pdf)**

  • Standard text PDFs: Fully supported
  • Scanned PDFs: Supported via OCR (optical character recognition)
  • Image-heavy PDFs: Text is extracted, and embedded images are scanned for any text they contain
  • Max size: 10MB per file

**Word Documents (.doc, .docx)**

  • Full formatting preserved for context
  • Headers and structure maintained
  • Tables converted to readable format
  • Max size: 10MB per file

**Text Files (.txt)**

  • Plain text, UTF-8 encoding recommended
  • Great for FAQs and simple content
  • No formatting overhead
  • Max size: 10MB per file

**Markdown Files (.md)**

  • Structure preserved (headers, lists)
  • Code blocks included
  • Ideal for technical documentation
  • Max size: 10MB per file

Spreadsheets

**CSV Files (.csv)**

  • Rows converted to readable entries
  • Great for product catalogs, FAQs, data tables
  • First row treated as headers
  • Max size: 10MB per file
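
To picture what "rows converted to readable entries" might look like, here is a minimal Python sketch. It assumes each data row becomes one "Header: value" entry; the platform's actual conversion is not documented in this guide, and the function name `csv_to_entries` and the sample rows are made up for illustration.

```python
import csv
import io


def csv_to_entries(csv_text: str) -> list[str]:
    """Turn each data row into one 'Header: value' entry (assumed format)."""
    # DictReader treats the first row as the headers.
    reader = csv.DictReader(io.StringIO(csv_text))
    entries = []
    for row in reader:
        # Skip empty cells and any cells beyond the header row.
        parts = [f"{header}: {value}" for header, value in row.items() if header and value]
        entries.append("; ".join(parts))
    return entries


# Made-up sample data for illustration.
sample = "Plan,Price,Seats\nStarter,$29/month,3\nProfessional,$99/month,10\n"
for entry in csv_to_entries(sample):
    print(entry)
```

The takeaway is that descriptive column headers matter: a header like "Price" ends up next to every value, while a header like "col2" gives the AI nothing to work with.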

**Excel Files (.xlsx)**

  • First sheet processed by default
  • Tables and data extracted
  • Formulas converted to values
  • Max size: 10MB per file

Images

**Image Files (.png, .jpg, .jpeg)**

  • OCR extracts visible text
  • Great for scanned documents, screenshots, infographics
  • Handwritten text partially supported
  • Max size: 5MB per file
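
If you are preparing many files, a quick pre-upload check can catch problems early. The sketch below is one way to do it in Python, using only the extensions and size limits listed above. It assumes 1MB means 1024 × 1024 bytes, which the guide does not specify, and the function names are hypothetical helpers, not part of any Assisters tool.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: the guide does not say whether 1MB is 10^6 or 2^20 bytes

# Formats and limits as listed in this guide.
DOCUMENT_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt", ".md", ".csv", ".xlsx"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}
DOCUMENT_LIMIT = 10 * MB
IMAGE_LIMIT = 5 * MB


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_EXTENSIONS:
        limit = IMAGE_LIMIT
    elif ext in DOCUMENT_EXTENSIONS:
        limit = DOCUMENT_LIMIT
    else:
        return [f"unsupported format: {ext or '(no extension)'}"]
    if size_bytes > limit:
        return [f"file is {size_bytes / MB:.1f}MB, over the {limit // MB}MB limit"]
    return []


def check_folder(folder: str) -> dict[str, list[str]]:
    """Check every file in a folder and report only the ones with problems."""
    report = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            problems = check_upload(path.name, path.stat().st_size)
            if problems:
                report[path.name] = problems
    return report
```

Calling `check_folder("my-content")` on a local folder (the folder name is just an example) returns a dictionary of file names and the reasons they would likely be rejected.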

What Makes Good Training Content?

High-Quality Content Characteristics

**Specific and Detailed**

  • Bad: "Our product helps with productivity"
  • Good: "Our task management feature saves users an average of 2.5 hours per week by automating recurring task creation"

**Well-Structured**

  • Use clear headings
  • Organize by topic
  • Maintain consistent formatting

**Comprehensive**

  • Cover common questions
  • Include edge cases
  • Provide context and nuance

**Current**

  • Remove outdated information
  • Update with recent changes
  • Date-stamp time-sensitive content

Content Types That Work Well

**FAQs and Q&A Pairs**

  • Explicit question-answer format
  • Covers common user queries
  • Easy for AI to match and retrieve

**How-To Guides**

  • Step-by-step instructions
  • Clear procedures
  • Troubleshooting steps

**Reference Documentation**

  • Product specifications
  • Policy documents
  • Technical details

**Case Studies**

  • Real examples
  • Outcomes and lessons
  • Context and nuance

**Frameworks and Methodologies**

  • Your unique approaches
  • Decision-making processes
  • Best practices

Content to Avoid

What Not to Upload

**Duplicate Content**

  • Multiple versions of the same document confuse the AI
  • Keep one authoritative version
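
Exact duplicates are easy to find before you upload. The following Python sketch groups files whose text is identical after lowercasing and collapsing whitespace. It is only an illustration: it will not catch near-duplicates such as two drafts with small wording changes, and the function names are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(documents: dict[str, str]) -> list[list[str]]:
    """Group file names whose normalized text is identical."""
    groups = defaultdict(list)
    for name, text in documents.items():
        groups[fingerprint(text)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Pass it a dictionary that maps file names to their extracted text; any group it returns is a candidate for keeping one copy and deleting the rest.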

**Contradictory Information**

  • Old and new policies together cause confusion
  • Archive outdated content separately

**Sensitive Data**

  • Personal information (unless necessary and compliant)
  • Credentials or passwords
  • Internal-only confidential information
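
A rough automated scan can flag obvious problems before a file reaches your knowledge base. The Python sketch below looks for email addresses and password-style assignments. The two patterns are illustrative assumptions only: they will miss many kinds of sensitive data and will sometimes flag harmless text, so treat the output as a prompt for manual review.

```python
import re

# Illustrative patterns only; not an exhaustive list of sensitive data.
PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "password or secret": re.compile(
        r"(?i)\b(password|passwd|secret|api[_-]?key)\b\s*[:=]\s*\S+"
    ),
}


def scan_for_sensitive(text: str) -> list[tuple[int, str]]:
    """Return (line number, label) for each line that matches a pattern."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, label))
    return findings
```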

**Raw Data Without Context**

  • Numbers without explanation
  • Lists without descriptions
  • Data that requires interpretation

Organization Best Practices

Naming Conventions

Use descriptive file names:

  • Good: "return-policy-2026.pdf"
  • Bad: "doc1.pdf"

Content Categories

Organize your content into logical groups:

  • Product information
  • Support and troubleshooting
  • Policies and procedures
  • FAQs by topic
  • Case studies and examples

Versioning

When updating content:

  • Replace old files with updated versions
  • Don't keep multiple versions active
  • Note significant changes in your content

Optimizing for Retrieval

Your content is split into chunks and embedded for retrieval. The practices below make each chunk easier to find and easier to understand on its own:

Use Clear Headings

The AI uses headings to understand document structure:

```
# Product Overview

## Features

### Feature 1: Task Management

...
```
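
To see why headings help, consider a chunker that splits a document at each heading and tags every chunk with the headings above it. The Python sketch below does that for Markdown-style `#` headings. It is an assumption about how heading-aware chunking could work, not a description of the platform's actual chunker, and `split_by_headings` is a hypothetical name.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown: str) -> list[dict]:
    """Split text into chunks, each tagged with its trail of parent headings."""
    chunks = []
    trail = []  # current heading path, e.g. ["Product Overview", "Features"]
    body = []   # lines collected since the last heading

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append({"path": " > ".join(trail), "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del trail[level - 1:]  # drop headings at this level or deeper
            trail.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks
```

With a heading trail attached, a chunk that says only "Automates recurring tasks." still carries the context "Product Overview > Features > Feature 1: Task Management".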

Include Context

Don't assume knowledge:

  • Bad: "It costs $99"
  • Good: "The Professional plan costs $99/month and includes..."

Be Explicit

When information relates to other topics:

  • Bad: "See above"
  • Good: "As mentioned in the pricing section, the Professional plan..."
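
Phrases like "see above" are easy to search for. The Python sketch below flags a handful of them; the phrase list is an illustrative assumption and far from complete, so extend it with whatever shorthand your own documents use.

```python
import re

# Phrases that point at surrounding text a retrieved chunk may not include.
DANGLING = re.compile(
    r"\b(see (above|below)"
    r"|as (mentioned|noted|described) (above|earlier|previously)"
    r"|the (former|latter))\b",
    re.IGNORECASE,
)


def find_dangling_references(text: str) -> list[tuple[int, str]]:
    """Return (line number, matched phrase) for each dangling reference."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in DANGLING.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits
```

Note that "As mentioned in the pricing section" is not flagged, because it names the place it refers to.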

Format for Scanning

Use lists and bullet points:

  • Easier to process
  • Better retrieval accuracy
  • More scannable responses

Processing and Limits

How Content Gets Processed

1. **Upload**: You upload files to your knowledge base

2. **Extraction**: Text is extracted from all files

3. **Chunking**: Content is split into semantic chunks

4. **Embedding**: Chunks are converted to vector embeddings

5. **Indexing**: Embeddings are stored for fast retrieval
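
The embedding, indexing, and retrieval steps can be sketched with a toy example in Python. Real systems use learned vector embeddings; this sketch substitutes simple word counts and cosine similarity so that it runs with no dependencies. Everything here, including the function names, is an illustrative assumption rather than the platform's implementation.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9$]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors; 0.0 if either is empty."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Embedding and indexing: store each chunk next to its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], question: str, top_k: int = 1) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


chunks = [
    "The Professional plan costs $99/month.",
    "Our task management feature automates recurring task creation.",
]
index = build_index(chunks)
print(retrieve(index, "How much does the Professional plan cost?"))
# prints ['The Professional plan costs $99/month.']
```

Even in this simplified form, the pattern is visible: a chunk is retrieved when it shares specific terms with the question, which is why specific, self-contained content tends to be found more reliably than vague content.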

Current Limits

  • **Storage**: 10MB free; additional storage is charged to your wallet
  • **File size**: 10MB per file maximum (5MB for images)
  • **File count**: No hard limit (storage-based)
  • **Processing time**: 1-5 minutes depending on content
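
To estimate where you stand against the free allowance before uploading, you can total your file sizes locally. The Python sketch below assumes 1MB means 1024 × 1024 bytes and does not estimate cost, since this guide does not state a rate for additional storage. `storage_summary` is a hypothetical helper.

```python
MB = 1024 * 1024          # assumption: the guide does not define the unit precisely
FREE_STORAGE = 10 * MB    # free allowance listed above


def storage_summary(file_sizes: dict[str, int]) -> dict[str, float]:
    """Summarize total size, remaining free storage, and storage beyond the free tier."""
    total = sum(file_sizes.values())
    return {
        "total_mb": round(total / MB, 2),
        "free_remaining_mb": round(max(FREE_STORAGE - total, 0) / MB, 2),
        "billable_mb": round(max(total - FREE_STORAGE, 0) / MB, 2),
    }
```

For example, a 6MB file plus a 7MB file totals 13MB, which leaves no free storage and puts 3MB beyond the free allowance.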

Storage Management

Monitor your usage in Creator Studio:

  • View total storage used
  • See breakdown by assistant
  • Delete or replace old files as needed

Updating Your Knowledge Base

When to Update

  • Your product or service changes
  • New frequently asked questions emerge
  • Your policies or procedures change
  • You develop new insights or methodologies

How to Update

1. Navigate to your assistant's Knowledge Base

2. Upload new or replacement files

3. Delete outdated content

4. Wait for reprocessing to finish (it starts automatically)

Testing After Updates

After significant updates:

  • Ask questions about new content
  • Verify old content still works
  • Check for any conflicts or confusion
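
If you update often, it can help to keep a short list of questions and the phrases you expect in each answer, then rerun it after every change. The Python sketch below shows the idea. The `ask` parameter is a placeholder for however you query your assistant (this guide does not describe an API, so you might simply paste answers in by hand), and the sample check is illustrative.

```python
from typing import Callable

# Each check pairs a question with phrases the answer is expected to contain.
CHECKS = [
    ("How much is the Professional plan?", ["$99/month"]),
]


def run_checks(ask: Callable[[str], str], checks=CHECKS) -> list[str]:
    """Return a description of each failed check; an empty list means all passed."""
    failures = []
    for question, expected_phrases in checks:
        answer = ask(question).lower()
        missing = [p for p in expected_phrases if p.lower() not in answer]
        if missing:
            failures.append(f"{question!r} is missing {missing}")
    return failures
```

Keeping checks for older content in the same list covers the "verify old content still works" step as well.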

Troubleshooting

Common Issues

**"My assistant doesn't know about content I uploaded"**

  • Check if processing completed
  • Verify the content is in a supported format
  • Test with exact phrases from your document

**"Responses seem outdated"**

  • Replace old files with current versions
  • Check for duplicate files with old information
  • Ensure the latest upload processed successfully

**"OCR isn't capturing text correctly"**

  • Use higher-resolution images
  • Ensure text is clear and legible
  • Consider retyping critical content

**"File upload fails"**

  • Check file size (under 10MB, or under 5MB for images)
  • Verify file format is supported
  • Try re-saving in a different format

Checklist for Great Knowledge Bases

  • [ ] All files under 10MB (images under 5MB)
  • [ ] No duplicate content
  • [ ] Information is current
  • [ ] Clear headings and structure
  • [ ] Explicit context provided
  • [ ] No sensitive data included
  • [ ] FAQ format where appropriate
  • [ ] Tested after upload

Your knowledge base is your AI assistant's brain. Invest in quality content, and it will pay dividends in better user experiences and more revenue.

[Upload your content →](/creator/assisters/new)
