What Can You Upload to Train Your AI Assistant? (Complete File Guide)
A comprehensive guide to file formats, best practices, and optimization tips for training your AI assistant's knowledge base.
Your AI assistant is only as good as its knowledge base. This guide covers what you need to know about uploading content: supported file formats, best practices, and optimization tips.
Supported File Formats
Documents
**PDF Files (.pdf)**
- Standard text PDFs: Fully supported
- Scanned PDFs: Supported via OCR (optical character recognition)
- Image-heavy PDFs: Text is extracted directly, and embedded images are run through OCR for any text they contain
- Max size: 10MB per file
**Word Documents (.doc, .docx)**
- Full formatting preserved for context
- Headers and structure maintained
- Tables converted to readable format
- Max size: 10MB per file
**Text Files (.txt)**
- Plain text, UTF-8 encoding recommended
- Great for FAQs and simple content
- No formatting overhead
- Max size: 10MB per file
**Markdown Files (.md)**
- Structure preserved (headers, lists)
- Code blocks included
- Ideal for technical documentation
- Max size: 10MB per file
Spreadsheets
**CSV Files (.csv)**
- Rows converted to readable entries
- Great for product catalogs, FAQs, data tables
- First row treated as headers
- Max size: 10MB per file
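As a rough illustration of what "rows converted to readable entries" can look like, the sketch below renders each CSV row as one line of text, using the first row as field names. The exact format the platform produces is not documented here; the `rows_to_entries` helper, the "field: value" layout, and the sample rows are assumptions made for illustration only.

```python
import csv
import io


def rows_to_entries(csv_text: str) -> list[str]:
    """Render each data row as 'header: value' pairs (hypothetical layout)."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first row supplies the headers
    entries = []
    for row in reader:
        # Skip empty cells so each entry stays compact.
        parts = [f"{key}: {value}" for key, value in row.items() if value]
        entries.append("; ".join(parts))
    return entries


# Made-up sample data, not a real catalog.
sample = "plan,price,notes\nProfessional,$99/month,Billed monthly\n"
for entry in rows_to_entries(sample):
    print(entry)
```

Descriptive column headers matter here: a header like `price` carries into every entry, while a header like `col2` would tell the assistant nothing.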
**Excel Files (.xlsx)**
- First sheet processed by default
- Tables and data extracted
- Formulas converted to values
- Max size: 10MB per file
Images
**Image Files (.png, .jpg, .jpeg)**
- OCR extracts visible text
- Great for scanned documents, screenshots, infographics
- Handwritten text partially supported
- Max size: 5MB per file
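If you prepare files in bulk, a small local check against the formats and size limits listed above can catch problems before you upload. This is a minimal sketch, not part of the product: `check_upload` is a hypothetical helper, and it assumes "MB" means 1024 * 1024 bytes, which the guide does not specify.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: "MB" is treated as 1024 * 1024 bytes

# Extensions and limits taken from the format lists in this guide.
DOCUMENT_TYPES = {".pdf", ".doc", ".docx", ".txt", ".md", ".csv", ".xlsx"}
IMAGE_TYPES = {".png", ".jpg", ".jpeg"}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    suffix = Path(filename).suffix.lower()
    if suffix in DOCUMENT_TYPES:
        limit = 10 * MB
    elif suffix in IMAGE_TYPES:
        limit = 5 * MB
    else:
        return [f"unsupported format: {suffix or '(no extension)'}"]
    if size_bytes > limit:
        return [f"too large: {size_bytes} bytes exceeds {limit} bytes"]
    return []


print(check_upload("return-policy-2026.pdf", 2 * MB))  # no problems expected
print(check_upload("screenshot.png", 6 * MB))          # over the image limit
```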
What Makes Good Training Content?
High-Quality Content Characteristics
**Specific and Detailed**
- Bad: "Our product helps with productivity"
- Good: "Our task management feature saves users an average of 2.5 hours per week by automating recurring task creation"
**Well-Structured**
- Use clear headings
- Organize by topic
- Maintain consistent formatting
**Comprehensive**
- Cover common questions
- Include edge cases
- Provide context and nuance
**Current**
- Remove outdated information
- Update with recent changes
- Date-stamp time-sensitive content
Content Types That Work Well
**FAQs and Q&A Pairs**
- Explicit question-answer format
- Covers common user queries
- Easy for AI to match and retrieve
**How-To Guides**
- Step-by-step instructions
- Clear procedures
- Troubleshooting steps
**Reference Documentation**
- Product specifications
- Policy documents
- Technical details
**Case Studies**
- Real examples
- Outcomes and lessons
- Context and nuance
**Frameworks and Methodologies**
- Your unique approaches
- Decision-making processes
- Best practices
Content to Avoid
What Not to Upload
**Duplicate Content**
- Multiple versions of the same document confuse the AI
- Keep one authoritative version
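One way to spot exact duplicates before uploading is to compare content hashes. The sketch below is an illustration under stated assumptions: `find_duplicates` is a hypothetical helper that works on file contents you have already read into memory, and it only catches byte-for-byte copies. Near-duplicates (for example, two drafts of the same policy) still need a manual review.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for name, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    # Keep only digests shared by more than one file.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Made-up file names and contents for illustration.
files = {
    "return-policy-2026.pdf": b"Returns policy text",
    "return-policy-copy.pdf": b"Returns policy text",
    "faq.txt": b"Frequently asked questions",
}
print(find_duplicates(files))
```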
**Contradictory Information**
- Old and new policies together cause confusion
- Archive outdated content separately
**Sensitive Data**
- Personal information (unless necessary and compliant)
- Credentials or passwords
- Internal-only confidential information
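A quick text scan can flag some obvious sensitive data before a file goes into the knowledge base. The patterns below are a deliberately simple heuristic chosen for illustration, not a compliance tool: they will miss many cases and may flag harmless lines, and `scan_for_sensitive` is a hypothetical helper name.

```python
import re

# Illustrative patterns only; real screening needs a broader, reviewed set.
PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "possible credential": re.compile(
        r"(?i)\b(password|passwd|api[_-]?key|secret)\b\s*[:=]\s*\S+"
    ),
}


def scan_for_sensitive(text: str) -> list[tuple[int, str]]:
    """Return (line number, label) for every line that matches a pattern."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, label))
    return findings


sample = "Contact: jane@example.com\npassword = hunter2\nAll good here"
print(scan_for_sensitive(sample))
```

Treat any hit as a prompt to review the file by hand rather than as a verdict.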
**Raw Data Without Context**
- Numbers without explanation
- Lists without descriptions
- Data that requires interpretation
Organization Best Practices
Naming Conventions
Use descriptive file names:
- Good: "return-policy-2026.pdf"
- Bad: "doc1.pdf"
Content Categories
Organize your content into logical groups:
- Product information
- Support and troubleshooting
- Policies and procedures
- FAQs by topic
- Case studies and examples
Versioning
When updating content:
- Replace old files with updated versions
- Don't keep multiple versions active
- Note significant changes in your content
Optimizing for Retrieval
Your content is split into chunks and embedded for retrieval. The practices below help that process work well:
Use Clear Headings
The AI uses headings to understand document structure:
```
# Product Overview
## Features
### Feature 1: Task Management
...
```
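To see why headings matter, here is a minimal sketch of heading-aware splitting. It assumes Markdown-style `#` headings and is not the platform's actual chunker; `split_by_headings` is a hypothetical helper. Each section keeps its full heading trail, so a passage buried under "Feature 1" still carries the product context with it.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown: str) -> list[dict]:
    """Split text into sections, each tagged with its full heading path."""
    sections = []
    path = []  # current heading trail, e.g. ["Product Overview", "Features"]
    body = []  # lines collected since the last heading

    def flush():
        text = "\n".join(body).strip()
        if text:
            sections.append({"path": " > ".join(path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return sections


doc = (
    "# Product Overview\nIntro text.\n"
    "## Features\n### Feature 1: Task Management\nAutomates recurring tasks.\n"
)
for section in split_by_headings(doc):
    print(section["path"], "|", section["text"])
```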
Include Context
Don't assume knowledge:
- Bad: "It costs $99"
- Good: "The Professional plan costs $99/month and includes..."
Be Explicit
When information relates to other topics:
- Bad: "See above"
- Good: "As mentioned in the pricing section, the Professional plan..."
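Vague cross-references are easy to search for mechanically. The sketch below flags a handful of phrases such as "see above"; the phrase list and the `find_vague_references` helper are assumptions for illustration, and you would likely extend the list for your own writing style.

```python
import re

# A small, illustrative phrase list; extend it to match your own content.
VAGUE = re.compile(
    r"\b(see (above|below)|as (noted|stated|mentioned) (above|earlier|previously))\b",
    re.IGNORECASE,
)


def find_vague_references(text: str) -> list[tuple[int, str]]:
    """Return (line number, matched phrase) for each vague cross-reference."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in VAGUE.finditer(line):
            hits.append((line_no, match.group(0)))
    return hits


sample = (
    "See above for pricing.\n"
    "As mentioned in the pricing section, the Professional plan costs $99/month."
)
print(find_vague_references(sample))
```

The second sample line is not flagged because it names the section it refers to, which is the pattern recommended above.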
Format for Scanning
Use lists and bullet points:
- Easier to process
- Better retrieval accuracy
- More scannable responses
Processing and Limits
How Content Gets Processed
1. **Upload**: You upload files to your knowledge base
2. **Extraction**: Text is extracted from all files
3. **Chunking**: Content is split into semantic chunks
4. **Embedding**: Chunks are converted to vector embeddings
5. **Indexing**: Embeddings are stored for fast retrieval
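The chunking, embedding, indexing, and retrieval steps can be sketched end to end with a toy example. Everything here is a stand-in chosen to keep the code self-contained: chunks are split on blank lines rather than semantically, the "embedding" is a plain word count rather than a learned vector, and the index is an in-memory list. It is not how the platform is implemented, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str) -> list[str]:
    """Split on blank lines; a stand-in for real semantic chunking."""
    return [part.strip() for part in re.split(r"\n\s*\n", text) if part.strip()]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9$]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(text: str) -> list[tuple[str, Counter]]:
    """Store each chunk alongside its vector."""
    return [(piece, embed(piece)) for piece in chunk(text)]


def retrieve(index: list[tuple[str, Counter]], question: str) -> str:
    """Return the chunk whose vector is most similar to the question."""
    query = embed(question)
    return max(index, key=lambda item: cosine(query, item[1]))[0]


doc = (
    "The Professional plan costs $99/month.\n\n"
    "Our task management feature saves users an average of 2.5 hours per week "
    "by automating recurring task creation."
)
index = build_index(doc)
print(retrieve(index, "How much does the Professional plan cost?"))
```

Even in this toy version, a chunk that names "the Professional plan" is retrievable while a chunk that only says "It costs $99" would share almost no words with the question, which is why explicit context helps.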
Current Limits
- **Storage**: 10MB free; additional storage is charged to your wallet
- **File size**: 10MB per file maximum (5MB for images)
- **File count**: No hard limit (storage-based)
- **Processing time**: 1-5 minutes depending on content
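To estimate where a set of files lands relative to the free storage allowance, you can total their sizes locally. This sketch makes two assumptions that the guide does not confirm: that usage is measured by raw file size, and that "MB" means 1024 * 1024 bytes. `storage_summary` is a hypothetical helper, and it reports only bytes, since pricing for additional storage is not covered here.

```python
MB = 1024 * 1024  # assumption: "MB" is treated as 1024 * 1024 bytes
FREE_STORAGE = 10 * MB  # free allowance stated in this guide


def storage_summary(sizes_bytes: dict[str, int]) -> dict[str, int]:
    """Total the file sizes and split them into free and billable bytes."""
    total = sum(sizes_bytes.values())
    return {
        "total": total,
        "free_remaining": max(FREE_STORAGE - total, 0),
        "billable": max(total - FREE_STORAGE, 0),
    }


# Made-up file sizes for illustration.
print(storage_summary({"faq.txt": 1 * MB, "catalog.csv": 4 * MB, "manual.pdf": 8 * MB}))
```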
Storage Management
Monitor your usage in Creator Studio:
- View total storage used
- See breakdown by assistant
- Delete or replace old files as needed
Updating Your Knowledge Base
When to Update
- Product or service changes
- New frequently asked questions emerge
- Policies or procedures change
- You develop new insights or methodologies
How to Update
1. Navigate to your assistant's Knowledge Base
2. Upload new or replacement files
3. Delete outdated content
4. Reprocessing happens automatically
Testing After Updates
After significant updates:
- Ask questions about new content
- Verify old content still works
- Check for any conflicts or confusion
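If you re-test the same questions after every update, it can help to keep them in a small script. The sketch below assumes you have some way to send a question to your assistant and get text back; that is represented by the `ask` parameter, which is hypothetical and stubbed out here, since this guide does not describe a programmatic API. Each check pairs a question with a phrase the answer should contain.

```python
from typing import Callable


def run_checks(ask: Callable[[str], str], checks: list[tuple[str, str]]) -> list[str]:
    """Return the questions whose answers are missing the expected phrase."""
    failures = []
    for question, expected in checks:
        answer = ask(question)
        if expected.lower() not in answer.lower():
            failures.append(question)
    return failures


def fake_ask(question: str) -> str:
    """Stub standing in for a real assistant; replace with your own call."""
    if "professional" in question.lower():
        return "The Professional plan costs $99/month."
    return "I'm not sure."


checks = [
    ("How much is the Professional plan?", "$99/month"),
    ("What is your return policy?", "return"),
]
print(run_checks(fake_ask, checks))  # questions that need attention
```

Keeping a mix of questions about new and existing content in `checks` covers both "new content works" and "old content still works" in one pass.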
Troubleshooting
Common Issues
**"My assistant doesn't know about content I uploaded"**
- Check that processing has completed (it can take 1-5 minutes)
- Verify the content is in a supported format
- Test with exact phrases from your document
**"Responses seem outdated"**
- Replace old files with current versions
- Check for duplicate files with old information
- Ensure the latest upload processed successfully
**"OCR isn't capturing text correctly"**
- Use higher-resolution images
- Ensure text is clear and legible
- Consider retyping critical content
**"File upload fails"**
- Check file size (under 10MB, or under 5MB for images)
- Verify file format is supported
- Try re-saving in a different format
Checklist for Great Knowledge Bases
- [ ] All files under 10MB (images under 5MB)
- [ ] No duplicate content
- [ ] Information is current
- [ ] Clear headings and structure
- [ ] Explicit context provided
- [ ] No sensitive data included
- [ ] FAQ format where appropriate
- [ ] Tested after upload
Your knowledge base is your AI assistant's brain. Invest in quality content, and it will pay dividends in better user experiences and more revenue.
[Upload your content →](/creator/assisters/new)