One API to Turn 60+ File Formats into LLM-Ready Data

Any File In.
LLM-Ready Out.

Q: What file formats does Exoa support?

Exoa supports 60+ file formats including PDF, DOCX, XLSX, PPTX, HTML, CSV, Markdown, images (PNG, JPG, TIFF with OCR), audio files (MP3, WAV, FLAC via speech-to-text), emails (EML, MSG), EPUB, and many legacy formats like DOC, XLS, PPT, and RTF.

Q: How does per-page pricing work?

Every account gets 15,000 pages free. After that, you pay $0.0015 per page with no monthly commitment. A "page" is one page of a document, one image, one spreadsheet sheet, or one email. Audio files are billed as 1 page per file. You can track your usage in real time from the billing dashboard.
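
As a rough illustration of the arithmetic above, here is a minimal Python sketch that estimates a bill from a cumulative page count. The function name is hypothetical, and the sketch assumes the 15,000 free pages are a one-time allowance per account; the rate and the allowance are the figures quoted above.

```python
from decimal import Decimal

FREE_PAGES = 15_000                 # free allowance quoted above
PRICE_PER_PAGE = Decimal("0.0015")  # USD per billable page


def estimate_cost(total_pages: int) -> Decimal:
    """Return the estimated USD cost for a cumulative page count.

    Assumes the free allowance applies once per account; that reading
    is an assumption, not something the pricing text spells out.
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    billable = max(0, total_pages - FREE_PAGES)
    return billable * PRICE_PER_PAGE


print(estimate_cost(10_000))   # still inside the free allowance
print(estimate_cost(115_000))  # 100,000 billable pages
```

Under these assumptions, 115,000 pages leaves 100,000 billable pages, which comes to $150.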

Q: How does async processing work?

Upload a file via the API and you get back a job ID immediately. Poll the status endpoint until it returns "completed", then fetch the structured result. This lets you process large files or batches of up to 10 files without blocking your application. The dashboard also shows real-time progress for each file.
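
The polling step described above could look roughly like this in Python. This is a sketch, not the Exoa client: `wait_for_job` and `get_status` are hypothetical names, and only the "completed" status comes from the description above. The "queued", "processing", and "failed" states are illustrative guesses. The status call is passed in as a function so the loop works with whatever HTTP client you use.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the job reaches a terminal state.

    get_status is a caller-supplied function that queries the status
    endpoint for one job and returns its state as a string.
    "failed" is a hypothetical terminal state added for illustration.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(interval)


# Usage with a stand-in for the real status call (no network needed):
states = iter(["queued", "processing", "completed"])
print(wait_for_job(lambda: next(states), sleep=lambda seconds: None))
```

Once the function returns "completed", you would fetch the structured result for that job ID.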

Extract, chunk, and structure content from 60+ formats, including PDFs, Office docs, images, and audio, ready for RAG, fine-tuning, or any LLM workflow.


Trusted by top companies powering mission-critical AI workflows: rebase, Ford, LEXICON, Zapier, Ogilvy.

[01] How it works

Three steps to structured output.

Upload any file. Exoa extracts and chunks the content, and you get clean JSON or Markdown back.

Documents: PDF, DOCX, XLSX, PPTX

Legacy Office: DOC, XLS, PPT, RTF, ODT

Images (OCR): PNG, JPG, TIFF, HEIC

Audio: MP3, WAV, FLAC, OGG

Markup & Data: HTML, MD, CSV, JSON, XML

1. Upload Any File

Send any file via our REST API or drag-and-drop in the dashboard. Exoa handles 60+ formats including PDFs, Office documents, images with OCR, audio transcription, and more.

Processing pipeline: Extract → Chunk → Count Tokens → Describe Images → Structure

2. Extract, Chunk, and Structure

Exoa extracts text, tables, and images, then chunks content using semantic or fixed-size strategies with accurate GPT-4 token counts for each chunk.

Output: LLM-ready JSON, formatted Markdown, chunks with token counts, preserved table data, image descriptions

3. Get Structured Output

Receive clean JSON or Markdown with every chunk, token count, page reference, and element type — ready to feed directly into your RAG pipeline or LLM context window.

[02] Platform

Platform Highlights.

Everything you need to turn raw files into structured, LLM-ready data with a single API call.

Image: Exoa document intelligence platform dashboard showing file upload, extraction results, and structured output.

60+ File Formats

One API handles PDFs, Office docs, images, audio, email, ebooks, markup, and legacy formats. No more juggling different libraries for each file type.

Smart Chunking

Split documents by structure with semantic chunking or by size with fixed chunking. Each chunk includes accurate GPT-4 token counts and page references.
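
To make the fixed-size strategy concrete, here is a minimal Python sketch of splitting text into chunks of a bounded size. It is not Exoa's implementation: the function name and output keys are hypothetical, and whitespace-separated words stand in for tokens, so the counts will differ from real GPT-4 token counts.

```python
def fixed_size_chunks(text: str, max_tokens: int) -> list[dict]:
    """Split text into consecutive chunks of at most max_tokens tokens.

    Whitespace-separated words stand in for tokens here; a GPT-4
    tokenizer would give different counts.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    chunks = []
    for start in range(0, len(words), max_tokens):
        piece = words[start:start + max_tokens]
        chunks.append({"text": " ".join(piece), "token_count": len(piece)})
    return chunks


for chunk in fixed_size_chunks("one two three four five six seven", max_tokens=3):
    print(chunk)
```

Semantic chunking differs in that it splits on document structure such as headings and paragraphs, so chunk sizes vary.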

Built-in OCR

Automatically extract text from scanned documents and images using OCR. Supports multi-language recognition with no extra configuration needed.

API + Dashboard

Use the REST API for automation or the web dashboard for drag-and-drop uploads. Both return the same structured output in JSON or Markdown.

Async Batch Processing (New)

Queue single files or batches of up to 10 files for background processing. Poll for status or get results when ready — no timeouts on large documents.
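
If you have more than 10 files, one simple approach is to split the list into groups that respect the batch limit before queuing them. The Python sketch below shows only the grouping; the helper name is hypothetical and the upload call itself is left out.

```python
from typing import Iterator, Sequence

MAX_BATCH_SIZE = 10  # batch limit stated above


def batches(paths: Sequence[str], size: int = MAX_BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` file paths."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


files = [f"report_{n}.pdf" for n in range(23)]
print([len(group) for group in batches(files)])  # prints [10, 10, 3]
```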

Table & Image Extraction

Tables are preserved as structured data with HTML representation. Images get AI-generated descriptions via BLIP, with optional base64 inclusion.

[03] Security

Privacy-First Processing.

Your files are processed locally on our infrastructure — no data is sent to third-party APIs. Secure authentication and encrypted transport keep your content safe.

Processing: Local

• No Third-Party APIs: all extraction runs locally
• On-Premise Option: self-host for full control

Local Processing

Your files never leave our infrastructure. OCR, transcription, and extraction all happen locally — no data is sent to external services.

Authentication: Active

• API Key Auth: scoped keys with rate limits
• Session Auth: secure HttpOnly cookies

Secure Authentication

API keys with per-minute and per-day rate limits for programmatic access. Session-based auth with HttpOnly cookies for the dashboard.

Transport: Always-on

• HTTPS Everywhere: all traffic encrypted in transit
• Request Logging: monitor API usage and errors

Encrypted Transport

All API traffic is served over HTTPS. API call logging tracks usage, response times, and errors for monitoring.

[Why It Matters]

Your data is an asset.
Treat it like one.

The knowledge your AI needs is trapped inside files — PDFs, spreadsheets, scanned documents, audio recordings. Getting it out means writing custom parsers for every format.

Exoa handles the extraction so you can focus on building.

LLM-Ready Output

Get structured JSON with chunks, token counts, and metadata — ready to drop into any LLM context window.

60+ Formats, One API

Stop writing custom parsers for each file type. One endpoint handles PDFs, Office docs, images, audio, and more.

Zero Manual Pipelines

Replace brittle extraction scripts with a single API call that handles OCR, table parsing, and chunking automatically.

Accurate Token Counting

Every chunk includes GPT-4 compatible token counts so you can manage context windows and estimate costs upfront.
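
As one example of managing a context window with per-chunk token counts, the Python sketch below keeps chunks in order until a token budget would be exceeded. The `token_count` and `text` keys and the sample values are made up for illustration and may not match the field names in the actual response.

```python
def select_within_budget(chunks: list[dict], budget: int) -> list[dict]:
    """Take chunks in order until adding the next one would exceed budget."""
    selected, used = [], 0
    for chunk in chunks:
        if used + chunk["token_count"] > budget:
            break
        selected.append(chunk)
        used += chunk["token_count"]
    return selected


# Made-up chunks; the key names are assumptions.
chunks = [
    {"text": "Intro", "token_count": 120},
    {"text": "Methods", "token_count": 300},
    {"text": "Results", "token_count": 450},
]
picked = select_within_budget(chunks, budget=500)
print([c["text"] for c in picked])  # prints ['Intro', 'Methods']
```

The same sum of token counts, multiplied by your model's per-token price, gives a cost estimate before you send anything to the LLM.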

Exoa replaces your file parsing stack with a single API so you can ship AI features, not maintain extraction pipelines.

[04] Use cases

Built for AI developers.

Common workflows where Exoa saves you from writing custom extraction code.

RAG Pipelines

Get pre-chunked, tokenized content with page references and element types — ready to load directly into your vector store for retrieval-augmented generation.
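
A sketch of what that loading step might look like in Python: flatten the chunks into records carrying text plus metadata, in the form most vector stores accept. The response shape and every field name here ("chunks", "text", "token_count", "page", "element_type") are illustrative guesses, not taken from the Exoa API reference, and the sample values are made up.

```python
import json

# Hypothetical response shape with made-up values; check the API docs
# for the real field names.
sample = json.loads("""
{
  "chunks": [
    {"text": "Quarterly revenue rose.", "token_count": 5,
     "page": 1, "element_type": "paragraph"},
    {"text": "<table>...</table>", "token_count": 12,
     "page": 2, "element_type": "table"}
  ]
}
""")


def to_records(result: dict) -> list[dict]:
    """Flatten chunks into id/text/metadata records for a vector store."""
    return [
        {
            "id": f"chunk-{i}",
            "text": chunk["text"],
            "metadata": {
                "page": chunk["page"],
                "element_type": chunk["element_type"],
                "token_count": chunk["token_count"],
            },
        }
        for i, chunk in enumerate(result["chunks"])
    ]


records = to_records(sample)
print(len(records), records[1]["metadata"]["element_type"])  # prints: 2 table
```

Keeping the page reference and element type in the metadata lets you filter retrieval results, for example to tables only, or cite the source page in an answer.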

Document Ingestion for LLMs

Convert contracts, reports, manuals, and filings into structured JSON that fits cleanly into LLM context windows with accurate token counts.

Data Extraction & Parsing

Extract tables, text, and metadata from PDFs, scanned documents, and Office files. Get structured data without writing format-specific parsers.

Audio & Image Processing

Transcribe audio files via Whisper and extract text from images via OCR — all through the same API, with the same structured output format.

[05] Supported formats

60+ File Formats.

One API endpoint handles all of these. Upload any supported file and get structured output back.

Documents & Office

PDF, DOCX, DOC, XLSX, XLS, PPTX, PPT, RTF, ODT, ODS, ODP

Images & Audio

PNG, JPG, TIFF, BMP, WebP, GIF, HEIC, MP3, WAV, M4A, FLAC, OGG

Markup & Data

HTML, Markdown, CSV, TSV, JSON, XML, RST, AsciiDoc, Org

Email & Other

EML, MSG, EPUB, DBF, DIF, P7S, TXT

[06] Pricing

Simple, transparent pricing.

Pay only for what you use. First 15,000 pages free.

Pay As You Go

$0.0015/page

• First 15,000 pages free
• Prepaid credits, no subscription
• All file formats supported
• Full API access


Enterprise

Custom

• Volume discounts
• Dedicated support & SLAs
• Custom integrations
• On-premise deployment options

[07] Start

Get Started With Exoa.

Start converting files to structured, LLM-ready data in minutes. 15,000 pages free, no credit card required.

