What file formats are supported?

Doc Extract supports 100+ file formats across 18 categories including PDF, Word, Excel, PowerPoint, images (PNG, JPG, TIFF, etc.), audio (MP3, WAV, FLAC, etc.), video (MP4, MKV, MOV, etc.), ebooks (EPUB, MOBI), emails (EML, MSG), and many more. You can also extract text from any URL or webpage.

How does OCR extraction work?

OCR extraction uses Tesseract to recognize text in scanned documents and images. When you upload a PDF with scanned pages or an image file, the system automatically detects embedded images and runs OCR with support for 21+ languages. It works on PDFs, Word documents, Excel files, PowerPoint presentations, and all common image formats.

What audio and video formats can be transcribed?

Doc Extract uses OpenAI Whisper to transcribe audio files (MP3, WAV, FLAC, OGG, Opus, AAC, M4A, WMA, AIFF, AMR, WebM) and video files (MP4, MKV, WebM, AVI, MOV, WMV, MPEG-TS). It supports 100 languages with automatic language detection.
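One way to picture how a file reaches the transcription step is a simple lookup on its extension. The sketch below is only an illustration, not Doc Extract's actual routing code: `pick_pipeline` and the pipeline names are hypothetical, and the extension spellings are inferred from the format names listed above. WebM appears in both lists, so the sketch arbitrarily treats it as video. MPEG-TS is left out because its usual extension (.ts) is also used for TypeScript source files.

```python
from pathlib import Path

# Extensions inferred from the formats named in the answer above (assumed spellings).
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg", ".opus", ".aac",
                    ".m4a", ".wma", ".aiff", ".amr"}
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".webm", ".avi", ".mov", ".wmv"}


def pick_pipeline(filename: str) -> str:
    """Return a hypothetical pipeline name based on the file extension."""
    ext = Path(filename).suffix.lower()
    if ext in VIDEO_EXTENSIONS:
        return "transcribe-video"
    if ext in AUDIO_EXTENSIONS:
        return "transcribe-audio"
    # Everything else falls through to ordinary text extraction in this sketch.
    return "extract-text"
```

A real router would presumably also look at the file contents, since an extension alone can be wrong or ambiguous.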

Is my data private?

Yes. Doc Extract is fully self-hosted: your files are processed on your own infrastructure and never sent to external services. All processing (text extraction, OCR, transcription) happens locally on your server.

How does caching work?

Every uploaded file is hashed using MD5. If the same file has been processed before, you get instant results from the cache without reprocessing. This dramatically speeds up repeated extractions and reduces server load.
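The idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's implementation: the in-memory dictionary stands in for whatever cache store Doc Extract really uses, and `cache_key` and `extract_with_cache` are hypothetical names.

```python
import hashlib
from typing import Callable, Dict


def cache_key(content: bytes) -> str:
    """Return the MD5 hex digest of the uploaded bytes, used as the cache key."""
    return hashlib.md5(content).hexdigest()


def extract_with_cache(
    content: bytes,
    extract: Callable[[bytes], dict],
    cache: Dict[str, dict],
) -> dict:
    """Run `extract` only if these exact bytes have not been seen before."""
    key = cache_key(content)
    if key not in cache:
        cache[key] = extract(content)  # cache miss: do the real work once
    return cache[key]  # cache hit: return the stored result
```

Because the key depends only on the bytes, renaming a file does not cause it to be reprocessed, while changing a single byte does.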

What does the extraction output look like?

The output is a structured JSON response containing the extracted text, metadata (author, title, dates, page count), extracted tables (when applicable), processing time, and file information. The format is consistent across all file types for easy integration.

Can I extract text from URLs and webpages?

Yes. Instead of uploading a file, you can provide any URL. Doc Extract will fetch the webpage content and extract the text, stripping away navigation, ads, and other non-content elements to give you clean, readable text.
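As a rough illustration of the stripping step, the sketch below uses Python's standard `html.parser` to keep text while skipping a few non-content elements. It is an assumption-laden simplification: the list of skipped tags is a guess, fetching the URL is left out, and Doc Extract's own content detection is not documented here and is likely more sophisticated.

```python
from html.parser import HTMLParser

# Elements treated as non-content in this sketch (an assumption, not the product's list).
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class ContentTextParser(HTMLParser):
    """Collect text nodes that are not nested inside a skipped element."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self._chunks.append(text)

    def text(self) -> str:
        return " ".join(self._chunks)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, minus the skipped elements."""
    parser = ContentTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Joining chunks with a single space keeps the sketch short; it does not preserve paragraph breaks.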

How does file validation work?

Every upload goes through three-layer validation: extension checking (is the file extension supported?), MIME type verification (does the content match the claimed type?), and magic bytes analysis (do the actual file bytes match the expected format?). This prevents disguised or corrupted files from being processed.
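The three checks could look roughly like the following. This is a sketch under assumptions: the format table is a tiny illustrative subset, `validate_upload` is a hypothetical name, and the MIME check here simply compares the declared type with the type expected for the extension, which may differ from how Doc Extract verifies it.

```python
from pathlib import Path
from typing import List

# Illustrative subset: extension -> (expected MIME type, leading magic bytes).
KNOWN_FORMATS = {
    ".pdf": ("application/pdf", b"%PDF-"),
    ".png": ("image/png", b"\x89PNG\r\n\x1a\n"),
    ".jpg": ("image/jpeg", b"\xff\xd8\xff"),
}


def validate_upload(filename: str, declared_mime: str, content: bytes) -> List[str]:
    """Return a list of problems; an empty list means all three checks passed."""
    # Layer 1: is the extension supported at all?
    ext = Path(filename).suffix.lower()
    if ext not in KNOWN_FORMATS:
        return [f"unsupported extension: {ext or '(none)'}"]

    expected_mime, magic = KNOWN_FORMATS[ext]
    problems = []
    # Layer 2: does the declared MIME type agree with the extension?
    if declared_mime != expected_mime:
        problems.append(f"MIME type {declared_mime} does not match {ext}")
    # Layer 3: do the first bytes of the file match the format's signature?
    if not content.startswith(magic):
        problems.append(f"file content does not look like {ext}")
    return problems
```

For example, an executable renamed to report.pdf passes the first check but fails the third, because its first bytes are not `%PDF-`.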

What AI features are available?

Doc Extract includes AI-powered analysis features: summarize documents, extract named entities, ask questions about content, and translate text into other languages. These features use LLMs to provide intelligent analysis on top of the extracted text.

Is there an API available?

Yes. Doc Extract provides a full REST API with interactive documentation available at /docs (Swagger UI) and /redoc (ReDoc). The API supports synchronous and asynchronous extraction, webhooks for job completion notifications, and all the same features available in the web interface.
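The feature list describes webhook notifications as HMAC-signed, so a receiver would normally verify the signature before trusting a callback. The sketch below shows the general technique only. The hash algorithm (SHA-256), the hex encoding, and the function names are assumptions; the actual scheme and header name should be taken from the API documentation at /docs.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw request body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign_payload(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), received_signature.encode())
```

The signature must be computed over the exact bytes received, before any JSON parsing or re-serialization, or the comparison will fail.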

Supports 100+ file formats

Extract Text from Any Document, Instantly

Upload files or paste URLs. Get structured text, metadata, and tables from 100+ formats with OCR and audio transcription. Self-hosted and 100% private.

.pdf .docx .xlsx .pptx .epub .mp3 .mp4 .png .jpg .svg .csv .json .html .md .wav .flac .mkv .ogg .doc .xls

.ppt .rtf .odt .eml .txt .xml .yaml .py .ts .go .rs .flv .3gp .mpeg .ac3 .ape .caf .m4v .ogv .wv

Formats: 100+
Code Languages: 40+
STT Languages: 100
Categories: 18

Tesseract OCR

OpenAI Whisper

Self-Hosted

100% Private

Powerful Extraction Features

Everything you need to extract, transcribe, and structure content from any file type.

Universal Format Support

Extract text from 100+ file formats across 18 categories — documents, spreadsheets, presentations, emails, ebooks, images, and more.

OCR Extraction

Powered by Tesseract OCR with support for 21+ languages. Extract text from scanned PDFs, photographs, and embedded images automatically.

Audio & Video Transcription

Transcribe audio and video files with OpenAI Whisper. Supports 100 languages with automatic language detection.

Async Job Processing

Submit large files as background jobs and track progress in real-time. Get notified via webhooks when processing completes.

Smart Caching

MD5 deduplication gives instant results for previously processed files.

Three-Layer Validation

Extension, MIME type, and magic bytes verification on every upload.

Table Extraction

Extract structured tables from PDFs, spreadsheets, and presentations.

Metadata Extraction

Get author, title, dates, page count, and format metadata automatically.

Multi-Language

21+ OCR languages and 100 transcription languages supported.

Self-Hosted & Private

Your data never leaves your infrastructure. Fully self-hostable.

URL & Webpage Extraction

Extract text from any URL or webpage — just paste the link.

Production Ready

Rate limiting, circuit breakers, and comprehensive error handling built in.

REST API

Full REST API with interactive OpenAPI docs at /docs and /redoc.

Webhook Callbacks

HMAC-signed webhook notifications when async jobs complete.

AI-Powered Analysis

Summarize, extract entities, ask questions, and translate with an LLM.

Text-to-Speech

Browser-based TTS playback with voice selection.

Full-Text Search

PostgreSQL-powered ranked full-text search across extractions.

How It Works

From upload to structured output in six steps.

Upload

File or URL

Validate

Three-layer check

Detect

Auto format routing

Extract

Text from structure

Enhance

OCR, STT, tables, metadata

Deliver

Structured JSON

Supported Formats

100+ file formats across 18 categories. Browse by type below.

PDF · .pdf · Documents · OCR
Word · .docx · Documents · OCR
Word (Legacy) · .doc · Documents · OCR
RTF · .rtf · Documents
ODT · .odt · Documents
Excel · .xlsx · Spreadsheets · OCR
Excel (Legacy) · .xls · Spreadsheets · OCR
ODS · .ods · Spreadsheets
PowerPoint · .pptx · Presentations · OCR
PowerPoint (Legacy) · .ppt · Presentations · OCR
ODP · .odp · Presentations
XPS/OXPS · .xps, .oxps · Document

Structured Output

Every extraction returns clean, structured JSON — ready for your pipeline.

quarterly-report.pdf

2.4 MB · PDF Document

OCR · Tables · Metadata

response.json

{
  "filename": "quarterly-report.pdf",
  "text": "Q4 2024 Financial Summary...",
  "metadata": {
    "author": "Finance Team",
    "pages": 24,
    "created": "2024-12-15T09:30:00Z"
  },
  "tables": [
    { "title": "Revenue by Region", "rows": 12 }
  ],
  "processing_ms": 847
}
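For pipeline code, a response like the sample above can be loaded into a small typed structure. The following is a sketch, assuming only the fields visible in the sample; `ExtractionResult` and `parse_result` are hypothetical names, and a real response may carry additional fields (such as file information) that this sketch ignores.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ExtractionResult:
    """Fields taken from the sample response; which ones are optional is an assumption."""
    filename: str
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    tables: List[Dict[str, Any]] = field(default_factory=list)
    processing_ms: int = 0


def parse_result(raw: str) -> ExtractionResult:
    """Parse a JSON response body into an ExtractionResult."""
    data = json.loads(raw)
    return ExtractionResult(
        filename=data["filename"],
        text=data["text"],
        metadata=data.get("metadata", {}),
        tables=data.get("tables", []),
        processing_ms=data.get("processing_ms", 0),
    )
```

Treating `metadata` and `tables` as optional reflects the note that tables are returned only when applicable.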

Frequently Asked Questions

Common questions about Doc Extract.

We'd Love to Hear from You

Whether you have a question, need a hand getting started, or want to explore what's possible — our team is here and happy to help.

Help & Support

Stuck on something? We'll walk you through it. From setup questions to troubleshooting, no question is too small.

AI Development

Building something with AI? Let's talk. We love collaborating on intelligent document workflows and custom extraction pipelines.

API Integration

Need to plug Doc Extract into your stack? We'll help you design a seamless integration that fits your architecture.

We're a small, passionate team that genuinely enjoys helping people build great things. Drop us a line anytime — we read every message and typically reply within a few hours.

support@apidly.com · team@apidly.com

Ready to extract text from anything?

100+ formats, OCR, audio transcription, structured output. Self-hosted and private. Get started in seconds.

