Parse Document - Unsiloed AI

curl -X 'POST' \
  'https://prod.visionapi.unsiloed.ai/parse' \
  -H 'accept: application/json' \
  -H 'api-key: your-api-key' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@document.pdf;type=application/pdf' \
  -F 'use_high_resolution=true' \
  -F 'layout_analysis=smart_layout_detection' \
  -F 'ocr_strategy=auto_detection' \
  -F 'ocr_engine=UnsiloedHawk' \
  -F 'extract_strikethrough=false' \
  -F 'merge_tables=true' \
  -F 'enhance_reading_order=false' \
  -F 'segment_filter=all' \
  -F 'validate_segments=["Table","Picture","Formula"]' \
  -F 'export_format=["docx"]' \
  -F 'segment_analysis={"Table":{"html":"VLM","markdown":"VLM","extended_context":true,"crop_image":"All","model_id":"us_table_v2"}}'

# Alternative: Use presigned URL instead of file upload
# Replace the file parameter with url parameter:
# -F 'url=https://your-bucket.s3.amazonaws.com/document.pdf?signature=...' \

{
  "job_id": "e77a5c42-4dc1-44d0-a30e-ed191e8a8908",
  "status": "Starting",
  "file_name": "document.pdf",
  "created_at": "2025-07-18T10:42:10.545832520Z",
  "message": "Task created successfully. Use GET /parse/{job_id} to check status and retrieve results.",
  "credit_used": 5,
  "quota_remaining": 23695,
  "merge_tables": true
}

POST /parse

Overview

The Parse Document endpoint processes PDFs, images (PNG, JPEG, TIFF), and office documents (PPT, PPTX, DOC, DOCX, XLS, XLSX), breaking them into meaningful sections with detailed analysis: text extraction, image recognition, table parsing, and OCR data. You can provide a document either by direct file upload or by URL (presigned or public). Advanced options let you tune the parsing behavior to your use case.

The parse endpoint uploads your document and configuration in a single request:

POST to /parse with your file and configuration: the API uploads the document and creates a parse job.
The job is automatically enqueued for processing.
Poll GET /parse/{job_id} to track progress and retrieve results.
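The first step can be sketched in Python as follows. This is a minimal sketch that assumes the third-party requests library is installed. The function name submit_parse_job, the 120-second timeout, and the MIME-type guess are illustrative choices rather than part of the API; the endpoint, header, and field names are taken from the cURL example above.

```python
import mimetypes
import os

BASE_URL = "https://prod.visionapi.unsiloed.ai"


def submit_parse_job(api_key, file_path=None, url=None, fields=None):
    """Create a parse job via POST /parse and return the decoded JSON response."""
    if not file_path and not url:
        raise ValueError("provide either file_path or url")

    import requests  # third-party; imported lazily so the argument check has no dependencies

    headers = {"api-key": api_key, "accept": "application/json"}
    data = dict(fields or {})  # extra form fields, already encoded as strings

    if file_path:
        # Direct upload; if a URL was also given, the file takes precedence.
        mime = mimetypes.guess_type(file_path)[0] or "application/octet-stream"
        with open(file_path, "rb") as handle:
            files = {"file": (os.path.basename(file_path), handle, mime)}
            response = requests.post(f"{BASE_URL}/parse", headers=headers,
                                     data=data, files=files, timeout=120)
    else:
        data["url"] = url
        # (None, value) tuples make requests send plain fields as multipart/form-data.
        files = {name: (None, value) for name, value in data.items()}
        response = requests.post(f"{BASE_URL}/parse", headers=headers,
                                 files=files, timeout=120)

    response.raise_for_status()
    return response.json()  # contains job_id for polling GET /parse/{job_id}
```

A call such as submit_parse_job("your-api-key", file_path="document.pdf", fields={"merge_tables": "true"}) would return the job creation response, whose job_id is then used for polling.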

Processing large files or running many requests in parallel? The Presigned Upload endpoint (POST /v2/parse/upload) decouples document delivery from job creation for faster uploads, larger file sizes, and higher throughput.

Request

You must provide either file or url. If both are provided, file takes precedence.

file

Document file to process. Supported formats: PDF, images (PNG, JPEG, TIFF), and office documents (PPT, PPTX, DOC, DOCX, XLS, XLSX). Required if url is not provided.

url

string

Presigned or public URL of the document to fetch and process. Required if file is not provided.

use_high_resolution

boolean

Use high-resolution images for cropping and post-processing. Improves OCR accuracy on low-quality scans by enhancing clarity and contrast. Latency penalty: ~2–3 seconds per page. Defaults to false.

layout_analysis

string

How the system analyzes and segments document structure.

"smart_layout_detection" (default): Intelligently identifies document structure, headers, sections, and content relationships across the entire document using bounding boxes.
"page_by_page": Analyzes each page independently as a single segment. Faster for simple documents.
"advanced_layout_detection": Uses a vision-language model for exhaustive page segmentation. Detects 14 element types (Caption, Footnote, Formula, ListItem, PageFooter, PageHeader, Picture, SectionHeader, Table, Text, Title, KeyValuePair, Signature, Seal). Best for visually complex or unusual layouts.

ocr_strategy

string

Choose whether OCR runs automatically on detected images or processes all content.

"auto_detection" (default): Intelligently detects bad quality PDFs, scanned documents, and images, then applies OCR only where needed.
"force_ocr": Runs OCR on the entire document regardless of quality.

ocr_engine

string

OCR engine to use for text recognition:

"UnsiloedHawk" (default): Higher accuracy for complex layouts and mixed content. Unrecognized values also fall back to this engine.
"UnsiloedBeta": Handles rotated/warped text and irregular bounding boxes.
"UnsiloedStorm": Enterprise-grade accuracy optimized for 50+ languages.

agentic_ocr

string

Per-segment OCR enhancement: re-runs a dedicated agentic OCR model on each detected segment after layout detection for higher accuracy. Omit or leave empty to disable.

"standard": Good balance of speed and accuracy.
"advanced": Higher quality, best for complex layouts, rotated text, and mixed-language content.

extract_strikethrough

boolean

Detect and preserve strikethrough formatting in HTML and Markdown output. Defaults to false.

merge_tables

boolean

Detect and combine table segments across page breaks, reconstructing complete table structure by matching headers and columns. Defaults to false.

merge_batch_size

integer

Maximum number of tables per merge group when merge_tables is enabled. Groups larger than this are split. Defaults to 20.

enhance_reading_order

boolean

Fix the reading order of detected segments. Defaults to false.

detect_pii

boolean

Run a PII detection pass before parsing. If PII is found at or above pii_block_severity, the task is rejected and no parsing occurs. Defaults to false.

pii_block_severity

string

Severity threshold at which the task is rejected when detect_pii is enabled. Ignored if detect_pii is false.

"any" (default): Blocks on any PII found.
"low": Blocks on quasi-identifiers (names, dates, locations) or higher.
"medium": Blocks on contact PII (email, phone) or higher.
"high": Blocks only on direct identifiers (SSN, passport, credit card).
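The threshold rule can be modeled client-side as a simple ordering check. This is an illustrative sketch of the documented behavior, not the service's implementation; the function name is_blocked and the idea of labeling each finding "low", "medium", or "high" are assumptions made for the example.

```python
# Ordering implied by the description: quasi-identifiers < contact PII < direct identifiers.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}


def is_blocked(found_severities, pii_block_severity="any"):
    """Return True if any finding is at or above the configured threshold."""
    if not found_severities:
        return False  # nothing detected, nothing to block on
    if pii_block_severity == "any":
        return True  # any finding at all rejects the task
    threshold = SEVERITY_RANK[pii_block_severity]
    return any(SEVERITY_RANK[s] >= threshold for s in found_severities)


print(is_blocked(["medium"], "high"))  # contact PII alone is below the "high" threshold
print(is_blocked(["medium"], "low"))   # contact PII is above the "low" threshold
```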

pii_engine

string

PII detection engine: standard (default) or advanced (higher precision, additional processing cost). Ignored if detect_pii is false.

validate_segments

string

JSON array string of segment types to validate and correct using a Vision Language Model, fixing misclassified segments. Example: ["Table", "Formula", "Picture"]. Defaults to ["Table", "Picture"]; an empty or unparseable value also falls back to that default, so Table and Picture validation runs even when this field is omitted.

validate_table_segments

boolean

Legacy parameter that validates table segment classifications using a Vision Language Model. Prefer validate_segments: ["Table"] instead. Defaults to false.

segment_filter

string

Choose which types of content to include in the parsed output. Comma-separated segment types, or "all" to include everything. Defaults to "all".

Available segment types:

table: Tabular data segments
picture: Image and graphic segments
formula: Mathematical equations
text: Regular text content
sectionheader: Section headers
title: Document titles
listitem: List items
caption: Image captions
footnote: Footnotes
pageheader: Page headers
pagefooter: Page footers
keyvaluepair: Key-value pairs (advanced layout detection)
signature: Signatures (advanced layout detection)
seal: Seals and stamps (advanced layout detection)
page: Full-page segments

Examples: "table", "table,picture", "table,formula", "picture,formula".

xml_citation

boolean

Extract and hyperlink bibliography citations in the markdown output. PDFs only. Defaults to false.

output_fields

string

JSON object controlling which fields are included in the response. Set fields to false to exclude them and reduce response size. All fields default to true. Ignored when response_profile is slim or full (the profile wins).

Available fields:

html: HTML representation of segments
markdown: Markdown representation of segments
ocr: Raw OCR text data with bounding boxes and confidence scores
image: Cropped segment images (base64 encoded)
content: Text content of segments
bbox: Bounding box coordinates
confidence: Confidence scores for segments
embed: Vector embeddings / embed text
chart_data: Extracted chart data for Picture segments identified as charts

Example: {"html": true, "markdown": true, "ocr": false, "image": false}.

response_profile

string

Response shape selector: slim, full, or custom. Omit to return the full shape.

"slim": Returns only the essentials per chunk — embed, bbox, page_number, segment_id, segment_type, and HTML for tables / Markdown for everything else. Drops content, image, ocr, confidence, chart_data, page_height, page_width. Best for embedding-only workflows where you want the smallest payload.
"full": Every field returned (equivalent to omitting this param).
"custom": Honor output_fields verbatim.

When both response_profile and output_fields are provided, the profile wins — output_fields is only consulted for custom or when the profile is omitted. Applies to inline JSON responses only; GET /parse/{job_id}?output_file=true returns a presigned URL to the stored full-shape output file.
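The precedence between the two parameters can be expressed as a small function. This is a sketch of the rules as described here, not server code; effective_output_fields is a hypothetical name. In the slim profile the html and markdown flags are both shown as enabled, although per segment only one is returned (HTML for tables, Markdown for everything else), and page_height and page_width are not modeled because they are not output_fields entries.

```python
ALL_FIELDS = ["html", "markdown", "ocr", "image", "content",
              "bbox", "confidence", "embed", "chart_data"]
SLIM_DROPPED = {"content", "image", "ocr", "confidence", "chart_data"}


def effective_output_fields(response_profile=None, output_fields=None):
    """Return a field -> bool map describing what the response would include."""
    if response_profile == "slim":
        return {f: f not in SLIM_DROPPED for f in ALL_FIELDS}  # output_fields ignored
    if response_profile == "full":
        return {f: True for f in ALL_FIELDS}                   # output_fields ignored
    # "custom" or no profile: honor output_fields; unspecified fields default to true.
    requested = output_fields or {}
    return {f: bool(requested.get(f, True)) for f in ALL_FIELDS}


print(effective_output_fields("slim", {"ocr": True})["ocr"])
print(effective_output_fields("custom", {"image": False})["image"])
```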

segment_analysis

string

JSON object controlling HTML/Markdown generation strategy and AI model per segment type. Configure how different segment types are processed, including table processing models, image description models, and formula processing.

Example:

{
  "Table": {"html": "VLM", "markdown": "VLM", "model_id": "us_table_v2"},
  "Picture": {"html": "VLM", "markdown": "VLM", "model_id": "nova"},
  "Formula": {"html": "Auto", "markdown": "VLM", "model_id": "nova"}
}

Options per segment type:

html: "VLM" or "Auto"
markdown: "VLM" or "Auto"
model_id (Table): "astra", "us_table_v1", "us_table_v2"
model_id (Picture/Formula): "nova", "luna", "sol"
use_table_ocr (Table only): Advanced OCR optimized for tabular data. Better handles bordered cells, gridlines, and complex table layouts.
vlm: Custom prompt for the VLM model. Use this to give the model specific instructions for extracting or describing these segment types.
translation: Optional per-segment translation, e.g. {"provider": "Auto", "target_language": "en"}. provider is "Auto" for fast machine translation or "VLM"/"LLM" for model-based translation; target_language is an ISO 639-1 code, or "auto" to auto-detect the source and translate to English. Optional model_id and prompt apply to model-based translation.

segment_processing

string

Alias for segment_analysis. If both are provided, segment_processing takes precedence.

page_range

string

Specify which pages to process. Formats: "1-5", "2,4,6", "[1,3,5]". Defaults to all pages.
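To check a page_range value before submitting, the three formats can be expanded locally. This is a client-side sketch only; parse_page_range is a hypothetical helper, and it assumes 1-based page numbers and inclusive ranges, neither of which is stated explicitly here.

```python
import json


def parse_page_range(spec):
    """Expand "1-5", "2,4,6", or "[1,3,5]" into a sorted list of page numbers."""
    spec = spec.strip()
    if spec.startswith("["):
        # JSON array form, e.g. "[1,3,5]"
        return sorted({int(page) for page in json.loads(spec)})
    pages = set()
    for part in spec.split(","):
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            pages.update(range(start, end + 1))  # assumed inclusive on both ends
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_page_range("1-5"))
print(parse_page_range("[1,3,5]"))
```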

segment_type_naming

string

Segment type naming convention. "Unsiloed" (default) uses names like PageHeader, ListItem, Picture. "Other" uses alternative names like Header, List Item, Figure.

extract_colors

boolean

Transfer text color from the PDF text layer to OCR results. Defaults to false.

extract_links

boolean

Attach hyperlink URLs from PDF annotations to OCR results. Defaults to false.

export_format

string

JSON array string of export formats to generate after processing, e.g. ["docx"]. When set, the pipeline generates the requested export files after parsing completes. The exported files are available as presigned URLs in the exports field of the response. Supported values: "docx", "markdown", "json".

This is a multipart form field, so the value must be a JSON-encoded string (["docx"]), not a repeated field. Passing a bare value like docx will fail to parse and silently skip the export.
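Because every option travels as a string form field, it can help to encode Python values in one place. The helper below is a sketch with a hypothetical name (encode_form_fields); it lowercases booleans and JSON-encodes lists and objects, which matches the shape of the values in the cURL example.

```python
import json


def encode_form_fields(options):
    """Convert Python values into string form fields for a multipart request."""
    encoded = {}
    for name, value in options.items():
        if value is None:
            continue  # leave unset options out so the server default applies
        if isinstance(value, bool):
            encoded[name] = "true" if value else "false"
        elif isinstance(value, (list, dict)):
            # Arrays and objects must be sent as JSON-encoded strings.
            encoded[name] = json.dumps(value, separators=(",", ":"))
        else:
            encoded[name] = str(value)
    return encoded


fields = encode_form_fields({
    "merge_tables": True,
    "export_format": ["docx"],
    "segment_filter": "table,picture",
})
print(fields["export_format"])
```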

error_handling

string

Error handling strategy for non-critical processing errors. "Continue" (default) proceeds despite errors (e.g., LLM refusals on individual segments). "Fail" stops and fails the task on any error.

expires_in

integer

Reserved field. Persisted in the task configuration but currently has no effect on retention for POST /parse — the task is not auto-deleted. To get a presigned-upload TTL, use POST /v2/parse/upload instead, where expires_in controls the upload URL’s validity.

chunk_processing

string

JSON object for chunk processing configuration.

llm_processing

string

JSON object for LLM processing configuration.

Configuration Best Practices

Each scenario below gives configuration settings, recommendations, and trade-offs for a specific use case.

High-Accuracy Processing

Use this configuration when accuracy is critical and processing time is less important.

Configuration:

{
  "use_high_resolution": true,
  "layout_analysis": "smart_layout_detection",
  "ocr_strategy": "force_ocr",
  "merge_tables": true,
  "validate_segments": ["Table", "Picture", "Formula"],
  "segment_analysis": {
    "Table": {
      "html": "VLM",
      "markdown": "VLM",
      "extended_context": true,
      "crop_image": "All",
      "model_id": "us_table_v2"
    }
  }
}

When to Use:

Legal documents requiring precise text extraction
Financial statements with complex tables
Archival documents with low-quality scans
Documents where accuracy is more important than speed

Trade-offs:

Latency: +2-3 seconds per page for high resolution
Latency: +1-2 seconds per page for segment validation
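As a rough worked estimate of these trade-offs, the per-page figures can be multiplied out. The sketch assumes the two penalties simply add and scale linearly with page count, which is not stated here, so treat the result as a ballpark only; extra_latency_seconds is a hypothetical helper.

```python
def extra_latency_seconds(pages, high_resolution=True, validate_segments=True):
    """Return (low, high) bounds on added latency in seconds for a document."""
    low = high = 0
    if high_resolution:       # +2-3 seconds per page
        low += 2 * pages
        high += 3 * pages
    if validate_segments:     # +1-2 seconds per page
        low += 1 * pages
        high += 2 * pages
    return low, high


print(extra_latency_seconds(40))  # 40-page document with both options enabled
```

For a 40-page document with both options on, this works out to between 120 and 200 additional seconds.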

Fast Processing

Use this configuration when speed is prioritized over maximum accuracy.

Configuration:

{
  "use_high_resolution": false,
  "layout_analysis": "page_by_page",
  "ocr_strategy": "auto_detection",
  "merge_tables": false
}

When to Use:

High-volume document processing
Real-time applications requiring quick results
Documents with simple layouts
Pre-screened high-quality digital documents

Benefits:

Fastest processing time
Lower cost per document
Suitable for batch processing large volumes

Financial Documents (Tables + Charts)

Extract only tables and charts from financial reports and statements.

Configuration:

{
  "merge_tables": true,
  "segment_filter": "table,picture",
  "validate_segments": ["Table", "Picture"],
  "layout_analysis": "smart_layout_detection",
  "ocr_strategy": "auto_detection",
  "segment_analysis": {
    "Table": {
      "html": "VLM",
      "markdown": "VLM",
      "model_id": "us_table_v2"
    }
  }
}

When to Use:

Balance sheets and P&L statements
Quarterly/annual financial reports
Investment reports with charts
Documents where only structured data matters

Benefits:

Reduced response size (text content filtered out)
Focus on data-rich content
Merged multi-page tables for complete datasets

Data Extraction Only (Tables)

Extract only tabular data with minimal response size for maximum efficiency.

Configuration:

{
  "merge_tables": true,
  "segment_filter": "table",
  "validate_segments": ["Table"],
  "output_fields": {
    "html": true,
    "markdown": true,
    "ocr": false,
    "image": false,
    "content": true,
    "bbox": false,
    "confidence": false
  }
}

When to Use:

Extracting data from invoices
Processing structured forms
Database population from documents
CSV/Excel export workflows

Benefits:

Minimal response payload
Faster data transfer
Easy integration with data pipelines

Academic/Research Documents

Extract content with structured citations from research papers and academic documents.

Configuration:

{
  "use_high_resolution": true,
  "layout_analysis": "smart_layout_detection",
  "ocr_strategy": "auto_detection",
  "xml_citation": true
}

When to Use:

Research papers with bibliographies
Academic articles with citations
Scientific documents
Literature reviews

Benefits:

Automatic citation extraction and linking
Structured bibliography metadata
In-text citation hyperlinks in markdown
Preserves academic document structure

Citation extraction is only available for PDF documents.

Scanned Documents

Optimize for scanned documents and images with poor text quality.

Configuration:

{
  "use_high_resolution": true,
  "ocr_strategy": "force_ocr",
  "layout_analysis": "smart_layout_detection"
}

When to Use:

Scanned paper documents
Low-quality photocopies
Historical documents
Image-based PDFs

Benefits:

Maximum OCR coverage
Better text extraction from poor quality sources
Higher accuracy for challenging documents

Output Fields Optimization

Optimize response size and performance by selectively including only the fields you need.

For Minimal Response Size:

{
  "output_fields": {
    "html": false,
    "markdown": false,
    "ocr": false,
    "image": false,
    "content": true,
    "bbox": false,
    "confidence": false,
    "embed": true
  }
}

For Text-Only Processing:

{
  "output_fields": {
    "html": false,
    "markdown": true,
    "ocr": false,
    "image": false,
    "content": true,
    "bbox": true,
    "confidence": false,
    "embed": true
  }
}

For Full Analysis (Default):

Omit output_fields or set all fields to true to include all available data.

Benefits:

Reduced response size and bandwidth usage
Faster processing and data transfer
Cost optimization for high-volume processing

Parameter Details

File Input Options

The API supports two methods for providing the document to process:

Direct File Upload (file parameter): Upload the document file directly as multipart/form-data
Presigned URL (url parameter): Provide a publicly accessible URL or presigned URL to the document

Important Notes:

You must provide either file or url; if both are sent, file takes precedence
When using url, the document will be downloaded from the provided URL before processing
Presigned URLs are ideal for documents already stored in cloud storage (S3, GCS, Azure Blob, etc.)
The URL must be publicly accessible or include necessary authentication parameters (e.g., S3 presigned URLs with signatures)
Supported formats are the same for both methods: PDF, images (PNG, JPEG, TIFF), and office documents (PPT, PPTX, DOC, DOCX, XLS, XLSX)

Use Cases for Presigned URLs:

Documents already stored in cloud storage
Avoiding duplicate file uploads
Integration with existing document management systems
Processing large files without upload overhead

Segmentation Method

The layout_analysis parameter controls how the document is analyzed and segmented:

"smart_layout_detection" (default): Analyzes pages for layout elements (e.g., Table, Picture, Formula, etc.) using bounding boxes. Provides fine-grained segmentation and better chunking for complex documents.
"page_by_page": Treats each page as a single segment. Faster processing, ideal for simple documents without complex layouts.
"advanced_layout_detection": Uses a vision-language model to exhaustively segment each page into 14 element types (including KeyValuePair, Signature, and Seal in addition to the standard set). Recommended for documents with dense, non-standard, or visually complex layouts where VGT-based detection misses regions.

Agentic OCR

The agentic_ocr parameter enables per-segment OCR enhancement after layout detection, yielding higher accuracy on small text, stylized fonts, and mathematical formulas. Values:

"standard": Fast, good for most documents.
"advanced": Higher quality, better for complex layouts, rotated or irregular text, and multilingual content.

OCR Mode

The ocr_strategy parameter controls optical character recognition processing:

"auto_detection" (default): Intelligently determines when OCR is needed based on the document content. Balances accuracy and performance.
"force_ocr": Applies OCR to all content regardless of existing text layer. Use this for scanned documents or when maximum text extraction is required.

Table Merging

The merge_tables parameter enables merging of tables that span multiple pages.

How It Works:

Analyzes consecutive table segments across pages
Identifies tables with matching column headers
Merges them into a single unified table structure
Preserves table formatting and data integrity
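The idea behind these steps can be illustrated with a toy model. The sketch below is not the service's algorithm, which is not documented here; the table dictionaries (page, header, rows) and the function name merge_consecutive_tables are hypothetical, and matching is reduced to exact header equality on adjacent pages.

```python
def merge_consecutive_tables(tables):
    """Merge tables on adjacent pages when their column headers match exactly."""
    merged = []
    for table in tables:
        previous = merged[-1] if merged else None
        if (previous is not None
                and previous["header"] == table["header"]
                and table["page"] == previous["last_page"] + 1):
            # Continuation of the previous table: append its rows.
            previous["rows"].extend(table["rows"])
            previous["last_page"] = table["page"]
        else:
            merged.append({
                "header": list(table["header"]),
                "rows": list(table["rows"]),
                "first_page": table["page"],
                "last_page": table["page"],
            })
    return merged


fragments = [
    {"page": 1, "header": ["Item", "Amount"], "rows": [["Rent", "100"]]},
    {"page": 2, "header": ["Item", "Amount"], "rows": [["Power", "40"]]},
    {"page": 3, "header": ["Name"], "rows": [["Alice"]]},
]
result = merge_consecutive_tables(fragments)
print(len(result), result[0]["rows"])
```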

When to Use:

Multi-Page Financial Statements: Consolidate P&L statements or balance sheets spanning multiple pages
Large Data Tables: Merge inventory lists, transaction records, or data sets split across pages
Reports with Continuation Tables: Automatically combine tables marked with “continued on next page”

Example:

{
  "merge_tables": true
}

Benefits:

Simplified Data Processing: Work with complete tables instead of fragments
Better Context: Maintain full table context for analysis and extraction
Reduced Post-Processing: Eliminates need for manual table stitching

Citation Extraction (Research Papers)

This is a specialized feature for academic and research PDF documents. Not needed for general document parsing.

The xml_citation parameter enables automatic extraction and linking of citations from research papers, academic articles, and scientific documents.

How It Works:

Extracts structured bibliography from the document
Identifies in-text citation references (e.g., “Chen et al., 2021”)
Hyperlinks citations in the markdown output to their bibliography entries
Returns structured citation metadata in the response

Example:

{
  "xml_citation": true
}

Response Metadata: When enabled, the response includes a metadata field with structured citation data:

{
  "metadata": {
    "citations": [
      {
        "id": 1,
        "title": "Deep Learning for NLP",
        "authors": ["John Smith", "Jane Doe"],
        "year": "2021",
        "journal": "Nature",
        "volume": "15",
        "pages": "123-145",
        "doi": "10.1000/example"
      }
    ],
    "document_metadata": {
      "title": "Document Title",
      "authors": ["Author Name"]
    }
  }
}

Markdown Enhancement: In-text citations are automatically hyperlinked:

Original: "As shown by Chen et al. (2021)..."
Enhanced: "As shown by [Chen et al. (2021)](#ref-5)..."

Only available for PDF documents. This parameter is ignored for other file types (images, Office documents).

Content Type Filtering

The segment_filter parameter allows you to filter the output to include only specific segment types, reducing response size and focusing on relevant content.

How It Works:

Accepts a comma-separated list of segment types (case-insensitive)
Filters segments after processing is complete
Removes chunks that have no segments after filtering
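The same three rules can be applied client-side to an already retrieved result, for example to narrow a full response further. This is a sketch that assumes the chunks/segments/segment_type layout of the results; apply_segment_filter is a hypothetical helper, not an API function.

```python
def apply_segment_filter(chunks, segment_filter="all"):
    """Keep only segments whose type is listed; drop chunks left with no segments."""
    wanted = {t.strip().lower() for t in segment_filter.split(",") if t.strip()}
    if "all" in wanted:
        return chunks
    filtered = []
    for chunk in chunks:
        kept = [s for s in chunk["segments"]
                if s["segment_type"].lower() in wanted]  # case-insensitive match
        if kept:
            filtered.append({**chunk, "segments": kept})
    return filtered


chunks = [
    {"segments": [{"segment_type": "Title"}, {"segment_type": "Table"}]},
    {"segments": [{"segment_type": "Text"}]},
]
tables_only = apply_segment_filter(chunks, "TABLE,picture")
print(tables_only)
```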

Available Options:

"all" (default): Include all segment types
"table": Only table segments
"picture": Only image/graphic segments
"table,picture": Tables and pictures only
"table,formula": Tables and formulas only
Custom combinations using any segment type

Supported Segment Types:

table, picture, formula, text, sectionheader, title, listitem, caption, footnote, pageheader, pagefooter, keyvaluepair, signature, seal, page

Example Usage:

{
  "segment_filter": "table,picture"
}

Use Cases:

Tables Only: Extract only tabular data from financial documents
Pictures Only: Extract charts, graphs, and diagrams for visual analysis
Tables + Pictures: Get structured data and visualizations, skip text content
Custom Combinations: Mix any segment types based on your needs

Benefits:

Reduced Response Size: Filter out unwanted content before receiving results
Faster Processing: Less data to transfer and parse
Focused Extraction: Get only the content types you need
Cost Optimization: Smaller responses reduce bandwidth usage

Output Fields Configuration

The output_fields parameter allows you to control which fields are included in the API response. This is useful for reducing response size, improving performance, and optimizing bandwidth usage when you don’t need all available data.

Available Fields:

html (default: true): Include HTML representation of segments
markdown (default: true): Include Markdown representation of segments
ocr (default: true): Include OCR results with bounding boxes and confidence scores
image (default: true): Include cropped segment images (base64 encoded)
content (default: true): Include text content of segments
bbox (default: true): Include bounding box coordinates
confidence (default: true): Include confidence scores for segments
embed (default: true): Include embed text in chunk responses
chart_data (default: true): Include extracted chart data for Picture segments identified as charts

Usage: Set fields to false to exclude them from the response. Fields not specified default to true for backward compatibility.

Example Configuration:

{
  "html": false,
  "markdown": true,
  "ocr": false,
  "image": false,
  "content": true,
  "bbox": true,
  "confidence": false,
  "embed": true
}

Benefits:

Reduced Response Size: Excluding large fields like image and html can significantly reduce payload size
Faster Processing: Less data to serialize and transfer
Cost Optimization: Smaller responses reduce bandwidth costs
Selective Data: Only retrieve the fields you need for your use case

When to Use:

Minimal Response: Set most fields to false when you only need basic content
Text-Only Processing: Exclude image and ocr when processing text content
Embedding Generation: Include only content and embed when generating embeddings
Full Analysis: Keep all fields enabled (default) for comprehensive document analysis

Segment Analysis Configuration

The segment_analysis parameter allows you to customize how different segment types are processed, including HTML/Markdown generation strategies and which field should populate the content field.

Available Segment Types: You can configure processing for any of the following segment types:

Table: Tabular data segments
Picture: Image and graphic segments
Formula: Mathematical equations
Title: Document titles
SectionHeader: Section headers
Text: Regular text content
ListItem: List items
Caption: Image captions
Footnote: Footnotes
PageHeader: Page headers
PageFooter: Page footers
Page: Full page segments

Configuration Options: For each segment type, you can specify:

html: Generation strategy for HTML representation
- "Auto" (default): Automatically determine the best method
- "VLM": Use VLM to generate HTML
markdown: Generation strategy for Markdown representation
- "Auto" (default): Automatically determine the best method
- "VLM": Use VLM to generate Markdown
content_source: Defines which field should populate the content field in the response
- "OCR" (default): Use OCR text for content
- "HTML": Use HTML representation as content
- "Markdown": Use Markdown representation as content
- "VLM" (alias "LLM"): Use the VLM-generated representation as content
model_id: Specifies which AI model to use for the segment type
- "astra", "us_table_v1" (standard table processing model), or "us_table_v2" (enhanced table processing model with improved accuracy) for Table segments
- "nova", "luna", or "sol" for Picture and Formula segments
vlm: Custom prompt for the VLM model. Use this to give the model specific instructions for extracting or describing these segment types.
translation: Optional per-segment translation configuration:
- provider: "Auto" (fast machine translation) or "VLM"/"LLM" (model-based translation)
- target_language: ISO 639-1 code (e.g. "en", "es", "fr", "ko"), or "auto" to auto-detect the source language and translate to English
- model_id (optional): model for VLM/LLM translation; defaults to the provider default
- prompt (optional): custom instructions appended to the translation system prompt

Example Configuration:

{
  "Table": {
    "html": "VLM",
    "markdown": "VLM",
    "content_source": "HTML",
    "model_id": "us_table_v2",
    "vlm": "Preserve all merged cells. Use empty strings for missing values."
  },
  "Picture": {
    "html": "VLM",
    "markdown": "VLM",
    "content_source": "Markdown",
    "vlm": "Focus on chart axes, legend labels, and key data trends."
  }
}

How content_source Works: The content_source parameter determines which field’s value will be used to populate the content field in the segment response:

When content_source is set to "HTML", the content field will contain the HTML representation, and the separate html and markdown fields will be empty
When content_source is set to "Markdown", the content field will contain the Markdown representation, and the separate html and markdown fields will be empty
When content_source is set to "OCR" (default), the content field contains OCR text, and html and markdown fields are populated separately
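These three cases can be summarized as a small function. This is a model of the description above rather than server code; resolve_content is a hypothetical name, "empty" is represented as an empty string by assumption, and the "VLM"/"LLM" source is left out because its effect on the html and markdown fields is not described.

```python
def resolve_content(ocr_text, html, markdown, content_source="OCR"):
    """Return the content, html, and markdown fields for one segment."""
    if content_source == "HTML":
        return {"content": html, "html": "", "markdown": ""}
    if content_source == "Markdown":
        return {"content": markdown, "html": "", "markdown": ""}
    if content_source == "OCR":
        # Default: OCR text as content, representations kept in their own fields.
        return {"content": ocr_text, "html": html, "markdown": markdown}
    raise ValueError(f"content_source not modeled in this sketch: {content_source!r}")


segment = resolve_content("a b", "<table></table>", "| a | b |", content_source="HTML")
print(segment["content"])
```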

Use Cases:

HTML as Content: Set content_source: "HTML" for Table segments when you want HTML-formatted table data directly in the content field
Markdown as Content: Set content_source: "Markdown" for Picture segments when you want Markdown-formatted descriptions in the content field
VLM-Enhanced Output: Use "VLM" for both html and markdown generation strategies to get AI-enhanced representations in those fields

Response

job_id

string

required

Job identifier. Pass this to GET /parse/{job_id} to poll for results.

status

string

required

Initial job status. Always "Starting" on creation.

file_name

string

required

Name of the uploaded file. For URL submissions this is the last path segment of the URL, or "unknown" when no usable segment exists.
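For URL submissions, the derivation can be approximated locally. The sketch below is an approximation with a hypothetical name (file_name_from_url); how the service treats percent-encoded characters or trailing slashes is not specified here, so only the simple cases are covered.

```python
from urllib.parse import urlparse


def file_name_from_url(url):
    """Return the last non-empty path segment of the URL, or "unknown"."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[-1] if segments else "unknown"


print(file_name_from_url("https://your-bucket.s3.amazonaws.com/document.pdf?signature=abc"))
print(file_name_from_url("https://example.com"))
```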

created_at

string

required

ISO 8601 timestamp when the job was created.

message

string

required

Human-readable status message with a polling hint.

credit_used

integer

required

Number of pages deducted from your quota for this job.

quota_remaining

integer

required

Remaining page quota after this job was deducted.

merge_tables

boolean

required

Whether table merging is enabled for this job (reflects the submitted merge_tables value).

Document Analysis Features

The parsing endpoint provides comprehensive document analysis including:

Text Extraction

Extracts text content with high accuracy, preserving formatting and structure.

Image Recognition

Identifies and analyzes images within documents, providing descriptions and metadata.

Table Parsing

Extracts tabular data with proper structure and formatting.

OCR Processing

Performs optical character recognition on text elements with confidence scores.

Section Detection

Automatically identifies different document sections like headers, body text, and captions.

Bounding Box Information

Provides precise coordinates for all extracted elements.

Advanced Content Processing

VLM-Enhanced Analysis: Uses vision-language models for better content understanding
Multi-Format Output: Generates HTML, Markdown, and plain text versions
Context-Aware Processing: Maintains document context across segments
Intelligent Chunking: Creates semantically meaningful document chunks

Retrieving Results

After the job is created, use the GET /parse/{job_id} endpoint to check status and retrieve results:

cURL

curl -X 'GET' \
  'https://prod.visionapi.unsiloed.ai/parse/{job_id}' \
  -H 'accept: application/json' \
  -H 'api-key: your-api-key'

Python

import time

import requests

def get_parse_results(job_id, api_key, poll_interval=5, timeout=1800):
    """Poll GET /parse/{job_id} until the job succeeds, fails, or the timeout elapses"""

    headers = {"api-key": api_key}
    status_url = f"https://prod.visionapi.unsiloed.ai/parse/{job_id}"
    # The timeout default is an arbitrary client-side choice; adjust it for large documents
    deadline = time.monotonic() + timeout

    # Poll for completion
    while time.monotonic() < deadline:
        response = requests.get(status_url, headers=headers, timeout=30)
        response.raise_for_status()  # Surface HTTP errors instead of looping silently

        status_data = response.json()
        print(f"Job Status: {status_data['status']}")

        if status_data['status'] == 'Succeeded':
            return status_data  # Results are included in the same response

        if status_data['status'] == 'Failed':
            raise RuntimeError(f"Job failed: {status_data.get('message', 'Unknown error')}")

        time.sleep(poll_interval)  # Check every poll_interval seconds

    raise TimeoutError(f"Job {job_id} did not finish within {timeout} seconds")

# Usage
job_id = "e77a5c42-4dc1-44d0-a30e-ed191e8a8908"
results = get_parse_results(job_id, "your-api-key")

Expected Results Structure

When the job completes successfully, the response contains the parsed chunks. Each chunk holds a list of segments with their content, HTML and Markdown renderings, bounding boxes, and OCR data:

{
  "job_id": "04a7a6d8-5ef7-465a-b22a-8a98e7104dd9",
  "status": "Succeeded",
  "created_at": "2025-10-22T06:51:16.870302Z",
  "started_at": "2025-10-22T06:51:16.966136Z",
  "finished_at": "2025-10-22T06:57:19.821541Z",
  "total_chunks": 25,
  "chunks": [
    {
      "segments": [
        {
          "segment_type": "Title",
          "content": "Disinvestment of IFCI's entire stake in Assets Care & Reconstruction Enterprise Ltd (ACRE)",
          "image": null,
          "page_number": 1,
          "segment_id": "cc5f8dff-31be-4ccf-885d-4f9062fcee17",
          "confidence": 0.90187776,
          "page_width": 1191.0,
          "page_height": 1684.0,
          "html": "<h1>Disinvestment of IFCI's entire stake in Assets Care & Reconstruction Enterprise Ltd (ACRE)</h1>",
          "markdown": "# Disinvestment of IFCI's entire stake in Assets Care & Reconstruction Enterprise Ltd (ACRE)",
          "bbox": {
            "left": 72.92226,
            "top": 62.030334,
            "width": 230.36308,
            "height": 55.395317
          },
          "ocr": [
            {
              "bbox": {
                "left": 63.753525,
                "top": 5.395447,
                "width": 164.45312,
                "height": 42.757812
              },
              "text": "Disinvestment",
              "confidence": 0.9999992
            }
          ]
        },
        {
          "segment_type": "Text",
          "content": "Background and context information about the disinvestment process...",
          "image": null,
          "page_number": 1,
          "segment_id": "9d60e48b-77ba-4a23-a0ac-95ee13c615ec",
          "confidence": 0.88558982,
          "page_width": 1191.0,
          "page_height": 1684.0,
          "html": "<p>Background and context information about the disinvestment process...</p>",
          "markdown": "Background and context information about the disinvestment process...",
          "bbox": {
            "left": 486.9685,
            "top": 139.61847,
            "width": 241.29932,
            "height": 48.451706
          },
          "ocr": [
            {
              "bbox": {
                "left": 50.9729,
                "top": 3.4557495,
                "width": 46.046875,
                "height": 19.734375
              },
              "text": "Background",
              "confidence": 0.99999654
            }
          ]
        }
      ]
    }
  ]
}
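A minimal sketch of walking this structure: the function below concatenates the markdown field of every segment, optionally restricted to certain segment types. It relies only on the chunks, segments, segment_type, and markdown keys shown in the sample above. The name collect_markdown is hypothetical.

```python
def collect_markdown(result, segment_types=None):
    """Join the Markdown of all segments, optionally filtered by segment_type."""
    parts = []
    for chunk in result.get("chunks", []):
        for segment in chunk.get("segments", []):
            if segment_types and segment.get("segment_type") not in segment_types:
                continue
            markdown = segment.get("markdown")
            if markdown:  # skip segments without a Markdown rendering
                parts.append(markdown)
    return "\n\n".join(parts)


# Illustrative input shaped like the response above.
result = {
    "chunks": [
        {
            "segments": [
                {"segment_type": "Title", "markdown": "# Heading"},
                {"segment_type": "Text", "markdown": "Body text"},
            ]
        }
    ]
}
print(collect_markdown(result))
```

Passing segment_types={"Table"} would keep only table segments, which can be useful when the tables are post-processed separately.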

Segment Types

The parsing API identifies the following segment types and processes each one according to its kind:

Picture

Images and graphics within the document, including logos, charts, and illustrations. Enhanced with VLM-based description generation.

SectionHeader

Document headers and titles that define section boundaries. Processed with semantic understanding.

Text

Regular text content including paragraphs, sentences, and individual text elements. Enhanced with context-aware processing.

Table

Tabular data with structured rows and columns. Enhanced with VLM-based formatting and extended context options. You can configure the table processing model using model_id in the segment_analysis parameter:

us_table_v1: Standard table processing model
us_table_v2: Enhanced table processing model with improved accuracy

Caption

Text captions associated with images or figures. Processed with relationship awareness.

Formula

Mathematical equations and expressions. Enhanced with specialized formula processing.

Title

Document titles and main headings. Processed with enhanced formatting.

Footnote

Document footnotes and references. Processed with context linking.

ListItem

Bulleted and numbered list items. Processed with structure preservation.

Each segment includes detailed metadata such as confidence scores, bounding boxes, OCR data, and formatted output in both HTML and Markdown with VLM enhancement.

Error Handling

Common Error Scenarios

Invalid API Key: Authentication failed
File Too Large: File exceeds size limits
Invalid Configuration: Malformed processing parameters
Server Error: Internal processing error
Processing Timeout: Task took too long to complete
Missing File or URL: Neither file nor url parameter provided
Both File and URL Provided: Cannot provide both file and url simultaneously
Invalid URL: URL is not accessible or malformed
URL Download Failed: Unable to download document from provided URL
Insufficient Quota (402): Not enough page credits remaining.
Usage Limit Exceeded (429): Billing usage cap reached. Returns plain text: Usage limit exceeded. No Retry-After header.
Rate Limit Exceeded (429): Org exceeded its per-second request budget (default 10 requests per second, configurable per organization). Returns JSON {"error": "rate_limit_exceeded", "message": ..., "retry_after": 1} with a Retry-After: 1 header.
Internal Server Error (500): An unexpected error occurred during processing.
Service Unavailable (503): Job queue is at capacity. Retry after the duration indicated in the Retry-After header.
Forbidden (403): Access has been revoked.
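The two 429 cases and the 503 case above differ in whether a retry makes sense. The sketch below shows one way a client might decide how long to wait, based only on the status code and the Retry-After header described in this list. The function name retry_delay and the 5-second fallback are assumptions, not part of the API. The sketch also assumes Retry-After carries a number of seconds.

```python
def retry_delay(status_code, headers, default_wait=5.0):
    """Return seconds to wait before retrying, or None when a retry is unlikely to help."""
    raw = headers.get("Retry-After")
    try:
        seconds = float(raw) if raw is not None else None
    except ValueError:
        seconds = None  # e.g. an HTTP-date value, which this sketch does not parse

    if status_code == 429:
        # Rate limiting sends Retry-After; the billing usage cap does not,
        # so a 429 without the header is treated as non-retryable.
        if raw is None:
            return None
        return seconds if seconds is not None else default_wait
    if status_code == 503:
        # Job queue at capacity: honor the header, else use the assumed fallback.
        return seconds if seconds is not None else default_wait
    return None  # 402, 403, 500 and others: retrying as-is will not help


print(retry_delay(429, {"Retry-After": "1"}))
print(retry_delay(429, {}))
```

When used with the requests library, response.headers can be passed directly, since its header lookup is case-insensitive.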

Authorizations

Authorization

string

header

required

API key for authentication. Use 'Bearer <your_api_key>'

Body

Provide either file (binary upload, multipart only) or url (presigned/public URL, both content types), not both. JSON callers send all fields as native JSON values; multipart callers send each field as a form part. The file field is multipart-only.

Request body for POST /parse (multipart/form-data).
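The file/url rule can be checked on the client before a request is sent, which avoids a round trip for the "Missing File or URL" and "Both File and URL Provided" errors. The helper below is a sketch of that check; select_input is a hypothetical name and the returned tuple is just one convenient shape.

```python
def select_input(file_path=None, url=None):
    """Validate that exactly one of file_path or url is given and return which one."""
    if file_path and url:
        raise ValueError("Provide either file or url, not both")
    if not file_path and not url:
        raise ValueError("Provide one of file or url")
    if url:
        return ("url", url)
    return ("file", file_path)


print(select_input(file_path="document.pdf"))
print(select_input(url="https://your-bucket.s3.amazonaws.com/document.pdf"))
```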

file

required

Document file to process. Required if url is not provided. Supported formats: PDF, PNG, JPEG, TIFF, PPT, PPTX, DOC, DOCX, XLS, XLSX.

agentic_ocr

string | null

Enable per-segment agentic OCR for higher accuracy. Pass "standard" or "advanced".

chunk_processing

string | null

JSON object for chunk processing configuration.

detect_pii

boolean | null

default:false

Run a PII pre-check before parsing. When enabled, the document is scanned for personally identifiable information before any extraction work happens. If PII is found at or above pii_block_severity, the task is rejected and no parsing occurs (the job ends in a failed state with a PII reason). Defaults to false.

enhance_reading_order

boolean | null

default:false

Fix the reading order of detected segments. Defaults to false.

error_handling

string | null

default:Continue

Error handling strategy for non-critical processing errors. Continue (default) — proceed despite errors (e.g., LLM refusals). Fail — stop and fail the task on any error.

expires_in

integer<int32> | null

Reserved field. Persisted in the task configuration but currently has no effect on retention for this endpoint — POST /parse (multipart and JSON/Form) does not set the task's expires_at column, and the cleanup job only deletes AwaitingUpload rows past their expires_at. To get a presigned-upload TTL, use POST /v2/parse/upload instead, where expires_in controls the upload URL's validity.

export_format

enum<string>[] | null

Export format(s) to generate after processing. When set, the pipeline generates the requested export files after parsing completes. The exported files are available as presigned URLs in the exports field of the response. Supported: ["docx", "markdown", "json"].

Available options:

docx,

markdown,

json

Example:

["docx", "markdown", "json"]

extract_colors

boolean | null

default:false

Transfer text color from the PDF text layer to OCR results. Defaults to false.

extract_links

boolean | null

default:false

Attach hyperlink URLs from PDF annotations to OCR results. Defaults to false.

extract_strikethrough

boolean | null

default:false

Preserve strikethrough formatting in HTML/Markdown output. Defaults to false.

layout_analysis

string | null

default:smart_layout_detection

Layout analysis strategy. smart_layout_detection (default) — detects layout elements using bounding boxes. page_by_page — treats each page as a single segment; faster for simple documents. advanced_layout_detection — higher-accuracy layout detection for complex pages (multi-column layouts, dense tables/figures); slower than smart_layout_detection.

llm_processing

string | null

JSON object for LLM processing configuration.

merge_batch_size

integer<int32> | null

default:20

Maximum number of tables per merge group when merge_tables is enabled. Groups larger than this are split into separate merges. Defaults to 20.
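To illustrate the effect of this limit, the sketch below splits an oversized merge group into groups of at most merge_batch_size tables. How the service actually partitions a group is not documented here. Consecutive fixed-size slices are an assumption, and split_merge_group is a hypothetical name.

```python
def split_merge_group(table_ids, merge_batch_size=20):
    """Split a list of table IDs into consecutive groups of at most merge_batch_size."""
    if merge_batch_size < 1:
        raise ValueError("merge_batch_size must be at least 1")
    return [
        table_ids[start:start + merge_batch_size]
        for start in range(0, len(table_ids), merge_batch_size)
    ]


# 45 candidate tables with the default limit of 20.
groups = split_merge_group(list(range(45)))
print([len(group) for group in groups])
```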

merge_tables

boolean | null

default:false

Merge tables that span multiple pages into a single unified structure. Defaults to false.

ocr_engine

string | null

default:UnsiloedBeta

OCR engine to use for text recognition. UnsiloedBeta (default) — handles irregular bounding boxes, rotated/warped text. UnsiloedHawk — higher accuracy, better for complex layouts. UnsiloedStorm — enterprise-grade accuracy, optimized for 50+ languages.

ocr_strategy

string | null

default:auto_detection

OCR strategy. auto_detection (default) — applies OCR only where needed. force_ocr — applies OCR to all content regardless of existing text layer.

output_fields

string | null

JSON object filtering which fields appear on each segment / chunk. Each key defaults to true; set a key to false to drop the field. Keys: bbox, chart_data, confidence, content, embed, html, image, markdown, ocr. Example: {"html": false, "ocr": false}. Ignored when response_profile is slim or full.
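Because output_fields is sent as a JSON string, it can help to build it from a list of fields to drop. The sketch below does that and returns None when response_profile is slim or full, since the parameter is ignored in those cases. The helper name output_fields_param is hypothetical; the key list is taken from the description above.

```python
import json

OUTPUT_FIELD_KEYS = {
    "bbox", "chart_data", "confidence", "content",
    "embed", "html", "image", "markdown", "ocr",
}


def output_fields_param(drop, response_profile=None):
    """Return the output_fields JSON string, or None when the profile overrides it."""
    unknown = set(drop) - OUTPUT_FIELD_KEYS
    if unknown:
        raise ValueError(f"Unknown output field(s): {sorted(unknown)}")
    if response_profile in ("slim", "full"):
        return None  # output_fields is ignored for these profiles
    # Keys default to true, so only the dropped fields need to be listed.
    return json.dumps({key: False for key in sorted(drop)})


print(output_fields_param(["html", "ocr"]))
```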

page_range

string | null

Page range to process. Formats: "1-5", "2,4,6", "[1,3,5]". Defaults to all pages.
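The three formats can be expanded into an explicit page list on the client, for example to estimate how many pages a request will consume. This is a sketch under assumptions: expand_page_range is a hypothetical helper, ranges are taken to be inclusive, and mixing a range with single pages (such as "1-3,7") is accepted here but is not confirmed by the description above.

```python
import json


def expand_page_range(page_range):
    """Expand "1-5", "2,4,6", or "[1,3,5]" into a list of page numbers."""
    text = page_range.strip()
    if text.startswith("["):
        return [int(page) for page in json.loads(text)]
    pages = []
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(value) for value in part.split("-", 1))
            pages.extend(range(start, end + 1))  # inclusive range (assumed)
        else:
            pages.append(int(part))
    return pages


for example in ("1-5", "2,4,6", "[1,3,5]"):
    print(example, expand_page_range(example))
```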

pii_block_severity

string | null

default:any

Severity threshold at which a detected PII finding blocks the task. Ignored when detect_pii is false. Findings strictly below the threshold are allowed through; findings at or above it reject the task.

any (default) — block on any detection, regardless of severity.
low — block on low, medium, or high severity findings.
medium — block on medium or high severity findings.
high — block only on high severity findings.
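The threshold rule above amounts to an ordered comparison. The sketch below restates it in code. The function blocks_task and the numeric ranks are illustrative only and say nothing about how the service represents severities internally.

```python
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}


def blocks_task(pii_block_severity, finding_severity):
    """Return True when a finding of the given severity would reject the task."""
    if pii_block_severity == "any":
        return True  # any detection blocks, regardless of severity
    # Findings at or above the threshold block; those strictly below pass.
    return SEVERITY_RANK[finding_severity] >= SEVERITY_RANK[pii_block_severity]


print(blocks_task("medium", "low"))
print(blocks_task("medium", "high"))
```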

pii_engine

string | null

default:standard

PII detector engine to use when detect_pii is true. Ignored otherwise.

standard (default) — fast pattern-based detector; low latency, well-suited to bulk pre-screening.
advanced — model-based detector; slower but catches contextual cases that pattern matching misses (e.g. handwritten names, partially redacted IDs, document-style references to a person).

response_profile

string | null

Response shape selector: slim, full, or custom.

slim: chunk embed + bbox + page_number + segment_id + segment_type + HTML for tables / Markdown for everything else. Drops content, image, ocr, confidence, chart_data, page_height, page_width.
full: every field returned (equivalent to omitting this param).
custom: honor output_fields verbatim.

Precedence: when both response_profile and output_fields are provided, the profile wins (output_fields only matters for custom or when the profile is omitted).

Applies to inline JSON responses only — GET /parse/{job_id}?output_file=true returns a presigned URL to the stored full-shape output file.

Example:

"slim"

segment_analysis

string | null

JSON object controlling HTML/Markdown generation strategy and AI model per segment type. Example: {"Table": {"html": "LLM", "markdown": "LLM", "model_id": "us_table_v2"}}.
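In a multipart request, JSON-valued parameters such as segment_analysis, validate_segments, and export_format travel as strings. The sketch below serializes them with json.dumps so quoting mistakes are avoided. build_form_fields is a hypothetical helper; the parameter names and example values come from this page.

```python
import json


def build_form_fields(segment_analysis=None, validate_segments=None, export_format=None):
    """Serialize JSON-valued parse parameters into form-field strings."""
    fields = {}
    if segment_analysis is not None:
        fields["segment_analysis"] = json.dumps(segment_analysis)
    if validate_segments is not None:
        fields["validate_segments"] = json.dumps(validate_segments)
    if export_format is not None:
        fields["export_format"] = json.dumps(export_format)
    return fields


fields = build_form_fields(
    segment_analysis={"Table": {"html": "LLM", "markdown": "LLM", "model_id": "us_table_v2"}},
    validate_segments=["Table", "Formula"],
    export_format=["docx"],
)
for name, value in fields.items():
    print(name, "=", value)
```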

segment_filter

string | null

default:all

Content filter: comma-separated segment types to keep. Example: "table,picture". Use "all" to include everything. Defaults to "all".

segment_processing

string | null

Alias for segment_analysis (Core Parser name). If both are provided, this takes precedence.

segment_type_naming

string | null

default:Unsiloed

Segment type naming convention. Unsiloed (default) — e.g., PageHeader, ListItem, Picture. Other — alternative names e.g., Header, List Item, Figure.

url

string | null

Presigned or public URL of the document to fetch and process. Required if file is not provided.

use_high_resolution

boolean | null

default:true

Use high-resolution images for cropping and post-processing. Latency penalty: ~2–3 s per page. Defaults to true.

validate_segments

string | null

JSON array string of segment types to validate with VLM. Example: ["Table", "Formula", "Picture"]. Defaults to [].

validate_table_segments

boolean | null

default:false

Legacy: validate table segment classifications using VLM. Prefer validate_segments: ["Table"] instead. Defaults to false.

xml_citation

boolean | null

default:false

Extract and hyperlink bibliography citations in the markdown output. PDFs only. Defaults to false.

Response

Job created — poll with GET /parse/{job_id} to retrieve results.

Response body for a successful POST /parse call.

created_at

string

required

ISO 8601 timestamp when the job was created.

credit_used

integer<int32>

required

Number of pages deducted from your quota for this job.

file_name

string

required

Name of the uploaded file, or "unknown" when a URL was provided.

job_id

string

required

Job identifier — pass this to GET /parse/{job_id} to poll for results.

merge_tables

boolean

required

Whether table merging is enabled for this job (reflects the submitted merge_tables value).

message

string

required

Human-readable status message with a polling hint.

quota_remaining

integer<int64>

required

Remaining page quota after this job was deducted.

status

string

required

Initial job status. Always "Starting" on creation.
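Since every field in this response is marked required, a client can parse it into a small typed record and fail early if something is missing. The dataclass below is a sketch of that idea. ParseJobCreated and from_json are hypothetical names, and the field types mirror the schema listed above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ParseJobCreated:
    job_id: str
    status: str
    file_name: str
    created_at: str
    message: str
    credit_used: int
    quota_remaining: int
    merge_tables: bool

    @classmethod
    def from_json(cls, payload):
        """Build the record from a decoded POST /parse response body."""
        names = [field.name for field in fields(cls)]
        missing = [name for name in names if name not in payload]
        if missing:
            raise KeyError(f"Missing required field(s): {missing}")
        # Extra keys in the payload are ignored.
        return cls(**{name: payload[name] for name in names})


job = ParseJobCreated.from_json({
    "job_id": "e77a5c42-4dc1-44d0-a30e-ed191e8a8908",
    "status": "Starting",
    "file_name": "document.pdf",
    "created_at": "2025-07-18T10:42:10.545832520Z",
    "message": "Task created successfully.",
    "credit_used": 5,
    "quota_remaining": 23695,
    "merge_tables": False,
})
print(job.job_id, job.status)
```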
