Response Format - Unsiloed AI

A successful /parse job returns the document organized into chunks. Each chunk carries an embed field (a Markdown string concatenated from the chunk's segments, ready for embedding) and a segments array in which every layout element has its own bounding box and metadata. The example below is a real response from a single-page test document.

{
  "job_id": "a0f51f79-6eb8-412a-9afa-924ddfbf9578",
  "status": "Succeeded",
  "message": "Task succeeded",
  "file_name": "document.pdf",
  "file_type": "application/pdf",
  "page_count": 1,
  "total_chunks": 1,
  "credit_used": 1,
  "merge_tables": false,
  "created_at": "2026-05-22T11:29:17.964433Z",
  "started_at": "2026-05-22T11:29:18.040527Z",
  "finished_at": "2026-05-22T11:29:32.109418Z",
  "pdf_url": "https://s3.us-east-1.amazonaws.com/...",
  "file_url": "https://s3.us-east-1.amazonaws.com/...",
  "configuration": {
    "layout_analysis": "smart_layout_detection",
    "ocr_engine": "UnsiloedHawk",
    "ocr_strategy": "auto_detection",
    "merge_tables": false,
    "...": "..."
  },
  "metadata": {},
  "chunks": [
    {
      "chunk_id": "6b2eca3a-d14f-4164-ba9a-0a3a58fcaf45",
      "chunk_length": 117,
      "embed": "## Q1 2024 Sales Report\nThe following table summarises regional sales performance...",
      "segments": [
        {
          "segment_id": "034a37e7-6e4b-45dd-802c-e648d6c16498",
          "segment_type": "SectionHeader",
          "content": "Q1 2024 Sales Report",
          "markdown": "## Q1 2024 Sales Report",
          "html": "<h2>Q1 2024 Sales Report</h2>",
          "bbox": { "left": 427.6, "top": 67.8, "width": 344.7, "height": 36.5 },
          "page_number": 1,
          "page_width": 1191.0,
          "page_height": 1684.0,
          "confidence": 0.35,
          "image": "https://s3.us-east-1.amazonaws.com/...",
          "ocr": [
            { "text": "Q1", "bbox": { "left": 5.4, "top": 4.1, "width": 35.6, "height": 22.7 }, "confidence": null },
            { "text": "2024", "bbox": { "left": 56.4, "top": 4.1, "width": 69.4, "height": 22.7 }, "confidence": null }
          ],
          "references": null
        },
        {
          "segment_id": "4f4b54bc-793e-49cc-b0a3-113bbb5484be",
          "segment_type": "Table",
          "markdown": "| Region | Sales Rep | Units Sold | Revenue ($) |\n| --- | --- | --- | --- |\n| North | Alice Brown | 1,240 | 186,000 |\n| ... | ... | ... | ... |",
          "html": "<table>...</table>",
          "image": "https://s3.us-east-1.amazonaws.com/...",
          "bbox": { "left": 54.4, "top": 208.5, "width": 1026.5, "height": 246.5 },
          "page_number": 1,
          "page_width": 1191.0,
          "page_height": 1684.0,
          "confidence": 0.99,
          "ocr": [],
          "references": null
        }
      ]
    }
  ]
}

Top-Level Fields

These fall into three groups: identification, status, and timing; parsed content; and job configuration and metering.

Identification, Status, and Timing

job_id: unique identifier for the parsing job
status: job state (Succeeded, Failed, or an in-progress value such as Starting or Processing)
message: human-readable status message ("Task succeeded" when the job completes)
file_name: name of the uploaded file
file_type: MIME type of the uploaded file (e.g., application/pdf)
created_at: ISO 8601 timestamp when the job was created
started_at: ISO 8601 timestamp when processing began
finished_at: ISO 8601 timestamp when processing completed
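
As a minimal sketch of how these fields might be consumed, the Python below checks whether a job has reached a terminal state and derives the processing time from started_at and finished_at. The helper names (is_finished, processing_seconds) are hypothetical and not part of any Unsiloed SDK, and treating only Succeeded and Failed as terminal is an assumption drawn from the status description above.

```python
from datetime import datetime

# Assumption: these are the only terminal values of the status field.
TERMINAL_STATUSES = {"Succeeded", "Failed"}


def parse_timestamp(value: str) -> datetime:
    # Older Python versions reject a trailing "Z", so normalise it
    # to an explicit UTC offset before parsing.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_finished(job: dict) -> bool:
    # True once the job has either succeeded or failed.
    return job.get("status") in TERMINAL_STATUSES


def processing_seconds(job: dict) -> float:
    # Wall-clock time between the start and the end of processing.
    started = parse_timestamp(job["started_at"])
    finished = parse_timestamp(job["finished_at"])
    return (finished - started).total_seconds()


# Values copied from the example response.
job = {
    "status": "Succeeded",
    "started_at": "2026-05-22T11:29:18.040527Z",
    "finished_at": "2026-05-22T11:29:32.109418Z",
}

if is_finished(job):
    print(f"processed in {processing_seconds(job):.2f} s")
```

For the example response this works out to roughly 14 seconds between started_at and finished_at.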

Parsed Content

chunks: array of content chunks
total_chunks: total number of chunks
page_count: total number of pages in the document
pdf_url: null by default; with include_url=true, a temporary signed S3 URL to the processed PDF (see URL fields below)
file_url: null by default; with include_url=true, a temporary signed S3 URL to the original uploaded file

Job Configuration and Metering

configuration: the full configuration object used for this parse (OCR engine, layout strategy, segment processing settings, etc.); see the Parse API reference for every option
metadata: additional job metadata; usually an empty object
merge_tables: whether tables were merged across pages
credit_used: credits consumed by this job

Chunk Fields

chunk_id: unique identifier for the chunk
chunk_length: character length of the chunk’s embed content
embed: combined Markdown content from all segments in the chunk, ready for embedding into a vector store
segments: array of layout segments within the chunk
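
One way to use these fields, sketched here under assumptions, is to flatten the response into one record per chunk before handing the text to an embedding model. The function name embedding_inputs and the record layout (id, text, pages, segment_types) are illustrative choices, not a format the API prescribes.

```python
def embedding_inputs(response: dict) -> list[dict]:
    """Build one record per chunk from a parsed /parse response."""
    records = []
    for chunk in response.get("chunks", []):
        segments = chunk.get("segments", [])
        records.append({
            "id": chunk["chunk_id"],
            # embed already combines the Markdown of every segment.
            "text": chunk["embed"],
            # Pages and segment types are kept as metadata for later filtering.
            "pages": sorted({seg["page_number"] for seg in segments}),
            "segment_types": [seg["segment_type"] for seg in segments],
        })
    return records


# Trimmed-down version of the example response.
response = {
    "chunks": [
        {
            "chunk_id": "6b2eca3a-d14f-4164-ba9a-0a3a58fcaf45",
            "embed": "## Q1 2024 Sales Report\n...",
            "segments": [
                {"segment_type": "SectionHeader", "page_number": 1},
                {"segment_type": "Table", "page_number": 1},
            ],
        }
    ]
}

for record in embedding_inputs(response):
    print(record["id"], record["pages"], record["segment_types"])
```

Keeping chunk_id as the record key makes it possible to trace a retrieved embedding back to its segments and bounding boxes.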

Segment Fields

segment_id: unique identifier for the segment
segment_type: element classification; see the Element Types reference for the full list
content: plain-text content of the segment (omitted for Signature segments)
markdown: Markdown-formatted content
html: HTML-formatted content
image: null by default; with include_url=true, a signed S3 URL to a cropped image of the segment (see URL fields below). The field is present for most segment types and omitted for Signature
bbox: bounding box relative to the page, with left, top, width, height in render pixels
page_number: page where the segment appears
page_width / page_height: dimensions, in pixels, of the rendered page that the bounding boxes are measured against; to convert a coordinate back to PDF points, divide it by the ratio of page_width to the page’s width in points
confidence: model confidence score (0–1) for element detection
ocr: array of word-level OCR results
references: references to related segments; typically null
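
The pixel-to-point conversion described for page_width can be sketched as follows. The page width in PDF points is not part of the response, so it has to come from the source PDF; the value 595.5 used here is a hypothetical figure chosen only to give a round scale factor, and bbox_to_points is an illustrative name.

```python
def bbox_to_points(segment: dict, page_width_pt: float) -> dict:
    """Convert a segment's bbox from render pixels to PDF points."""
    # Render pixels per PDF point, as described for page_width above.
    scale = segment["page_width"] / page_width_pt
    # The result keeps the top-left origin of the response; PDF tooling that
    # expects a bottom-left origin needs an additional vertical flip.
    return {key: segment["bbox"][key] / scale
            for key in ("left", "top", "width", "height")}


# Header segment from the example response.
segment = {
    "bbox": {"left": 427.6, "top": 67.8, "width": 344.7, "height": 36.5},
    "page_width": 1191.0,
    "page_height": 1684.0,
}

# Hypothetical page width in points (assumption, not from the response).
print(bbox_to_points(segment, page_width_pt=595.5))
```

With a scale factor of 2, every coordinate is simply halved, so left becomes about 213.8 points.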

OCR Item Fields

Each item in a segment’s ocr array describes one word the OCR engine recognized within that segment. The bounding box is relative to the segment’s cropped image, not the full page.

text: the recognized word or token
bbox: bounding box relative to the segment’s image, with left, top, width, height
confidence: per-word model confidence (0–1), or null when not reported
color: optional r, g, b, and hex sub-fields, present only when extract_colors: true is set in the parse configuration
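
Because OCR boxes are relative to the segment image, placing a word on the full page means offsetting it by the segment's own bbox. The sketch below assumes that the cropped image corresponds exactly to the segment bbox and uses the same pixel scale as the page render; the source does not state this, so verify it against real output before relying on it. The name ocr_word_to_page is illustrative.

```python
def ocr_word_to_page(segment: dict, word: dict) -> dict:
    """Translate a word bbox from segment-image space to page space."""
    seg_box, word_box = segment["bbox"], word["bbox"]
    return {
        # Assumption: the crop starts at the segment's top-left corner
        # and is not rescaled, so a plain offset is sufficient.
        "left": seg_box["left"] + word_box["left"],
        "top": seg_box["top"] + word_box["top"],
        "width": word_box["width"],
        "height": word_box["height"],
    }


# Header segment and its first OCR word from the example response.
segment = {"bbox": {"left": 427.6, "top": 67.8, "width": 344.7, "height": 36.5}}
word = {
    "text": "Q1",
    "bbox": {"left": 5.4, "top": 4.1, "width": 35.6, "height": 22.7},
    "confidence": None,
}

print(word["text"], ocr_word_to_page(segment, word))
```

Under that assumption the word "Q1" lands at roughly left 433.0, top 71.9 on the rendered page.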

URL Fields

By default, every file URL in the response is returned as null so the response (and any log that captures it) never exposes your storage bucket, region, or path. The gated fields are:

pdf_url
file_url
output_file_url
exports (presigned export download URLs)
segment image (cropped segment images)
configuration.input_file_url

To receive the real URLs, opt in when polling for results with either the include_url=true query parameter or the include-url: true header on GET /parse/{job_id}:

curl "https://prod.visionapi.unsiloed.ai/parse/$JOB_ID?include_url=true" \
  -H "api-key: $UNSILOED_API_KEY"

include_url does not rewrite or re-sign URLs; when set to true they are returned exactly as generated. Presigned URLs are time-limited, so fetch any files you need promptly.
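
For callers working in Python rather than curl, the same opt-in request can be assembled with the standard library. This is a sketch only: build_result_request is a hypothetical helper, and the code builds the request without sending it.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://prod.visionapi.unsiloed.ai"


def build_result_request(job_id: str, api_key: str,
                         include_url: bool = False) -> urllib.request.Request:
    """Build the GET /parse/{job_id} request, optionally opting in to URLs."""
    url = f"{BASE_URL}/parse/{urllib.parse.quote(job_id, safe='')}"
    if include_url:
        # Same query parameter as in the curl example.
        url += "?" + urllib.parse.urlencode({"include_url": "true"})
    return urllib.request.Request(url, headers={"api-key": api_key}, method="GET")


request = build_result_request(
    "a0f51f79-6eb8-412a-9afa-924ddfbf9578",
    api_key="YOUR_API_KEY",  # placeholder, not a real key
    include_url=True,
)
print(request.full_url)
```

Passing the request to urllib.request.urlopen and decoding the body as JSON would return the response described on this page. Since the URLs are sensitive and time-limited, it is worth keeping include_url off for calls whose responses are logged.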
