A successful /parse job returns the document organized into chunks. Each chunk has an embed Markdown string (concatenated content from its segments, ready for embedding) and an array of segments with bounding boxes and metadata. The example below is a real response from a single-page test document.
{
  "job_id": "a0f51f79-6eb8-412a-9afa-924ddfbf9578",
  "status": "Succeeded",
  "message": "Task succeeded",
  "file_name": "document.pdf",
  "file_type": "application/pdf",
  "page_count": 1,
  "total_chunks": 1,
  "credit_used": 1,
  "merge_tables": false,
  "created_at": "2026-05-22T11:29:17.964433Z",
  "started_at": "2026-05-22T11:29:18.040527Z",
  "finished_at": "2026-05-22T11:29:32.109418Z",
  "pdf_url": "https://s3.us-east-1.amazonaws.com/...",
  "file_url": "https://s3.us-east-1.amazonaws.com/...",
  "configuration": {
    "layout_analysis": "smart_layout_detection",
    "ocr_engine": "UnsiloedHawk",
    "ocr_strategy": "auto_detection",
    "merge_tables": false,
    "...": "..."
  },
  "metadata": {},
  "chunks": [
    {
      "chunk_id": "6b2eca3a-d14f-4164-ba9a-0a3a58fcaf45",
      "chunk_length": 117,
      "embed": "## Q1 2024 Sales Report\nThe following table summarises regional sales performance...",
      "segments": [
        {
          "segment_id": "034a37e7-6e4b-45dd-802c-e648d6c16498",
          "segment_type": "SectionHeader",
          "content": "Q1 2024 Sales Report",
          "markdown": "## Q1 2024 Sales Report",
          "html": "<h2>Q1 2024 Sales Report</h2>",
          "bbox": { "left": 427.6, "top": 67.8, "width": 344.7, "height": 36.5 },
          "page_number": 1,
          "page_width": 1191.0,
          "page_height": 1684.0,
          "confidence": 0.35,
          "image": "https://s3.us-east-1.amazonaws.com/...",
          "ocr": [
            { "text": "Q1", "bbox": { "left": 5.4, "top": 4.1, "width": 35.6, "height": 22.7 }, "confidence": null },
            { "text": "2024", "bbox": { "left": 56.4, "top": 4.1, "width": 69.4, "height": 22.7 }, "confidence": null }
          ],
          "references": null
        },
        {
          "segment_id": "4f4b54bc-793e-49cc-b0a3-113bbb5484be",
          "segment_type": "Table",
          "markdown": "| Region | Sales Rep | Units Sold | Revenue ($) |\n| --- | --- | --- | --- |\n| North | Alice Brown | 1,240 | 186,000 |\n| ... | ... | ... | ... |",
          "html": "<table>...</table>",
          "image": "https://s3.us-east-1.amazonaws.com/...",
          "bbox": { "left": 54.4, "top": 208.5, "width": 1026.5, "height": 246.5 },
          "page_number": 1,
          "page_width": 1191.0,
          "page_height": 1684.0,
          "confidence": 0.99,
          "ocr": [],
          "references": null
        }
      ]
    }
  ]
}

Top-Level Fields

The top-level fields fall into three groups: identification, status, and timing; parsed content; and job configuration and metering.

Identification, Status, and Timing

  • job_id: unique identifier for the parsing job
  • status: job state (Succeeded, Failed, or an in-progress value such as Starting or Processing)
  • message: human-readable status message ("Task succeeded" when the job completes)
  • file_name: name of the uploaded file
  • file_type: MIME type of the uploaded file (e.g., application/pdf)
  • created_at: ISO 8601 timestamp when the job was created
  • started_at: ISO 8601 timestamp when processing began
  • finished_at: ISO 8601 timestamp when processing completed
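As an illustration of how the three timestamps relate, the sketch below derives queue time and processing time from a job response. The helper names `parse_ts` and `job_timings` are hypothetical, and the code assumes all three timestamps are present, as they are in the successful example above.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def job_timings(job: dict) -> dict:
    """Return seconds spent queued and seconds spent processing."""
    created = parse_ts(job["created_at"])
    started = parse_ts(job["started_at"])
    finished = parse_ts(job["finished_at"])
    return {
        "queue_seconds": (started - created).total_seconds(),
        "processing_seconds": (finished - started).total_seconds(),
    }


# Timestamps copied from the example response.
example_job = {
    "created_at": "2026-05-22T11:29:17.964433Z",
    "started_at": "2026-05-22T11:29:18.040527Z",
    "finished_at": "2026-05-22T11:29:32.109418Z",
}
timings = job_timings(example_job)
print(timings)
```

For the example job this gives roughly 0.08 seconds queued and roughly 14.07 seconds of processing.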

Parsed Content

  • chunks: array of content chunks
  • total_chunks: total number of chunks
  • page_count: total number of pages in the document
  • pdf_url: temporary signed S3 URL to the processed PDF; null unless include_url=true is set (see URL Fields below)
  • file_url: temporary signed S3 URL to the original uploaded file; null unless include_url=true is set

Job Configuration and Metering

  • configuration: the full configuration object used for this parse (OCR engine, layout strategy, segment processing settings, etc.); see the Parse API reference for every option
  • metadata: additional job metadata; usually an empty object
  • merge_tables: whether tables were merged across pages
  • credit_used: credits consumed by this job

Chunk Fields

  • chunk_id: unique identifier for the chunk
  • chunk_length: character length of the chunk’s embed content
  • embed: combined Markdown content from all segments in the chunk, ready for embedding into a vector store
  • segments: array of layout segments within the chunk
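A minimal sketch of how the chunk fields might feed an embedding pipeline: it walks the chunks array and collects each embed string together with its chunk_id and the pages its segments come from. The function name `embedding_inputs` and the output record shape are illustrative assumptions, not part of the API.

```python
def embedding_inputs(response: dict) -> list:
    """Build one record per chunk, ready to pass to an embedding model."""
    records = []
    for chunk in response.get("chunks", []):
        # Collect the distinct pages covered by this chunk's segments.
        pages = sorted({seg["page_number"] for seg in chunk.get("segments", [])})
        records.append({
            "id": chunk["chunk_id"],
            "text": chunk["embed"],
            "pages": pages,
        })
    return records


# Trimmed-down response with the same shape as the example above.
sample_response = {
    "chunks": [
        {
            "chunk_id": "6b2eca3a-d14f-4164-ba9a-0a3a58fcaf45",
            "embed": "## Q1 2024 Sales Report\nThe following table ...",
            "segments": [
                {"segment_type": "SectionHeader", "page_number": 1},
                {"segment_type": "Table", "page_number": 1},
            ],
        }
    ]
}
records = embedding_inputs(sample_response)
print(records[0]["id"], records[0]["pages"])
```

Keeping chunk_id alongside the text makes it possible to map a vector-store hit back to the chunk's segments and bounding boxes.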

Segment Fields

  • segment_id: unique identifier for the segment
  • segment_type: element classification; see the Element Types reference for the full list
  • content: plain-text content of the segment (omitted for Signature segments)
  • markdown: Markdown-formatted content
  • html: HTML-formatted content
  • image: signed S3 URL to a cropped image of the segment; present for most types, omitted for Signature, and null unless include_url=true is set (see URL Fields below)
  • bbox: bounding box relative to the page, with left, top, width, height in render pixels
  • page_number: page where the segment appears
  • page_width / page_height: dimensions in pixels of the rendered page the bounding boxes are measured against; to convert a coordinate back to PDF points, divide it by the ratio of page_width to the page’s width in PDF points
  • confidence: model confidence score (0–1) for element detection
  • ocr: array of word-level OCR results
  • references: references to related segments; typically null
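The pixel-to-point conversion described for page_width / page_height can be sketched as follows. The function name `bbox_to_points` is hypothetical, and the page width in points (595.5 here) is an assumed value that the caller would normally read from the PDF with a PDF library. The sketch also assumes the same scale applies horizontally and vertically, and it keeps the top-left origin used by the response rather than flipping to a bottom-left origin.

```python
def bbox_to_points(segment: dict, page_width_pt: float) -> dict:
    """Convert a segment bbox from render pixels to PDF points."""
    # Pixels per point, derived from the rendered page width.
    scale = segment["page_width"] / page_width_pt
    box = segment["bbox"]
    return {key: box[key] / scale for key in ("left", "top", "width", "height")}


# SectionHeader segment from the example response.
header_segment = {
    "bbox": {"left": 427.6, "top": 67.8, "width": 344.7, "height": 36.5},
    "page_width": 1191.0,
    "page_height": 1684.0,
}

# Assumed page width in points; supply the real value for your document.
bbox_pt = bbox_to_points(header_segment, page_width_pt=595.5)
print(bbox_pt)
```

With the assumed width of 595.5 points the scale is 2 pixels per point, so every value in the bounding box is halved.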

OCR Item Fields

Each item in a segment’s ocr array describes one word the OCR engine recognized within that segment. The bounding box is relative to the segment’s cropped image, not the full page.
  • text: the recognized word or token
  • bbox: bounding box relative to the segment’s image, with left, top, width, height
  • confidence: per-word model confidence (0–1), or null when not reported
  • color: optional r, g, b, and hex sub-fields, present only when extract_colors: true is set in the parse configuration
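Because OCR bounding boxes are relative to the segment's cropped image, placing a word on the full page requires offsetting it by the segment's own bbox. The sketch below does that under two assumptions that the text above does not confirm: that the crop's top-left corner coincides with the segment bbox's top-left corner, and that the crop uses the same pixel scale as the page render. The function name `ocr_words_on_page` is hypothetical.

```python
def ocr_words_on_page(segment: dict) -> list:
    """Return OCR words with bounding boxes shifted into page coordinates."""
    seg_box = segment["bbox"]
    words = []
    # The ocr array may be empty (as for the Table segment in the example).
    for item in segment.get("ocr") or []:
        word_box = item["bbox"]
        words.append({
            "text": item["text"],
            "bbox": {
                # Assumption: crop origin equals the segment's top-left corner.
                "left": seg_box["left"] + word_box["left"],
                "top": seg_box["top"] + word_box["top"],
                "width": word_box["width"],
                "height": word_box["height"],
            },
        })
    return words


# SectionHeader segment from the example response (first two OCR items).
header = {
    "bbox": {"left": 427.6, "top": 67.8, "width": 344.7, "height": 36.5},
    "ocr": [
        {"text": "Q1", "bbox": {"left": 5.4, "top": 4.1, "width": 35.6, "height": 22.7}, "confidence": None},
        {"text": "2024", "bbox": {"left": 56.4, "top": 4.1, "width": 69.4, "height": 22.7}, "confidence": None},
    ],
}
page_words = ocr_words_on_page(header)
print(page_words[0])
```

If the crop is padded or rescaled relative to the page render, the offsets would need a matching correction.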

URL Fields

By default, every file URL in the response is returned as null so the response (and any log that captures it) never exposes your storage bucket, region, or path. The gated fields are:
  • pdf_url
  • file_url
  • output_file_url
  • exports (presigned export download URLs)
  • segment image (cropped segment images)
  • configuration.input_file_url
To receive the real URLs, opt in when polling for results with either the include_url=true query parameter or the include-url: true header on GET /parse/{job_id}:
curl "https://prod.visionapi.unsiloed.ai/parse/$JOB_ID?include_url=true" \
  -H "api-key: $UNSILOED_API_KEY"
include_url does not rewrite or re-sign URLs; when set to true they are returned exactly as generated. Presigned URLs are time-limited, so fetch any files you need promptly.
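The same opt-in request can be sketched in Python with only the standard library. The endpoint, query parameter, and api-key header come from the curl example above; the function names `build_result_request` and `fetch_result` are hypothetical, and error handling is left out.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://prod.visionapi.unsiloed.ai"


def build_result_request(job_id: str, api_key: str, include_url: bool = True):
    """Build the GET /parse/{job_id} request, optionally opting in to URLs."""
    url = f"{BASE_URL}/parse/{urllib.parse.quote(job_id, safe='')}"
    if include_url:
        url += "?include_url=true"
    return urllib.request.Request(url, headers={"api-key": api_key}, method="GET")


def fetch_result(job_id: str, api_key: str, include_url: bool = True) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    request = build_result_request(job_id, api_key, include_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the request does not touch the network.
request = build_result_request("a0f51f79-6eb8-412a-9afa-924ddfbf9578", "YOUR_API_KEY")
print(request.full_url)
```

Since the returned URLs are time-limited, a caller would typically download the files it needs right after `fetch_result` returns rather than storing the URLs for later.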