Get Parse Result - Unsiloed AI

Overview

This endpoint lets you check the current status of a parsing job and retrieve the complete results once processing has finished. It is specific to the parsing API and returns the full document analysis, including text extraction, image recognition, table parsing, and OCR data.

Parsing jobs are processed asynchronously. Use this endpoint to poll for completion and retrieve results when the job status is “Succeeded”.

Parameters

job_id

string

required

Job ID returned by POST /parse.

base64_urls

boolean

Return segment images as base64-encoded data URIs instead of S3 presigned URLs. Defaults to false.

include_chunks

boolean

Include the chunks array in the response. Defaults to true.

output_file

boolean

Return a presigned S3 URL to the raw output JSON file instead of inlining the full response body. Defaults to false.

include_url

boolean

Opt in to receiving file URLs (pdf_url, file_url, output_file_url, segment image, configuration.input_file_url) in the response. Defaults to false, in which case these fields are returned as null so the response (and any log of it) does not expose the storage bucket, region, or path. exports URLs are always returned regardless of this setting. Can also be set via the include-url header.

segment_filter

string

Comma-separated segment types to keep in the response (alias: keep_segment_types). Omit to include every type.

merge_tables

boolean

Return the cached cross-page merged result for jobs that were submitted with merge_tables. No merging happens at read time; the job’s own setting takes precedence, so passing this for a job created without merge_tables has no effect. Defaults to false.
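
As a rough illustration of how these options might be combined, the sketch below builds a query-parameter dictionary that could be passed to an HTTP client. The helper name build_parse_query is hypothetical, and serializing booleans as lowercase "true"/"false" strings is an assumption rather than documented behavior.

```python
def build_parse_query(base64_urls=False, include_chunks=True, output_file=False,
                      include_url=False, merge_tables=False, segment_filter=None):
    """Build query parameters for GET /parse/{job_id} (hypothetical helper)."""
    flags = {
        "base64_urls": base64_urls,
        "include_chunks": include_chunks,
        "output_file": output_file,
        "include_url": include_url,
        "merge_tables": merge_tables,
    }
    # Assumption: booleans are serialized as lowercase "true"/"false" strings.
    params = {name: str(value).lower() for name, value in flags.items()}
    if segment_filter:
        # segment_filter takes a comma-separated list of segment types.
        params["segment_filter"] = ",".join(segment_filter)
    return params


params = build_parse_query(include_url=True, segment_filter=["Table", "Picture"])
print(params)
```

The resulting dictionary can be handed to a client such as requests via its params argument, alongside the api-key header used in the request example on this page.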

Response

job_id

string

Job identifier.

status

string

Current job status: Starting, Queued, Processing, Succeeded, Failed, or Cancelled. Jobs created through the v2 presigned-upload flow can also report AwaitingUpload before the file is uploaded.

created_at

string

ISO 8601 timestamp when the job was created.

metadata

object

Citation or job metadata. Populated when xml_citation is enabled or from the job record.

started_at

string

ISO 8601 timestamp when processing started. null until a worker picks the job up (Starting and Queued).

finished_at

string

ISO 8601 timestamp when processing completed. Present when status is Succeeded or Failed.

total_chunks

integer

Total number of document chunks. 0 until status is Succeeded.

page_count

integer

Number of pages in the document. 0 until processing begins; populated while status is Processing.

chunks

array

Array of document chunks with segments and extracted content. Empty until status is Succeeded (chunk content is only downloaded for succeeded jobs).

pdf_url

string

Presigned S3 URL to the generated PDF. Returned as null unless include_url=true is set.

exports

object

Presigned download URLs for exported file formats. Only present when export_format was specified in the parse request and the export has completed. Keys are format names (e.g. "docx"), values are presigned S3 URLs valid for 1 hour. If export failed, contains {"docx_error": "..."} instead. Always returned with URLs, regardless of include_url.

file_name

string

Original file name from the job record.

file_type

string

MIME type of the uploaded file.

file_url

string

S3 URL of the original uploaded file. Returned as null unless include_url=true is set.

credit_used

integer

Credits used for this job.

message

string

Status detail message, always present (e.g. “Task queued”). Carries the failure reason when status is Failed.

configuration

object

Configuration used for this job, mirroring the parameters submitted at creation time.

merge_tables

boolean

Whether table merging was enabled for this job.
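
The exports field described above mixes two kinds of entries: format names mapped to presigned URLs, and an error key when an export failed. A minimal sketch for separating them is shown below. The helper name split_exports is hypothetical, the sample values are placeholders, and treating any key ending in _error as a failure generalizes the documented docx_error example, which is an assumption.

```python
def split_exports(exports):
    """Separate export download URLs from export error messages."""
    urls, errors = {}, {}
    for key, value in (exports or {}).items():
        if key.endswith("_error"):
            # Assumption: "<format>_error" marks a failed export for that format.
            errors[key[:-len("_error")]] = value
        else:
            urls[key] = value
    return urls, errors


# Placeholder payloads, not real API output.
urls, errors = split_exports({"docx": "https://example.com/report.docx"})
failed_urls, failed_errors = split_exports({"docx_error": "conversion failed"})
print(urls, errors)
print(failed_urls, failed_errors)
```

Because the presigned URLs are valid for 1 hour, it is usually safer to download the export soon after reading the response than to store the URL.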

Two flags change the response shape for succeeded jobs. With output_file=true, the parsed content is replaced by a result object carrying output_file_url (a presigned URL to the raw output JSON) instead of inline chunks. For a merge_tables job whose merged result is already available, the content may be returned nested under an output key with a task_id instead of the standard top-level fields.
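
Client code that reads results may want to normalize these response shapes before using them. The sketch below is one way to do that. The helper name extract_content is hypothetical, and the exact nesting is assumed: it expects the file URL at result.output_file_url and the merged content at output.chunks, which should be checked against a real response.

```python
def extract_content(job):
    """Return (kind, payload) describing where the parsed content lives."""
    output = job.get("output")
    if isinstance(output, dict):
        # Assumption: merged-table results nest the chunks under "output".
        return "merged", output.get("chunks", [])
    result = job.get("result")
    if isinstance(result, dict) and result.get("output_file_url"):
        # output_file=true: content must be fetched from the presigned URL.
        return "file", result["output_file_url"]
    # Standard shape: chunks inline at the top level.
    return "inline", job.get("chunks") or []


# Placeholder payloads, not real API output.
inline_job = {"status": "Succeeded", "chunks": [{"segments": []}]}
file_job = {"status": "Succeeded", "result": {"output_file_url": "https://example.com/out.json"}}
merged_job = {"output": {"task_id": "abc", "chunks": [{"segments": []}]}}

for job in (inline_job, file_job, merged_job):
    print(extract_content(job))
```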

curl -X 'GET' \
  'https://prod.visionapi.unsiloed.ai/parse/04a7a6d8-5ef7-465a-b22a-8a98e7104dd9' \
  -H 'accept: application/json' \
  -H 'api-key: your-api-key'

{
  "job_id": "04a7a6d8-5ef7-465a-b22a-8a98e7104dd9",
  "status": "Starting",
  "created_at": "2025-10-22T06:51:16.870302Z",
  "metadata": {}
}

Job Status Values

Starting

Job has been created and is waiting to be processed. This is the initial status when a parsing job is first created.

AwaitingUpload

Job was created through the v2 presigned-upload flow and is waiting for the file to be uploaded. It transitions to Queued once the upload completes.

Queued

Job is waiting to be picked up by a worker.

Processing

Job is currently being processed. This includes PDF parsing, text extraction, image analysis, table detection, and OCR processing.

Succeeded

Job has completed successfully. The response includes the complete analysis results with all extracted data, images, and metadata.

Failed

Job failed during processing. Check the message field for details about what went wrong.

Cancelled

Job was cancelled before processing completed.
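
For client code, the statuses above fall into two groups: those where polling should continue and those where the job will not change further. A minimal sketch of that grouping follows. The names TERMINAL_STATUSES, PENDING_STATUSES, and classify_status are hypothetical, and the grouping itself is an interpretation of the descriptions above.

```python
TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled"}
PENDING_STATUSES = {"Starting", "AwaitingUpload", "Queued", "Processing"}


def classify_status(status):
    """Classify a job status as 'terminal', 'pending', or 'unknown'."""
    if status in TERMINAL_STATUSES:
        return "terminal"
    if status in PENDING_STATUSES:
        return "pending"
    # Unrecognized values are reported rather than guessed at.
    return "unknown"


for status in ("Queued", "Succeeded", "Archived"):
    print(status, "->", classify_status(status))
```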

Polling Strategy

For long-running parsing jobs, implement a polling strategy to check status periodically:

import requests
import time

def poll_parse_job(job_id, api_key, max_wait_time=300, poll_interval=5):
    """Poll a parsing job until it reaches a terminal status or the wait time elapses"""

    start_time = time.time()
    headers = {"api-key": api_key}
    url = f"https://prod.visionapi.unsiloed.ai/parse/{job_id}"

    while time.time() - start_time < max_wait_time:
        # A per-request timeout keeps a stalled connection from blocking the loop
        response = requests.get(url, headers=headers, timeout=30)

        if response.status_code == 200:
            job = response.json()
            status = job['status']

            if status == 'Succeeded':
                return job
            elif status in ['Failed', 'Cancelled']:
                # Terminal statuses: further polling will not change the outcome
                raise Exception(f"Job {status.lower()}: {job.get('message', 'Unknown error')}")
            elif status in ['Starting', 'AwaitingUpload', 'Queued', 'Processing']:
                print(f"Job status: {status} - waiting...")
            else:
                print(f"Unknown status: {status}")
        elif response.status_code in (401, 403, 404):
            # Authentication, permission, and not-found errors will not resolve by retrying
            raise Exception(f"Request failed with status {response.status_code}")
        else:
            print(f"Error checking status: {response.status_code}")

        time.sleep(poll_interval)

    raise Exception("Job polling timed out")

# Usage
try:
    result = poll_parse_job("04a7a6d8-5ef7-465a-b22a-8a98e7104dd9", "your-api-key")
    print("Job completed successfully!")
    print(f"Total chunks: {result['total_chunks']}")
except Exception as e:
    print(f"Error: {e}")

Segment Types

When a job succeeds, the response includes detailed analysis of different document segments:

Title

Top-level document titles, distinct from section headers.

SectionHeader

Document headers and titles that define section boundaries.

Text

Regular text content including paragraphs, sentences, and individual text elements.

ListItem

Individual items within ordered or unordered lists.

Table

Tabular data with structured rows and columns.

Picture

Images and graphics within the document, including logos, charts, and illustrations.

Caption

Text captions associated with images or figures.

Formula

Mathematical or chemical formulas detected within the document.

Footnote

Footnote text appearing at the bottom of a page.

PageHeader

Recurring header content appearing at the top of pages.

PageFooter

Recurring footer content appearing at the bottom of pages.
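
To get a quick picture of what a parsed document contains, the segments can be tallied by type. The sketch below assumes each chunk carries a segments list and each segment a segment_type field. Those key names are assumptions (hence exposed as parameters), and the sample data is a placeholder rather than real API output.

```python
from collections import Counter


def count_segment_types(chunks, segments_key="segments", type_key="segment_type"):
    """Count segments per type across all chunks (key names are assumed)."""
    counts = Counter()
    for chunk in chunks:
        for segment in chunk.get(segments_key, []):
            counts[segment.get(type_key, "Unknown")] += 1
    return counts


# Placeholder chunks for illustration only.
sample_chunks = [
    {"segments": [{"segment_type": "Title"}, {"segment_type": "Text"}]},
    {"segments": [{"segment_type": "Table"}, {"segment_type": "Text"}]},
]
counts = count_segment_types(sample_chunks)
print(dict(counts))
```

If only a few types are needed, the segment_filter query parameter can reduce the response on the server side instead of filtering after the fact.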

Error Handling

Common Error Scenarios

Job Not Found: Invalid or expired job ID returns a 404 response.
Unauthorized: Missing or invalid API key returns a 401 response.
Forbidden: Valid API key but no permission to access this task returns a 403 response.
Rate Limiting: This GET endpoint is not rate limited by the application; only the submit endpoints are. Poll responsibly regardless.
Client-Side Polling Timeout: The job did not complete within the time your polling logic allows. This is not a server-returned error; implement a reasonable client-side timeout and handle it gracefully.
Server Error: Internal processing error returns a 500 response.
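
The scenarios above can be turned into a small lookup so that client code reports a useful message and decides whether retrying makes sense. This is a sketch: the names ERROR_HINTS, describe_error, and should_retry are hypothetical, and retrying only on 5xx responses is a design choice, not documented API behavior.

```python
ERROR_HINTS = {
    401: "Unauthorized: missing or invalid API key",
    403: "Forbidden: the API key has no permission to access this task",
    404: "Job not found: invalid or expired job ID",
    500: "Server error: internal processing error",
}


def describe_error(status_code):
    """Return a human-readable hint for an HTTP error status."""
    return ERROR_HINTS.get(status_code, f"Unexpected HTTP status {status_code}")


def should_retry(status_code):
    """Assumption: only server-side (5xx) errors are worth retrying."""
    return 500 <= status_code < 600


for code in (401, 404, 500, 418):
    print(code, describe_error(code), "retry" if should_retry(code) else "do not retry")
```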

Best Practices

Polling Frequency: Check status every 5-10 seconds for long-running jobs
Timeout Handling: Implement reasonable timeouts to prevent infinite polling
Error Recovery: Handle failed jobs gracefully with retry logic
API Key Security: Keep your API key secure and never expose it in client-side code
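
The retry advice above can be sketched as a small wrapper. Here run_job stands in for a hypothetical callable that submits a document and polls it to completion, raising an exception on failure. The function name run_with_retries, the linear backoff, and the default values are all illustrative assumptions.

```python
import time


def run_with_retries(run_job, max_attempts=3, base_delay=5.0, sleep=time.sleep):
    """Call run_job() until it succeeds or max_attempts is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return run_job()
        except Exception as exc:  # broad on purpose for this sketch
            last_error = exc
            if attempt < max_attempts:
                # Linear backoff: wait longer after each failed attempt.
                sleep(base_delay * attempt)
    raise last_error


# Demo with a stand-in job that fails twice before succeeding.
attempts = []
delays = []


def flaky_job():
    attempts.append(1)
    if len(attempts) < 3:
        raise Exception("Job failed: simulated")
    return {"status": "Succeeded"}


result = run_with_retries(flaky_job, sleep=delays.append)
print(result, delays)
```

Note that a job in the Failed status will not recover by being polled again, so a retry in practice means submitting a new job. Injecting the sleep function keeps the wrapper easy to test without real delays.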

Rate Limits

Concurrent Jobs: Limited number of active parsing jobs per API key
Request Frequency: Avoid excessive polling (recommended: 5-10 second intervals)

Check your API plan for specific limits and quotas.

Authorizations

Authorization

string

header

required

API key for authentication. Use 'Bearer <your_api_key>'

Path Parameters

job_id

string

required

Job ID returned by POST /parse.

Query Parameters

include_chunks

boolean

Include the chunks array in the response. Defaults to true.

base64_urls

boolean

Return segment images as base64-encoded data URIs instead of S3 presigned URLs. Defaults to false.

output_file

boolean

Return a presigned S3 URL to the raw output JSON file instead of inlining the full response body. Also accepted as the output-file request header. Defaults to false.

enhanced_table

boolean

Apply enhanced table post-processing when assembling the response. This improves cell-merge accuracy and structure recovery for complex tables, at the cost of extra latency. Also accepted as the enhanced-table request header. Defaults to false.

merge_tables

boolean

Apply the cross-page table-merge post-processing pass when assembling the response. Has no effect unless the job was parsed with merge_tables=true (the merge work runs at parse time). Defaults to false.

include_url

boolean

Include file URLs (pdf_url, file_url, output_file_url, segment images, configuration.input_file_url) in the response. When false (default), these URL-bearing fields are returned as null so that the response, and any log of it, does not leak the storage bucket, region, or path. exports URLs are returned regardless of this setting. Also accepted as the include-url request header.

Response

Job status and results. Output fields (chunks, total_chunks, page_count, pdf_url) are present only when status is Succeeded.

Response body for GET /parse/{job_id}.

Fields marked as optional appear only when the job has reached the relevant status.

created_at

string

required

ISO 8601 timestamp when the job was created.

job_id

string

required

Job identifier.

metadata

object

required

Citation or job metadata. Populated when xml_citation is enabled or from the job record.

status

string

required

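Current job status: Starting, Queued, Processing, Succeeded, Failed, or Cancelled. Jobs created through the v2 presigned-upload flow can also report AwaitingUpload before the file is uploaded.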

chunks

object[] | null

Array of document chunks with segments and extracted content. Present when status is Succeeded.

configuration

object

Configuration used for this job (mirrors the parameters submitted at creation time). The effective merge_tables value lives at configuration.merge_tables.

credit_used

integer<int64> | null

Credits used for this job.

exports

object

Presigned download URLs for exported file formats. Only present when export_format was specified in the parse request and the export has completed. Keys are format names (e.g. "docx"), values are presigned S3 URLs valid for 1 hour. Example: {"docx": "https://s3.amazonaws.com/..."}. If export failed, contains {"docx_error": "..."} instead.

file_name

string | null

Original file name from the job record.

file_type

string | null

MIME type of the uploaded file.

file_url

string | null

S3 URL of the original uploaded file.

finished_at

string | null

ISO 8601 timestamp when processing completed. Present when status is Succeeded or Failed.

message

string | null

Error or status detail message. Carries the failure reason when status is Failed.

page_count

integer<int64> | null

Number of pages in the document. Present when status is Succeeded.

pdf_url

string | null

Presigned S3 URL to the generated PDF. Present when status is Succeeded.

started_at

string | null

ISO 8601 timestamp when processing started. null until a worker picks the job up (status Starting or Queued).

total_chunks

integer<int64> | null

Total number of document chunks. Present when status is Succeeded.