POST /splitter

Overview

The Split Document endpoint analyzes the pages of a PDF, classifies them into the categories you supply in the request, and creates a separate PDF file for each category. This suits mixed document batches, such as scanned files that contain invoices, contracts, and reports.
The endpoint processes documents asynchronously through a job-based system: it returns a job_id immediately and processes the document in the background. Poll GET /splitter/{job_id} to retrieve the results once the job completes.
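A minimal polling sketch, assuming the payload from GET /splitter/{job_id} is JSON with a top-level status field and that "processing" is the only in-progress value (it is the only status documented on this page). BASE_URL, fetch_job, and wait_for_job are illustrative names, not an official client; the fetch function is passed in so the loop can be tested without network access.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "https://prod.visionapi.unsiloed.ai"


def fetch_job(job_id: str, api_key: str) -> Dict[str, Any]:
    """GET /splitter/{job_id} and decode the JSON body."""
    req = urllib.request.Request(
        f"{BASE_URL}/splitter/{job_id}",
        headers={"api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(
    fetch: Callable[[], Dict[str, Any]],
    in_progress: frozenset = frozenset({"processing"}),
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch() until the status leaves the in-progress set.

    ASSUMPTION: the payload has a top-level "status" key, and
    "processing" is the only in-progress value.
    """
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch()
        if payload.get("status") not in in_progress:
            return payload  # finished (successfully or not)
        if time.monotonic() >= deadline:
            raise TimeoutError("split job still processing after timeout")
        sleep(interval)  # fixed interval between polls
```

For a real job, call `wait_for_job(lambda: fetch_job(job_id, api_key))`. A fixed interval keeps the sketch short; exponential backoff is a reasonable alternative for large documents.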

Request

file (file)
The PDF file to split. Exactly one of file or file_url must be provided; sending both returns a 400.

file_url (string)
URL of a PDF file to split. Exactly one of file or file_url must be provided.

categories (string, required)
A JSON string containing an array of category objects, each with a name and an optional description (e.g., [{"name":"invoice","description":"Financial invoices"}]). Descriptions help the classifier distinguish similar categories. Categories that match no pages are skipped, and no file is created for them.

enable_reordering (boolean, default: false)
After classification, reorder the pages within each category, using page content and page numbers to infer their logical order. Applies only to categories that match more than one page.
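These constraints can be checked client-side before a request is sent. In this sketch, build_split_fields is a hypothetical helper: it enforces the exactly-one-of rule for file and file_url, requires a name on every category, and serializes categories as a JSON string. It assumes the enable_reordering form field accepts the string "true", and omits the field otherwise since the default is false.

```python
import json
from typing import Dict, List, Optional


def build_split_fields(
    categories: List[Dict[str, str]],
    file_path: Optional[str] = None,
    file_url: Optional[str] = None,
    enable_reordering: bool = False,
) -> Dict[str, str]:
    """Return the text form fields for POST /splitter.

    The PDF at file_path is attached separately as the "file" part;
    it is accepted here only to enforce the exactly-one-of rule.
    """
    # Sending both sources returns a 400; sending neither is also invalid.
    if (file_path is None) == (file_url is None):
        raise ValueError("provide exactly one of file_path or file_url")
    # Every category object needs a name; description is optional.
    if any(not c.get("name") for c in categories):
        raise ValueError("every category needs a non-empty name")
    fields = {"categories": json.dumps(categories)}
    if file_url is not None:
        fields["file_url"] = file_url
    if enable_reordering:
        # ASSUMPTION: the form field accepts the string "true".
        fields["enable_reordering"] = "true"
    return fields
```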

Response

The endpoint returns HTTP 200 with the following fields:

job_id (string)
Unique identifier for the splitting job.

status (string)
Current status of the job ("processing").

quota_remaining (number)
Remaining API quota after this request.
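A sketch of reading this response, assuming the body is exactly the JSON object described above; SplitJob and parse_job_response are illustrative names.

```python
import json
from dataclasses import dataclass


@dataclass
class SplitJob:
    job_id: str
    status: str             # "processing" when the job is accepted
    quota_remaining: float


def parse_job_response(body: str) -> SplitJob:
    """Decode the HTTP 200 body returned by POST /splitter."""
    data = json.loads(body)
    return SplitJob(
        job_id=data["job_id"],
        status=data["status"],
        quota_remaining=float(data["quota_remaining"]),
    )
```

The job_id is what you pass to GET /splitter/{job_id} when polling.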

Split Result

When the job completes, GET /splitter/{job_id} returns the split files inside its result object. The fields below describe that result object:
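success (boolean)
Whether the splitting operation succeeded.

message (string)
Descriptive message about the splitting operation.

files (array)
Array of split PDF files, one per category that matched at least one page. In the response example, each entry carries name, fileId, type, path, full_path (a signed download URL with an expiry), and confidence_score.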
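Because every entry in files carries a confidence_score, a caller can set low-confidence splits aside for manual review. This sketch assumes the result object has the shape shown in the response example; confident_files is a hypothetical helper, and the threshold is the caller's choice since no recommended value is documented.

```python
from typing import Any, Dict, List


def confident_files(result: Dict[str, Any], min_confidence: float) -> List[Dict[str, Any]]:
    """Return the split files whose confidence_score is at least min_confidence."""
    if not result.get("success"):
        # Surface the server's message when the split did not succeed.
        raise RuntimeError(result.get("message", "split failed"))
    return [
        f for f in result.get("files", [])
        if f.get("confidence_score", 0.0) >= min_confidence
    ]
```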

Request Examples

curl -X POST "https://prod.visionapi.unsiloed.ai/splitter" \
  -H "api-key: your-api-key" \
  -F "file=@mixed_documents.pdf" \
  -F 'categories=[{"name":"invoice","description":"Business invoices with itemized charges"},{"name":"contract","description":"Legal agreements and binding documents"}]'
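The same request can be sent from Python with only the standard library. This sketch hand-encodes the multipart/form-data body; encode_multipart and submit_split are hypothetical helper names, and a third-party HTTP client would handle the encoding for you.

```python
import json
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str], file_field: str, file_path: str) -> Tuple[bytes, str]:
    """Build a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        chunks.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    path = Path(file_path)
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{path.name}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    )
    chunks.append(file_header.encode("utf-8") + path.read_bytes() + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def submit_split(file_path: str, categories: list, api_key: str) -> dict:
    """POST a local PDF to /splitter, mirroring the curl example."""
    body, content_type = encode_multipart(
        {"categories": json.dumps(categories)}, "file", file_path
    )
    req = urllib.request.Request(
        "https://prod.visionapi.unsiloed.ai/splitter",
        data=body,
        headers={"api-key": api_key, "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```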

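Response Examples

The example below is the split result object that GET /splitter/{job_id} returns once the job completes. The POST request itself returns only job_id, status, and quota_remaining.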

{
  "success": true,
  "message": "Successfully split PDF into 2 files",
  "files": [
    {
      "name": "invoice.pdf",
      "fileId": "d079d09f-201c-4420-a50a-b25678a71ae9",
      "type": "file",
      "path": "invoice.pdf",
      "full_path": "https://example-bucket.s3.amazonaws.com/files/ef3ec356-b407-4f9f-ac8f-0dfdef9034c0_invoice.pdf?AWSAccessKeyId=...&Signature=...&Expires=...",
      "confidence_score": 0.8
    },
    {
      "name": "contract.pdf",
      "fileId": "320616cc-8dfd-4b8a-8474-8e7a42d9e287",
      "type": "file",
      "path": "contract.pdf",
      "full_path": "https://example-bucket.s3.amazonaws.com/files/dfaa5d30-6955-4a69-9c69-7e3c4efd8450_contract.pdf?AWSAccessKeyId=...&Signature=...&Expires=...",
      "confidence_score": 0.8
    }
  ]
}
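The full_path values in this example carry AWSAccessKeyId, Signature, and Expires query parameters, which suggests they are pre-signed S3 URLs that stop working after some time. A sketch that downloads each file promptly, assuming the result shape shown above (download_split_files is a hypothetical name):

```python
import urllib.request
from pathlib import Path
from typing import Any, Dict, List


def download_split_files(result: Dict[str, Any], out_dir: str) -> List[Path]:
    """Save each split PDF from a completed result into out_dir."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for entry in result.get("files", []):
        # Keep only the file name so a server-supplied name cannot escape out_dir.
        dest = target / Path(entry["name"]).name
        # full_path appears to expire, so fetch it soon after the job completes.
        with urllib.request.urlopen(entry["full_path"]) as resp:
            dest.write_bytes(resp.read())
        saved.append(dest)
    return saved
```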

Authorizations

api-key (string, header, required)
Your API key, sent in the api-key request header.
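A small sketch for building this header without hard-coding the key. The environment variable name UNSILOED_API_KEY is an assumption for illustration, not something the API requires.

```python
import os
from typing import Dict


def auth_headers(env_var: str = "UNSILOED_API_KEY") -> Dict[str, str]:
    """Return the api-key header, reading the key from a (hypothetical) env var."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"set {env_var} to your API key")
    return {"api-key": key}
```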

Body

Content type: multipart/form-data

categories (string, required)
A JSON string containing an array of category objects, each with a name and an optional description. Example: [{"name":"invoice","description":"Business invoices with itemized charges"},{"name":"contract","description":"Legal agreements and binding documents"},{"name":"report"}]

file (file)
PDF file to split. Exactly one of file or file_url must be provided.

file_url (string)
URL of a PDF file to split. Exactly one of file or file_url must be provided. Example: https://example.com/mixed_documents.pdf

enable_reordering (boolean | null, default: false)
After classification, reorder the pages within each category, using page content and page numbers to infer their logical order. Applies only to categories that match more than one page.
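A sketch that produces the categories value from a Python mapping, emitting a description only when one is given (categories_json is a hypothetical helper). With the three categories from the example above, it reproduces the example string exactly.

```python
import json
from typing import Dict, Optional


def categories_json(categories: Dict[str, Optional[str]]) -> str:
    """Serialize {name: description or None} into the categories form value."""
    items = []
    for name, description in categories.items():
        item = {"name": name}
        if description:  # description is optional, so omit it when absent
            item["description"] = description
        items.append(item)
    # Compact separators match the style of the documented example.
    return json.dumps(items, separators=(",", ":"))
```

Usage: `categories_json({"invoice": "Business invoices with itemized charges", "contract": "Legal agreements and binding documents", "report": None})`.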

Response

200 (application/json): Split job created

job_id (string)
Unique identifier for the splitting job.

status (string)
Current job status (typically "processing").

quota_remaining (number)
Remaining API quota after this request.