Mail rooms, AP scans, and patient intake forms often arrive as one big PDF with several different documents stacked together. The /splitter endpoint takes that bundle and a list of categories, then returns one labeled PDF per matched category. For other endpoints, see the Parse quickstart, Extraction quickstart, or Classification quickstart.

This quickstart builds a script that submits a bundled PDF to /splitter, waits for the split to complete, and writes one labeled PDF per matched category into a local split_files/ directory, ready to drop into per-category downstream pipelines.
Before running the script, set UNSILOED_API_KEY in your environment and save the bundled PDF as bundle.pdf in the same directory.
Step 1: Set Up Your Environment
Before writing any code, we need three things: an API key, a bundled PDF, and the runtime for our chosen language.

1.1 Get an Unsiloed AI API Key
To get API access, sign up on Unsiloed AI. Export your key as an environment variable named UNSILOED_API_KEY so it stays out of source control (in bash or zsh, run export UNSILOED_API_KEY="your-api-key", substituting your own key).
1.2 Pick a Bundled PDF
The /splitter endpoint is designed for PDFs that contain more than one logical document. This walkthrough assumes a multi-document PDF saved as bundle.pdf in your working directory.
If you don’t have one handy, download our sample bundle (a three-page accounts-payable batch scan: an invoice, a receipt, and a purchase order) and save it as bundle.pdf.
1.3 Install Dependencies
You need Python 3.8 or newer. Install the requests package with pip install requests.

Step 2: Submit the Bundle
Two form fields go up: file for the bundled PDF and categories for a JSON-stringified array of the labels the splitter can choose from. The categories list is the only vocabulary the splitter uses. Pages that don’t fit any category are still grouped under the closest match, so the list needs to cover everything that could plausibly appear in the bundle. The endpoint returns a job_id to poll. All requests go to https://prod.visionapi.unsiloed.ai with the API key in the api-key header.
2.1 Set Up the Script
Create a file called split_bundle.py and start with the imports and configuration:
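A minimal sketch of that opening block follows. API_KEY and BASE_URL are the names this section describes; the HEADERS dict and the warning for a missing key are illustrative additions, not confirmed parts of the original script.

```python
import os

# Read the key from the environment so it never gets hard-coded in the file.
API_KEY = os.environ.get("UNSILOED_API_KEY", "")

# Unsiloed AI production endpoint; every request in this walkthrough uses it.
BASE_URL = "https://prod.visionapi.unsiloed.ai"

# Every request authenticates with the key in the api-key header.
# HEADERS is an illustrative name, not something the API requires.
HEADERS = {"api-key": API_KEY}

if not API_KEY:
    # Warn early instead of sending requests that will come back 401.
    print("Warning: UNSILOED_API_KEY is not set; requests will return 401.")
```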
API_KEY reads your key from the environment so it doesn’t get hard-coded into the file, and BASE_URL points at the Unsiloed AI production endpoint. Both appear in every request below.

2.2 Define the Categories
Decide which document types the bundle might contain. Each category is an object with a name and an optional description; richer descriptions help the splitter pick the right label when categories are similar.
Add the category list to split_bundle.py:
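A sketch of the category list for the sample AP batch, assuming three labels named invoice, receipt, and purchase_order (the names and descriptions are illustrative, not prescribed by the API). The hypothetical encode_categories helper produces the JSON string for the categories form field and rejects an empty list up front, matching the 400 the endpoint returns in that case.

```python
import json

# Illustrative categories for the sample AP batch; replace with your own labels.
CATEGORIES = [
    {"name": "invoice", "description": "Vendor bill requesting payment, with line items and a total due"},
    {"name": "receipt", "description": "Proof of a payment that has already been made"},
    {"name": "purchase_order", "description": "Buyer-issued order authorizing a purchase, with a PO number"},
]


def encode_categories(categories):
    """Return the JSON string sent in the categories form field (hypothetical helper)."""
    if not categories:
        # Mirrors the API's 400 "At least one category is required".
        raise ValueError("At least one category is required")
    for category in categories:
        # name is required; description is optional.
        if not category.get("name"):
            raise ValueError(f"Category is missing a name: {category!r}")
    return json.dumps(categories)


CATEGORIES_JSON = encode_categories(CATEGORIES)
```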
2.3 Upload the Bundle
Send the file and categories as a multipart upload to /splitter. The endpoint expects the document under the form field name file and the categories as a JSON-encoded string under categories.
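A minimal sketch of the upload using requests, assuming the multipart layout described above. The function name submit_bundle and the 120-second timeout are illustrative choices, not part of the Unsiloed API.

```python
import os

import requests

BASE_URL = "https://prod.visionapi.unsiloed.ai"


def submit_bundle(pdf_path, categories_json, api_key, base_url=BASE_URL):
    """Upload a bundled PDF to /splitter and return the parsed JSON body."""
    with open(pdf_path, "rb") as pdf:
        response = requests.post(
            f"{base_url}/splitter",
            headers={"api-key": api_key},
            # The PDF goes under the "file" form field...
            files={"file": (os.path.basename(pdf_path), pdf, "application/pdf")},
            # ...and the JSON-encoded category list under "categories".
            data={"categories": categories_json},
            timeout=120,  # illustrative value
        )
    # Raises requests.HTTPError on any non-2xx status (401, 400, 422, ...).
    response.raise_for_status()
    return response.json()
```

In split_bundle.py this would be called as submit_bundle("bundle.pdf", categories_json, API_KEY), where categories_json is json.dumps() of the category list.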
Add the upload step to split_bundle.py after the category list. The raise_for_status() call in that step throws an HTTPError on any non-2xx response, so we don’t need to check .status_code ourselves.

2.4 Capture the Job ID
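A small sketch of reading the ID out of the upload response. The extract_job_id helper is hypothetical; it only assumes the body has a job_id key, as described above, and raises if it does not.

```python
def extract_job_id(body):
    """Pull the job ID out of the /splitter upload response (hypothetical helper)."""
    job_id = body.get("job_id")
    if not job_id:
        # An accepted upload should carry a job_id; anything else is unexpected.
        raise RuntimeError(f"No job_id in /splitter response: {body!r}")
    return job_id


# In split_bundle.py, body would be the dict returned by response.json()
# after the upload; here it is a stand-in using the ID from the sample output.
body = {"job_id": "887f26e6-d089-47f6-8def-afe84de40ecd"}
job_id = extract_job_id(body)
print(f"Job submitted: {job_id}")
```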
Read and print the job_id from the upload response, then run the script with python split_bundle.py. The output should be a single line like Job submitted: 887f26e6-d089-47f6-8def-afe84de40ecd.

Step 3: Poll and Download the Split Files
The job runs asynchronously. We GET /splitter/{job_id} repeatedly until the status is completed, then download each split PDF using the signed URL in the response.
The status values the polling loop handles:
- completed: the split files are ready to download
- failed: the job errored; check the error field for details
- queued: the job is waiting to be picked up
- processing: the job is still running
3.1 Write the Polling Loop
Add a polling loop to split_bundle.py. The max_attempts cap stops the loop if the job hangs:
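A sketch of the polling loop, assuming a 5-second interval and a 60-attempt cap (both illustrative values). The hypothetical wait_for_job takes a fetch_status callable instead of calling requests directly, so the status handling can be exercised without a network connection.

```python
import time


def wait_for_job(fetch_status, job_id, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Poll until the split job completes, fails, or max_attempts runs out.

    fetch_status(job_id) must return the parsed JSON body of
    GET /splitter/{job_id}. Injecting it is a design choice of this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        body = fetch_status(job_id)
        status = body.get("status")
        print(f"Status: {status}")
        if status == "completed":
            return body
        if status == "failed":
            # Surface the server's error message instead of waiting out the cap.
            raise RuntimeError(f"Split job {job_id} failed: {body.get('error')}")
        # queued, processing, or any other in-progress value: wait and retry.
        if attempt < max_attempts:
            sleep(interval)
    raise TimeoutError(f"Job {job_id} did not finish after {max_attempts} attempts")
```

In split_bundle.py, fetch_status could be a small function that calls requests.get(f"{BASE_URL}/splitter/{job_id}", headers={"api-key": API_KEY}), then raise_for_status() and .json() on the response.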
3.2 Download the Split PDFs
Each entry in the response’s result.files array has a presigned full_path URL that downloads the split PDF. The download step saves the metadata to result.json and writes each split file into a split_files/ directory.
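A sketch of the download step, assuming each file entry carries name and full_path as listed under Response Shape. The hypothetical save_split_files takes a download callable; with requests that could be lambda url: requests.get(url, timeout=60).content. Because full_path is a presigned URL, the sketch sends no api-key header with it. Appending .pdf when name lacks an extension is also an assumption.

```python
import json
import os


def save_split_files(body, download, out_dir="split_files", metadata_path="result.json"):
    """Write the job metadata and one PDF per matched category.

    download(url) must return the file's bytes.
    """
    with open(metadata_path, "w", encoding="utf-8") as fh:
        json.dump(body, fh, indent=2)

    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for entry in body["result"]["files"]:
        # name is derived from the matched category; basename() strips any
        # path separators so files always land inside out_dir.
        filename = os.path.basename(entry["name"])
        if not filename.lower().endswith(".pdf"):
            filename += ".pdf"
        path = os.path.join(out_dir, filename)
        with open(path, "wb") as fh:
            fh.write(download(entry["full_path"]))
        print(f"Saved {path}")
        saved.append(path)
    return saved
```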
Add the download step to split_bundle.py, then run the script with python split_bundle.py. You should see a few Status: processing lines, then Status: completed, then one Saved line per matched category.

Error Responses
Failures fall into two buckets: HTTP errors raised before the job is queued, and a failed status on a job that started but couldn’t complete.
HTTP Errors
The /splitter endpoint returns JSON error bodies with a detail field. The common cases:

- 401 Unauthorized: body is {"detail":"Invalid API key"}. The api-key header is missing or wrong.
- 400 Bad Request: body is {"detail":"Invalid JSON format for categories: ..."}. The categories form field isn’t valid JSON.
- 422 Unprocessable Entity: body is {"detail":[{"type":"missing","loc":["body","categories"],"msg":"Field required","input":null}]}. A required form field (usually file or categories) is missing.
- 400 Bad Request: body is {"detail":"At least one category is required"}. The categories array is empty.
- 400 Bad Request: body is {"detail":"Failed to process file: Failed to get PDF page count: ..."}. The upload isn’t a readable PDF.
- 404 Not Found: body is {"detail":"Job not found"}. The job_id you polled doesn’t exist.
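A sketch of turning these bodies into a readable message. The format_error_detail helper is hypothetical; it handles the two shapes listed above, a plain string detail and the list of validation objects returned with a 422.

```python
def format_error_detail(body):
    """Turn a /splitter error body into a one-line message (hypothetical helper).

    detail is a plain string for most errors (401, 400, 404) and a list of
    validation objects for 422 responses.
    """
    detail = body.get("detail")
    if isinstance(detail, list):
        parts = []
        for item in detail:
            # loc looks like ["body", "categories"]; join it into a readable path.
            location = ".".join(str(part) for part in item.get("loc", []))
            parts.append(f"{location}: {item.get('msg', 'invalid')}")
        return "; ".join(parts)
    return "" if detail is None else str(detail)
```

With requests, this would typically run inside an except requests.HTTPError as exc: block, applied to exc.response.json().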
Failed Jobs
A job that was accepted but couldn’t be processed comes back with status: "failed" and a populated error field. The walkthrough’s polling loop raises on this case so you see the message instead of waiting out the timeout.
Response Shape
A completed response contains job metadata, an echo of the input parameters, and a result.files[] array with one entry per matched category. Each entry carries a presigned full_path URL we can download directly.

- result.files[].full_path: presigned S3 URL to download the split PDF; this is what the walkthrough fetches into split_files/
- result.files[].name: filename derived from the matched category, suitable for saving to disk
- result.files[].confidence_score: the splitter’s confidence in the classification, on a 0-1 scale; use it to flag low-confidence splits for human review
- result.files[].fileId: unique identifier for the split file, useful for tracking or deduplicating downstream

- parameters.classes: the category names you submitted
- parameters.category_descriptions: the descriptions you submitted, keyed by category name
- parameters.page_count: number of pages in the uploaded PDF
- parameters.enable_reordering: whether the splitter reordered pages within each category after classification; defaults to false
- file_url: signed URL to the original uploaded bundle
- file_name: name of the uploaded bundle

- status: completed, failed, or an in-progress value such as processing
- progress: human-readable progress message
- error: error message if the job failed, otherwise null
- result.success: whether the split operation succeeded
- result.message: human-readable success or failure message
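A sketch of consuming a completed response, assuming the fields listed above. The hypothetical summarize_split returns one row per split file and marks rows below an illustrative 0.8 confidence threshold for human review; pick a threshold that fits your own review budget.

```python
def summarize_split(body, min_confidence=0.8):
    """Return (name, confidence_score, needs_review) for each split file.

    min_confidence is an illustrative threshold, not an API default.
    """
    result = body.get("result") or {}
    if body.get("status") != "completed" or not result.get("success"):
        reason = body.get("error") or result.get("message")
        raise ValueError(f"Split did not succeed: {reason}")
    rows = []
    for entry in result.get("files", []):
        score = entry.get("confidence_score")
        # Treat a missing score as low confidence so it still gets a human look.
        needs_review = score is None or score < min_confidence
        rows.append((entry["name"], score, needs_review))
    return rows
```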
Sample Output
Running the script against the sample AP batch produces a split_files/ directory with one PDF per matched category.
Next Steps
For more on splitting, including the underlying classification step and the full response shape, see the Splitting overview and the Response Format reference.
Splitting Overview
Learn how the splitter groups pages and where to use it in a pipeline.
Classification
Classify a single document against candidate categories instead of splitting a bundle.
API Reference
Browse the full request and response specs for the splitting endpoint.
FAQ
Check limits, supported formats, and answers to common questions.

