Classification picks the best-fit label for a document from a list of candidate categories we supply. The endpoint returns the matched category and a confidence score, ready to feed into routing logic. For raw Markdown or structured field extraction instead, see the Parse quickstart or the Extraction quickstart.
The walkthrough below builds a script that submits a PDF and our candidate categories to /classify, waits for the verdict, and saves the full response, including the matched category and confidence score, to disk. The full script comes first if you’d rather copy and run it directly.
Set UNSILOED_API_KEY in your environment and save the document you want to classify as document.pdf in the same directory before running.
classify_document.py
import json
import os
import time
import requests

API_KEY = os.environ["UNSILOED_API_KEY"]
BASE_URL = "https://prod.visionapi.unsiloed.ai"

categories = [
    {"name": "Sales Report"},
    {"name": "Invoice"},
    {"name": "Medical Record"},
]

with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/classify",
        headers={"api-key": API_KEY},
        files={"pdf_file": ("document.pdf", f, "application/pdf")},
        data={"categories": json.dumps(categories)},
    )
response.raise_for_status()

job_id = response.json()["job_id"]
print(f"Job submitted: {job_id}")

max_attempts = 60  # roughly 5 minutes at 5 seconds per poll
attempts = 0
while True:
    result = requests.get(
        f"{BASE_URL}/classify/{job_id}",
        headers={"api-key": API_KEY},
    ).json()
    print(f"Status: {result['status']}")
    if result["status"] == "completed":
        break
    if result["status"] == "failed":
        raise RuntimeError(result.get("error", "classify job failed"))
    attempts += 1
    if attempts >= max_attempts:
        raise TimeoutError("Classify job did not finish within 5 minutes")
    time.sleep(5)

with open("classification.json", "w") as f:
    json.dump(result, f, indent=2)

classification = result["result"]
print(f"Classification: {classification['classification']} ({classification['confidence']:.2%} confidence)")

Step 1: Set Up Your Environment

Before writing any code, gather three things: an API key, a document, and a Python environment.

1.1 Get an Unsiloed AI API Key

To get API access, sign up on Unsiloed AI. Export your key as an environment variable named UNSILOED_API_KEY so it stays out of source control:
export UNSILOED_API_KEY="your-api-key"

1.2 Pick a Document to Classify

The /classify endpoint supports PDF, DOCX, PPTX, JPG, PNG, and other formats. The walkthrough below assumes a PDF saved as document.pdf in your working directory. To use a different format, update the filename and content type in the snippets to match your file. If you don’t have a document handy, download our sample PDF (a one-page lab report from Riverside Diagnostic Laboratory) and save it as document.pdf. The walkthrough scores it against three candidate categories so we can see a clear winner.
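
For formats other than PDF, the filename and content type passed in the upload tuple can be derived from the file path. The sketch below is one way to do that; the helper names guess_content_type and upload_field are hypothetical, and whether the endpoint inspects the declared content type at all is an assumption rather than documented behavior.

```python
import mimetypes
from pathlib import Path

# Standard MIME types for the formats named above. That the endpoint relies on
# the declared type is an assumption; the extension list is illustrative only.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}


def guess_content_type(path):
    """Return a MIME type for path, falling back to a generic binary type."""
    suffix = Path(path).suffix.lower()
    if suffix in CONTENT_TYPES:
        return CONTENT_TYPES[suffix]
    guessed, _ = mimetypes.guess_type(str(path))
    return guessed or "application/octet-stream"


def upload_field(path, handle):
    """Build the (filename, file object, content type) tuple requests expects."""
    return (Path(path).name, handle, guess_content_type(path))
```

With these helpers, the upload in the script could pass files={"pdf_file": upload_field("scan.png", f)} instead of a hard-coded tuple.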

1.3 Install Dependencies

You need Python 3.8 or newer. Install the requests package:
pip install requests

Step 2: Submit a Document With Categories

The request bundles two fields: pdf_file for the document and categories for a JSON-stringified array of category objects, each with a name and an optional description. The categories list is the model’s entire vocabulary for this call, so clear and distinct names matter more than they might appear. The endpoint returns a job_id to poll. All requests go to https://prod.visionapi.unsiloed.ai with the API key in the api-key header.

2.1 Set Up the Script

Create a file called classify_document.py and start with the imports, configuration, and category list:
classify_document.py
import json
import os
import time
import requests

API_KEY = os.environ["UNSILOED_API_KEY"]
BASE_URL = "https://prod.visionapi.unsiloed.ai"

categories = [
    {"name": "Sales Report"},
    {"name": "Invoice"},
    {"name": "Medical Record"},
]
API_KEY reads your key from the environment so it doesn’t get hard-coded into the file, and BASE_URL points at the Unsiloed AI production endpoint. The categories list defines the candidate labels the model picks from. Only the names guide the result; a description key is accepted but not used by classification.
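
Because the category names are the only signal the classifier gets, it can help to validate them before submitting. The following is a minimal sketch under our own assumptions: build_categories_field is a hypothetical helper, and the checks (non-empty, distinct ignoring case) are a local convention, not rules enforced by the API.

```python
import json


def build_categories_field(names):
    """Return the JSON string for the categories form field."""
    cleaned = [name.strip() for name in names]
    if not cleaned:
        raise ValueError("at least one category is required")
    if any(not name for name in cleaned):
        raise ValueError("category names must be non-empty")
    # Case-insensitive duplicate check: "Invoice" and "invoice" would be
    # indistinguishable labels for routing purposes (our assumption).
    if len({name.lower() for name in cleaned}) != len(cleaned):
        raise ValueError("category names must be distinct")
    return json.dumps([{"name": name} for name in cleaned])
```

The return value can be passed directly as data={"categories": build_categories_field(["Sales Report", "Invoice", "Medical Record"])}.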

2.2 Upload the Document

Send the file and the JSON-encoded category list as a multipart upload to /classify. The document goes under pdf_file and the categories under categories.
Continue the file with the upload request:
classify_document.py
with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/classify",
        headers={"api-key": API_KEY},
        files={"pdf_file": ("document.pdf", f, "application/pdf")},
        data={"categories": json.dumps(categories)},
    )
response.raise_for_status()
raise_for_status() raises requests.HTTPError on any 4xx or 5xx response, so there’s no need to check .status_code separately.

2.3 Capture the Job ID

Next, read and print the job_id:
classify_document.py
job_id = response.json()["job_id"]
print(f"Job submitted: {job_id}")
Run the script:
python classify_document.py
The output should be a single line like Job submitted: 2c231adf-ad5e-4e2e-8c0c-10cd7025c09b.

Step 3: Poll for Results

The job runs asynchronously. Poll GET /classify/{job_id} until the status is completed or failed, then save the classification to disk. A status of completed means the result is ready. A status of failed means the job errored. Any other value (such as processing) means the job is still running.

3.1 Write the Polling Loop

Drop in a polling loop. The max_attempts cap stops the loop if the job hangs:
classify_document.py
max_attempts = 60  # roughly 5 minutes at 5 seconds per poll
attempts = 0
while True:
    result = requests.get(
        f"{BASE_URL}/classify/{job_id}",
        headers={"api-key": API_KEY},
    ).json()
    print(f"Status: {result['status']}")
    if result["status"] == "completed":
        break
    if result["status"] == "failed":
        raise RuntimeError(result.get("error", "classify job failed"))
    attempts += 1
    if attempts >= max_attempts:
        raise TimeoutError("Classify job did not finish within 5 minutes")
    time.sleep(5)
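
The same loop can be factored into a reusable function. The sketch below is one possible shape, not part of the walkthrough script: wait_for_job and its fetch_status and sleep parameters are hypothetical names, and the HTTP call is injected as a callable so the logic can be exercised without a network connection.

```python
import time


def wait_for_job(fetch_status, max_attempts=60, delay=5.0, sleep=time.sleep):
    """Poll fetch_status() until the job completes, fails, or times out.

    fetch_status is any zero-argument callable returning the decoded JSON
    body of GET /classify/{job_id}.
    """
    for attempt in range(max_attempts):
        body = fetch_status()
        status = body.get("status")
        if status == "completed":
            return body
        if status == "failed":
            raise RuntimeError(body.get("error") or "classify job failed")
        # Any other status means the job is still running. Skip the sleep
        # after the final attempt so the timeout is raised promptly.
        if attempt < max_attempts - 1:
            sleep(delay)
    raise TimeoutError(f"job did not finish after {max_attempts} polls")
```

In the script, fetch_status could wrap the requests.get call and call raise_for_status() before decoding the body, so that a 404 for an unknown job surfaces as an HTTP error rather than a KeyError on the missing status field.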

3.2 Save the Classification

Persist the result so downstream code can read it. The full response, including the per-page breakdown, goes to classification.json.
Finally, write the file and print a summary:
classify_document.py
with open("classification.json", "w") as f:
    json.dump(result, f, indent=2)

classification = result["result"]
print(f"Classification: {classification['classification']} ({classification['confidence']:.2%} confidence)")
Run the script:
python classify_document.py
You should see one or two Status: processing lines, then Status: completed, then a summary line like Classification: Medical Record (100.00% confidence). The classification.json file appears in the working directory.

Error Responses

Failures fall into two buckets: HTTP errors raised before the job is queued, and a failed status on a job that started but could not complete.

HTTP Errors

The /classify endpoint returns JSON error bodies under a detail field. The common cases are:
  • 401 Unauthorized: {"detail":"Invalid API key"}. The api-key header is missing or wrong.
  • 400 Bad Request: {"detail":"Either pdf_file or file_url must be provided"} or {"detail":"At least one category is required"}. The form was submitted, but it carries no document (neither pdf_file nor file_url) or an empty category list.
  • 422 Unprocessable Entity: {"detail":[{"type":"missing","loc":["body","categories"],"msg":"Field required","input":null}]}. A required form field, usually categories, is missing entirely.
  • 404 Not Found: {"detail":"Job not found"}. The job_id you polled doesn’t exist.
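
Since detail is a plain string in most cases but a list of objects for 422, code that logs these errors has to handle both shapes. The sketch below shows one way to flatten them; error_message is a hypothetical helper, and the output format is our own choice.

```python
def error_message(status_code, body):
    """Summarize an error response body as a single readable string."""
    detail = body.get("detail") if isinstance(body, dict) else None
    if isinstance(detail, str):
        return f"{status_code}: {detail}"
    if isinstance(detail, list):
        parts = []
        for item in detail:
            if not isinstance(item, dict):
                parts.append(str(item))
                continue
            # loc is a path such as ["body", "categories"]; join it for display.
            location = ".".join(str(piece) for piece in item.get("loc", []))
            parts.append(f"{location}: {item.get('msg', 'invalid')}")
        return f"{status_code}: " + "; ".join(parts)
    return f"{status_code}: unexpected error body"
```

For the 422 body shown above, this yields "422: body.categories: Field required".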

Failed Jobs

A job that was accepted but could not be processed returns status: "failed" on the polling endpoint. The response shape matches a successful one, but result is absent and the error field describes what went wrong:
{
  "job_id": "660e8400-e29b-41d4-a716-446655440001",
  "status": "failed",
  "progress": "Classification failed",
  "error": "Invalid PDF format"
}

Response Shape

A completed job returns job metadata plus a nested result object that contains the overall classification, a confidence score, and per-page results.
{
  "job_id": "2c231adf-ad5e-4e2e-8c0c-10cd7025c09b",
  "status": "completed",
  "progress": "Classification completed",
  "error": null,
  "result": {
    "success": true,
    "classification": "Medical Record",
    "confidence": 1.0,
    "total_pages": 1,
    "processed_pages": 1,
    "page_results": [
      {
        "page": 1,
        "success": true,
        "classification": "Medical Record",
        "raw_result": "Medical Record",
        "confidence": 1.0
      }
    ]
  }
}
The fields you use depend on what you’re building. They fall into three broad categories.
For routing decisions:
  • result.classification: the overall predicted category for the document, drawn from the name values you submitted. This is the field the walkthrough prints.
  • result.confidence: confidence score for the overall classification, on a 0-1 scale. Treat it as a soft signal: high values rarely need review, low values flag documents worth a human look.
For per-page handling and mixed-content documents:
  • result.page_results[]: the per-page classifications the overall result is built from
  • page_results[].page: 1-indexed page number
  • page_results[].classification: the predicted category for that page
  • page_results[].raw_result: the model’s raw output before normalization to a category name; usually identical to classification
  • page_results[].confidence: the page-level confidence score on a 0-1 scale
For job tracking:
  • status: completed, failed, or an in-progress value such as processing
  • progress: human-readable progress message
  • error: error message if the job failed, otherwise null
  • result.total_pages and result.processed_pages: how much of the document the classifier got through
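
The fields above can be combined into a simple routing rule. The following is a minimal sketch, assuming a completed response of the shape shown earlier; the route function and the 0.8 threshold are hypothetical and should be tuned against your own documents.

```python
def route(response, threshold=0.8):
    """Return (category, needs_review) for a completed classify response."""
    result = response["result"]
    category = result["classification"]
    # Pages whose label differs from the overall one hint at a mixed document.
    dissenting = [
        page["page"]
        for page in result.get("page_results", [])
        if page.get("classification") != category
    ]
    # Fewer processed pages than total pages means part of the file was skipped.
    incomplete = result.get("processed_pages") != result.get("total_pages")
    needs_review = (
        result["confidence"] < threshold or bool(dissenting) or incomplete
    )
    return category, needs_review
```

A document is flagged for review when the overall confidence is below the threshold, when any page disagrees with the overall label, or when not every page was processed.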

Sample Output

Running the script against the sample lab report and the three categories above writes the verdict to classification.json:
{
  "job_id": "2c231adf-ad5e-4e2e-8c0c-10cd7025c09b",
  "status": "completed",
  "progress": "Classification completed",
  "error": null,
  "result": {
    "success": true,
    "classification": "Medical Record",
    "confidence": 1.0,
    "total_pages": 1,
    "processed_pages": 1,
    "page_results": [
      {
        "page": 1,
        "success": true,
        "classification": "Medical Record",
        "raw_result": "Medical Record",
        "confidence": 1.0
      }
    ]
  }
}
The Riverside Diagnostic lab report lands cleanly in the Medical Record bucket with full confidence. Swap in your own document and category list to see how the classifier handles ambiguous cases.

Next Steps

For more on classification, including category design tips and the canonical response shape, see the Classification overview.

Classification Overview

Understand how the classifier scores pages and when to reach for it.

Response Format

Browse the full classification response with examples for each job state.

API Reference

Browse the full request and response specs for the classify endpoint.

Splitting

Split a mixed bundle into separate documents by section.