Getting Started With Classification

Classification picks the best-fit label for a document from a list of candidate categories you supply. The endpoint returns the matched category and a confidence score, ready to feed into routing logic. For raw Markdown or structured field extraction instead, see the Parse quickstart or the Extraction quickstart.

The walkthrough below builds a script that submits a PDF and your candidate categories to /classify, waits for the verdict, and saves the matched category and confidence score to disk. The accordion below has the full script if you’d rather copy and run it directly.

Show the Full Script

Set UNSILOED_API_KEY in your environment and save the document you want to classify as document.pdf in the same directory before running.

Python
JavaScript
cURL

classify_document.py

import json
import os
import time
import requests

API_KEY = os.environ["UNSILOED_API_KEY"]
BASE_URL = "https://prod.visionapi.unsiloed.ai"

categories = [
    {"name": "Sales Report"},
    {"name": "Invoice"},
    {"name": "Medical Record"},
]

with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/classify",
        headers={"api-key": API_KEY},
        files={"pdf_file": ("document.pdf", f, "application/pdf")},
        data={"categories": json.dumps(categories)},
    )
response.raise_for_status()

job_id = response.json()["job_id"]
print(f"Job submitted: {job_id}")

max_attempts = 60  # roughly 5 minutes at 5 seconds per poll
attempts = 0
while True:
    poll = requests.get(
        f"{BASE_URL}/classify/{job_id}",
        headers={"api-key": API_KEY},
    )
    poll.raise_for_status()  # surface HTTP errors such as a 404 for an unknown job_id
    result = poll.json()
    print(f"Status: {result['status']}")
    if result["status"] == "completed":
        break
    if result["status"] == "failed":
        raise RuntimeError(result.get("error", "classify job failed"))
    attempts += 1
    if attempts >= max_attempts:
        raise TimeoutError("Classify job did not finish within 5 minutes")
    time.sleep(5)

with open("classification.json", "w") as f:
    json.dump(result, f, indent=2)

classification = result["result"]
print(f"Classification: {classification['classification']} ({classification['confidence']:.2%} confidence)")

Save this as script.mjs or set "type": "module" in your package.json. Requires Node.js 18 or newer for the global fetch, FormData, and Blob.

script.mjs

import fs from "node:fs";

const API_KEY = process.env.UNSILOED_API_KEY;
const BASE_URL = "https://prod.visionapi.unsiloed.ai";

const categories = [
  { name: "Sales Report", description: "Sales performance summaries with regional or quarterly data" },
  { name: "Invoice", description: "Bill of sale with line items" },
  { name: "Medical Record", description: "Patient health records" },
];

const form = new FormData();
form.append("pdf_file", new Blob([fs.readFileSync("document.pdf")], { type: "application/pdf" }), "document.pdf");
form.append("categories", JSON.stringify(categories));

const response = await fetch(`${BASE_URL}/classify`, {
  method: "POST",
  headers: { "api-key": API_KEY },
  body: form,
});
if (!response.ok) throw new Error(`${response.status}: ${await response.text()}`);

const { job_id } = await response.json();
console.log(`Job submitted: ${job_id}`);

const maxAttempts = 60; // roughly 5 minutes at 5 seconds per poll
let attempts = 0;
let result;
while (true) {
  const res = await fetch(`${BASE_URL}/classify/${job_id}`, {
    headers: { "api-key": API_KEY },
  });
  if (!res.ok) throw new Error(`${res.status}: ${await res.text()}`);
  result = await res.json();
  console.log(`Status: ${result.status}`);
  if (result.status === "completed") break;
  if (result.status === "failed") throw new Error(result.error || "classify job failed");
  if (++attempts >= maxAttempts) throw new Error("Classify job did not finish within 5 minutes");
  await new Promise((r) => setTimeout(r, 5000));
}

fs.writeFileSync("classification.json", JSON.stringify(result, null, 2));
const { classification, confidence } = result.result;
console.log(`Classification: ${classification} (${(confidence * 100).toFixed(2)}% confidence)`);

# Submit the document and capture the job_id from the response:
resp=$(curl -sX POST "https://prod.visionapi.unsiloed.ai/classify" \
  -H "api-key: $UNSILOED_API_KEY" \
  -F "pdf_file=@document.pdf" \
  -F 'categories=[{"name":"Sales Report"},{"name":"Invoice"},{"name":"Medical Record"}]')
JOB_ID=$(echo "$resp" | grep -o '"job_id":"[^"]*"' | cut -d'"' -f4)
echo "Job submitted: $JOB_ID"

# Poll until the job finishes, with a 5-minute timeout:
attempts=0
max_attempts=60
while true; do
  resp=$(curl -sX GET "https://prod.visionapi.unsiloed.ai/classify/$JOB_ID" \
    -H "api-key: $UNSILOED_API_KEY")
  status=$(echo "$resp" | grep -o '"status":"[^"]*"' | head -1 | cut -d'"' -f4)
  echo "Status: $status"
  [ "$status" = "completed" ] && break
  [ "$status" = "failed" ] && { echo "Job failed"; exit 1; }
  attempts=$((attempts + 1))
  [ "$attempts" -ge "$max_attempts" ] && { echo "Classify job did not finish within 5 minutes"; exit 1; }
  sleep 5
done

# Save the full response to disk:
echo "$resp" > classification.json

Step 1: Set Up Your Environment

Before writing any code, gather three things: an API key, a document, and the runtime for the chosen language.

1.1 Get an Unsiloed AI API Key

To get API access, sign up on Unsiloed AI. Export your key as an environment variable named UNSILOED_API_KEY so it stays out of source control:

export UNSILOED_API_KEY="your-api-key"
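If the variable is missing, the Python script stops with a bare KeyError. A minimal sketch of a friendlier check follows; the helper name read_api_key and the wording of its message are illustrative assumptions, not part of the Unsiloed AI tooling.

```python
import os


def read_api_key(env=None):
    """Return the API key from the environment, or raise with a clear hint.

    env defaults to os.environ; passing a plain dict makes the check testable.
    read_api_key is a hypothetical helper name used only in this sketch.
    """
    source = os.environ if env is None else env
    key = source.get("UNSILOED_API_KEY", "").strip()
    if not key:
        raise RuntimeError("UNSILOED_API_KEY is not set; export it before running")
    return key


# Demonstration with a stand-in environment, so no real key is needed.
print(read_api_key({"UNSILOED_API_KEY": "your-api-key"}))
```

In the walkthrough script, API_KEY = read_api_key() could stand in for the direct os.environ lookup.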

1.2 Pick a Document to Classify

The /classify endpoint supports PDF, DOCX, PPTX, JPG, PNG, and other formats. The walkthrough below assumes a PDF saved as document.pdf in your working directory. To use a different format, update the filename and content type in the snippets to match your file. If you don’t have a document handy, download our sample PDF (a one-page lab report from Riverside Diagnostic Laboratory) and save it as document.pdf. The walkthrough scores it against three candidate categories so we can see a clear winner.
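When you switch formats, the content type in the upload has to change along with the filename. One way to keep the two in sync is to derive the type from the file extension with Python's standard mimetypes module. This is a sketch under assumptions: build_file_field is a hypothetical helper, and the fallback to application/octet-stream is our choice rather than documented API behavior.

```python
import mimetypes
from pathlib import Path


def guess_content_type(path):
    """Guess a MIME type from the file extension using the standard library."""
    content_type, _ = mimetypes.guess_type(str(path))
    # Assumption: use a generic binary type when the extension is unknown.
    return content_type or "application/octet-stream"


def build_file_field(path, handle):
    """Build the (filename, file object, content type) tuple that requests
    accepts as the value of the pdf_file form field."""
    name = Path(path).name
    return (name, handle, guess_content_type(name))


print(guess_content_type("document.pdf"))
print(guess_content_type("scan.png"))
```

In the Python upload from Step 2.2, files={"pdf_file": build_file_field("scan.png", f)} would then replace the hard-coded tuple.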

1.3 Install Dependencies

Python
JavaScript
cURL

You need Python 3.8 or newer. Install the requests package:

pip install requests

You need Node.js 18 or newer for the global fetch, FormData, and Blob. No external packages needed.

Step 2: Submit a Document With Categories

The request bundles two fields: pdf_file for the document and categories for a JSON-stringified array of category objects, each with a name and an optional description. The categories list is the model’s entire vocabulary for this call, so clear and distinct names matter more than they might appear. The endpoint returns a job_id to poll. All requests go to https://prod.visionapi.unsiloed.ai with the API key in the api-key header.
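Because the category names carry so much weight, it can help to sanity-check the list before serializing it. The sketch below is a client-side convention, not an API rule: encode_categories is a hypothetical helper, and the case-insensitive duplicate check is an assumption about what counts as "distinct".

```python
import json


def encode_categories(categories):
    """Validate a category list and return the JSON string for the form field."""
    if not categories:
        raise ValueError("at least one category is required")
    seen = set()
    for category in categories:
        name = str(category.get("name", "")).strip()
        if not name:
            raise ValueError("every category needs a non-empty name")
        # Assumption: treat "Invoice" and "invoice" as the same label.
        key = name.lower()
        if key in seen:
            raise ValueError(f"duplicate category name: {name}")
        seen.add(key)
    return json.dumps(categories)


payload = encode_categories([
    {"name": "Sales Report"},
    {"name": "Invoice", "description": "Bill of sale with line items"},
    {"name": "Medical Record"},
])
print(payload)
```

The returned string is what goes into the categories form field.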

2.1 Set Up the Script

Python
JavaScript
cURL

Create a file called classify_document.py and start with the imports, configuration, and category list:

classify_document.py

import json
import os
import time
import requests

API_KEY = os.environ["UNSILOED_API_KEY"]
BASE_URL = "https://prod.visionapi.unsiloed.ai"

categories = [
    {"name": "Sales Report"},
    {"name": "Invoice"},
    {"name": "Medical Record"},
]

API_KEY reads your key from the environment so it doesn’t get hard-coded into the file, and BASE_URL points at the Unsiloed AI production endpoint. The categories list defines the candidate labels the model picks from. Only the names guide the result; a description key is accepted but not used by classification.

Create a file called script.mjs and start with the imports, configuration, and category list:

script.mjs

import fs from "node:fs";

const API_KEY = process.env.UNSILOED_API_KEY;
const BASE_URL = "https://prod.visionapi.unsiloed.ai";

const categories = [
  { name: "Sales Report", description: "Sales performance summaries with regional or quarterly data" },
  { name: "Invoice", description: "Bill of sale with line items" },
  { name: "Medical Record", description: "Patient health records" },
];

2.2 Upload the Document

Send the file and the JSON-encoded category list as a multipart upload to /classify. The document goes under pdf_file and the categories under categories.

Python
JavaScript
cURL

Continue the file by uploading the document:

classify_document.py

with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/classify",
        headers={"api-key": API_KEY},
        files={"pdf_file": ("document.pdf", f, "application/pdf")},
        data={"categories": json.dumps(categories)},
    )
response.raise_for_status()

raise_for_status() raises an HTTPError on any 4xx or 5xx response, so there’s no need to check .status_code separately.

Continue the file by uploading the document:

script.mjs

const form = new FormData();
form.append("pdf_file", new Blob([fs.readFileSync("document.pdf")], { type: "application/pdf" }), "document.pdf");
form.append("categories", JSON.stringify(categories));

const response = await fetch(`${BASE_URL}/classify`, {
  method: "POST",
  headers: { "api-key": API_KEY },
  body: form,
});
if (!response.ok) throw new Error(`${response.status}: ${await response.text()}`);

fetch doesn’t throw on non-2xx responses by default, so we check response.ok and throw the error explicitly.

Run:

curl -X POST "https://prod.visionapi.unsiloed.ai/classify" \
  -H "api-key: $UNSILOED_API_KEY" \
  -F "pdf_file=@document.pdf" \
  -F 'categories=[{"name":"Sales Report"},{"name":"Invoice"},{"name":"Medical Record"}]'

The response prints to stdout. We need the job_id field for the next step.

2.3 Capture the Job ID

Python
JavaScript
cURL

Next, read and print the job_id:

classify_document.py

job_id = response.json()["job_id"]
print(f"Job submitted: {job_id}")

Run the script:

python classify_document.py

The output should be a single line like Job submitted: 2c231adf-ad5e-4e2e-8c0c-10cd7025c09b.

Next, read and log the job_id:

script.mjs

const { job_id } = await response.json();
console.log(`Job submitted: ${job_id}`);

Run the script:

node script.mjs

The output should be a single line like Job submitted: 2c231adf-ad5e-4e2e-8c0c-10cd7025c09b.

The response body from the POST above looks like:

{
  "job_id": "2c231adf-ad5e-4e2e-8c0c-10cd7025c09b",
  "status": "processing",
  "message": "Classification started",
  "quota_remaining": 7704
}

Copy the job_id value to paste into the polling command in the next step.

Step 3: Poll for Results

The job runs asynchronously. Poll GET /classify/{job_id} until the status is completed, then save the classification to disk. A status of completed means the result is ready, and a status of failed means the job errored. Any other value (such as processing) means the job is still running.
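The three status outcomes can also be captured in a small reusable function that takes the HTTP call as a parameter, which makes the logic easy to exercise without a network. This is a sketch, not the walkthrough's own code: poll_until_done and its fetch_status argument are hypothetical names, and the canned responses below stand in for real polling results.

```python
import time


def poll_until_done(fetch_status, max_attempts=60, delay_seconds=5, sleep=time.sleep):
    """Call fetch_status() until the job completes, fails, or the cap is hit.

    fetch_status is any zero-argument callable that returns the decoded JSON
    body of GET /classify/{job_id}.
    """
    for attempt in range(1, max_attempts + 1):
        payload = fetch_status()
        status = payload.get("status")
        if status == "completed":
            return payload
        if status == "failed":
            raise RuntimeError(payload.get("error") or "classify job failed")
        # Any other status, such as "processing", means the job is still running.
        if attempt < max_attempts:
            sleep(delay_seconds)
    raise TimeoutError(f"classify job did not finish after {max_attempts} polls")


# Offline demonstration: canned responses replace the HTTP calls,
# and a no-op sleep keeps the example instant.
responses = iter([
    {"status": "processing"},
    {"status": "completed", "result": {"classification": "Medical Record"}},
])
final = poll_until_done(lambda: next(responses), sleep=lambda seconds: None)
print(final["status"])
```

Passing the fetcher in keeps the timeout and failure handling in one place, whichever HTTP client ends up making the request.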

3.1 Write the Polling Loop

Python
JavaScript
cURL

Drop in a polling loop. The max_attempts cap stops the loop if the job hangs:

classify_document.py

max_attempts = 60  # roughly 5 minutes at 5 seconds per poll
attempts = 0
while True:
    poll = requests.get(
        f"{BASE_URL}/classify/{job_id}",
        headers={"api-key": API_KEY},
    )
    poll.raise_for_status()  # surface HTTP errors such as a 404 for an unknown job_id
    result = poll.json()
    print(f"Status: {result['status']}")
    if result["status"] == "completed":
        break
    if result["status"] == "failed":
        raise RuntimeError(result.get("error", "classify job failed"))
    attempts += 1
    if attempts >= max_attempts:
        raise TimeoutError("Classify job did not finish within 5 minutes")
    time.sleep(5)

Drop in a polling loop. The maxAttempts cap stops the loop if the job hangs:

script.mjs

const maxAttempts = 60; // roughly 5 minutes at 5 seconds per poll
let attempts = 0;
let result;
while (true) {
  const res = await fetch(`${BASE_URL}/classify/${job_id}`, {
    headers: { "api-key": API_KEY },
  });
  if (!res.ok) throw new Error(`${res.status}: ${await res.text()}`);
  result = await res.json();
  console.log(`Status: ${result.status}`);
  if (result.status === "completed") break;
  if (result.status === "failed") throw new Error(result.error || "classify job failed");
  if (++attempts >= maxAttempts) throw new Error("Classify job did not finish within 5 minutes");
  await new Promise((r) => setTimeout(r, 5000));
}

Replace JOB_ID below with the value you captured from Step 2.3, then run this loop. It polls every 5 seconds and gives up after 5 minutes if the job hasn’t completed:

JOB_ID="paste-job-id-here"
attempts=0
max_attempts=60  # roughly 5 minutes at 5 seconds per poll

while true; do
  resp=$(curl -sX GET "https://prod.visionapi.unsiloed.ai/classify/$JOB_ID" \
    -H "api-key: $UNSILOED_API_KEY")
  status=$(echo "$resp" | grep -o '"status":"[^"]*"' | head -1 | cut -d'"' -f4)
  echo "Status: $status"
  [ "$status" = "completed" ] && break
  [ "$status" = "failed" ] && { echo "Job failed"; exit 1; }
  attempts=$((attempts + 1))
  [ "$attempts" -ge "$max_attempts" ] && { echo "Classify job did not finish within 5 minutes"; exit 1; }
  sleep 5
done

The loop keeps the latest response body in $resp for the next step.

3.2 Save the Classification

Persist the result to disk so downstream code can read it. The full response, including the per-page breakdown, goes to classification.json.

Python
JavaScript
cURL

Finally, write the result to disk and print a summary:

classify_document.py

with open("classification.json", "w") as f:
    json.dump(result, f, indent=2)

classification = result["result"]
print(f"Classification: {classification['classification']} ({classification['confidence']:.2%} confidence)")

Run the script:

python classify_document.py

You should see one or two Status: processing lines, then Status: completed, then a summary line like Classification: Medical Record (100.00% confidence). The classification.json file appears in the working directory.

Finally, write the result to disk and log a summary:

script.mjs

fs.writeFileSync("classification.json", JSON.stringify(result, null, 2));
const { classification, confidence } = result.result;
console.log(`Classification: ${classification} (${(confidence * 100).toFixed(2)}% confidence)`);

Run the script:

node script.mjs

The polling loop in Step 3.1 left the full response in $resp. Write it to disk:

echo "$resp" > classification.json

The classification.json file now holds the full response. The overall label lives under result.classification and the per-page breakdown under result.page_results.
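Downstream code can read the saved file back with a few lines. The sketch below assumes the file holds a completed response in the shape described here; load_classification is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_classification(path="classification.json"):
    """Read the saved response and return (label, confidence, page_results)."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: the file was written after a completed job, so result exists.
    result = payload["result"]
    return (
        result["classification"],
        result["confidence"],
        result.get("page_results", []),
    )
```

Calling label, confidence, pages = load_classification() from the working directory gives the overall label plus the per-page list for further processing.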

Error Responses

Failures fall into two buckets: HTTP errors raised before the job is queued, and a failed status on a job that started but could not complete.

HTTP Errors

The /classify endpoint returns JSON error bodies under a detail field. The common cases are:

401 Unauthorized: {"detail":"Invalid API key"}. The api-key header is missing or wrong.
400 Bad Request: {"detail":"Either pdf_file or file_url must be provided"} or {"detail":"At least one category is required"}. The form was submitted without a document or with an empty category list.
422 Unprocessable Entity: {"detail":[{"type":"missing","loc":["body","categories"],"msg":"Field required","input":null}]}. A required form field, usually categories, is missing entirely.
404 Not Found: {"detail":"Job not found"}. The job_id you polled doesn’t exist.
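Note that detail is a plain string in the 401, 400, and 404 examples but a list of objects in the 422 example, so error-handling code has to cope with both. A minimal sketch, assuming only the shapes shown above; describe_error is a hypothetical helper name.

```python
def describe_error(status_code, body):
    """Turn a decoded error body into one readable line.

    Handles detail as a string (401, 400, 404 examples) and as a list of
    validation objects with loc and msg keys (422 example).
    """
    detail = body.get("detail") if isinstance(body, dict) else None
    if isinstance(detail, str):
        message = detail
    elif isinstance(detail, list):
        parts = []
        for item in detail:
            location = ".".join(str(piece) for piece in item.get("loc", []))
            parts.append(f"{location}: {item.get('msg', 'invalid')}")
        message = "; ".join(parts)
    else:
        # Assumption: anything else is reported generically rather than guessed at.
        message = "unrecognized error body"
    return f"HTTP {status_code}: {message}"


print(describe_error(401, {"detail": "Invalid API key"}))
print(describe_error(422, {"detail": [
    {"type": "missing", "loc": ["body", "categories"], "msg": "Field required", "input": None}
]}))
```

The two calls print HTTP 401: Invalid API key and HTTP 422: body.categories: Field required.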

Failed Jobs

A job that was accepted but could not be processed returns status: "failed" on the polling endpoint. The response shape matches a successful one, but result is absent and the error field describes what went wrong:

{
  "job_id": "660e8400-e29b-41d4-a716-446655440001",
  "status": "failed",
  "progress": "Classification failed",
  "error": "Invalid PDF format"
}

Response Shape

A completed job returns job metadata plus a nested result object that contains the overall classification, a confidence score, and per-page results.

{
  "job_id": "2c231adf-ad5e-4e2e-8c0c-10cd7025c09b",
  "status": "completed",
  "progress": "Classification completed",
  "error": null,
  "result": {
    "success": true,
    "classification": "Medical Record",
    "confidence": 1.0,
    "total_pages": 1,
    "processed_pages": 1,
    "page_results": [
      {
        "page": 1,
        "success": true,
        "classification": "Medical Record",
        "raw_result": "Medical Record",
        "confidence": 1.0
      }
    ]
  }
}

The fields you use depend on what you’re building. They fall into three broad categories:

For routing decisions:

result.classification: the overall predicted category for the document, drawn from the name values you submitted. This is the field the walkthrough prints.
result.confidence: confidence score for the overall classification, on a 0-1 scale. Treat it as a soft signal: high values rarely need review, low values flag documents worth a human look.
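The two routing fields combine naturally into a threshold check. The sketch below is illustrative only: the queue names and the 0.8 cutoff are assumptions picked for the example, not values recommended by the API.

```python
# Hypothetical destination queues, for illustration only.
ROUTES = {
    "Sales Report": "sales-analytics",
    "Invoice": "accounts-payable",
    "Medical Record": "records-intake",
}
REVIEW_QUEUE = "manual-review"


def route(result, threshold=0.8):
    """Pick a destination from result.classification and result.confidence.

    Low-confidence results and unknown labels both go to human review.
    The 0.8 default is an assumed cutoff; tune it against your own documents.
    """
    if result["confidence"] < threshold:
        return REVIEW_QUEUE
    return ROUTES.get(result["classification"], REVIEW_QUEUE)


print(route({"classification": "Medical Record", "confidence": 1.0}))
print(route({"classification": "Invoice", "confidence": 0.55}))
```

The first call prints records-intake and the second prints manual-review, since 0.55 falls below the cutoff.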

For per-page handling and mixed-content documents:

result.page_results[]: the per-page classifications the overall result is built from
page_results[].page: 1-indexed page number
page_results[].classification: the predicted category for that page
page_results[].raw_result: the model’s raw output before normalization to a category name; usually identical to classification
page_results[].confidence: the page-level confidence score on a 0-1 scale
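For mixed-content documents, the per-page fields can be grouped to show where the pages disagree with the overall label. A short sketch follows; the three-page result is made-up sample data, and both helper names are hypothetical.

```python
from collections import defaultdict


def pages_by_label(result):
    """Group 1-indexed page numbers by their page-level classification."""
    groups = defaultdict(list)
    for page in result.get("page_results", []):
        groups[page["classification"]].append(page["page"])
    return dict(groups)


def dissenting_pages(result):
    """List pages whose label differs from the document-level label."""
    overall = result["classification"]
    return [
        page["page"]
        for page in result.get("page_results", [])
        if page["classification"] != overall
    ]


# Made-up three-page result, shaped like the result object described above.
sample = {
    "classification": "Invoice",
    "page_results": [
        {"page": 1, "classification": "Invoice", "confidence": 0.97},
        {"page": 2, "classification": "Invoice", "confidence": 0.91},
        {"page": 3, "classification": "Sales Report", "confidence": 0.64},
    ],
}
print(pages_by_label(sample))
print(dissenting_pages(sample))
```

Here page 3 is the only dissenting page, which is the kind of signal that might prompt a closer look at the bundle.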

For job tracking:

status: completed, failed, or an in-progress value such as processing
progress: human-readable progress message
error: error message if the job failed, otherwise null
result.total_pages and result.processed_pages: how much of the document the classifier got through
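The job-tracking fields can be reduced to a quick coverage check before trusting a result. This is a sketch under one assumption: a complete run is one where status is completed and processed_pages equals total_pages. The helper name coverage is hypothetical.

```python
def coverage(payload):
    """Return (processed_pages, total_pages, complete) for a polled job.

    result is absent on failed jobs, so missing values default to zero.
    """
    result = payload.get("result") or {}
    processed = result.get("processed_pages", 0)
    total = result.get("total_pages", 0)
    complete = (
        payload.get("status") == "completed"
        and total > 0
        and processed == total
    )
    return processed, total, complete


print(coverage({"status": "completed", "result": {"total_pages": 1, "processed_pages": 1}}))
print(coverage({"status": "failed", "error": "Invalid PDF format"}))
```

The first call reports full coverage, and the second returns zeros with complete set to False because a failed job carries no result.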

Sample Output

Running the script against the sample lab report and the three categories above writes the verdict to classification.json:

{
  "job_id": "2c231adf-ad5e-4e2e-8c0c-10cd7025c09b",
  "status": "completed",
  "progress": "Classification completed",
  "error": null,
  "result": {
    "success": true,
    "classification": "Medical Record",
    "confidence": 1.0,
    "total_pages": 1,
    "processed_pages": 1,
    "page_results": [
      {
        "page": 1,
        "success": true,
        "classification": "Medical Record",
        "raw_result": "Medical Record",
        "confidence": 1.0
      }
    ]
  }
}

The Riverside Diagnostic lab report lands cleanly in the Medical Record bucket with full confidence. Swap in your own document and category list to see how the classifier handles ambiguous cases.

Next Steps

For more on classification, including category design tips and the canonical response shape, see the Classification overview.

Classification Overview

Understand how the classifier scores pages and when to reach for it.

Response Format

Browse the full classification response with examples for each job state.

API Reference

Browse the full request and response specs for the classify endpoint.

Splitting

Split a mixed bundle into separate documents by section.
