Getting Started With Extract

Extraction pulls typed fields out of a document against a JSON schema we define, returning each leaf tagged with its own confidence score. For raw Markdown or just a category label instead, see the Parse quickstart or the Classification quickstart.

By the end, we’ll have a script that submits an invoice PDF and a schema to /v2/extract, waits for the job to finish, and writes the extracted fields to result.json with per-field confidence scores. Grab the full script from the dropdown below if you’d rather skip the walkthrough.

Show the Full Script

Set UNSILOED_API_KEY in your environment and save the document you want to extract from as document.pdf in the same directory before running.

Python
JavaScript
cURL

extract_document.py

import json
import os
import time
import requests

API_KEY = os.environ["UNSILOED_API_KEY"]
BASE_URL = "https://prod.visionapi.unsiloed.ai"

schema = {
    "type": "object",
    "properties": {
        "vendor_name": {
            "type": "string",
            "description": "Name of the company issuing the invoice (the seller)",
        },
        "invoice_number": {
            "type": "string",
            "description": "Unique invoice identifier shown on the document",
        },
        "issue_date": {
            "type": "string",
            "description": "Date the invoice was issued",
        },
        "total_due": {
            "type": "number",
            "description": "Final total amount due in US dollars, including tax",
        },
        "line_items": {
            "type": "array",
            "description": "One row per line item in the invoice table",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string", "description": "Description of the product or service"},
                    "quantity":    {"type": "number", "description": "Quantity of the item ordered"},
                    "unit_price":  {"type": "number", "description": "Price per unit in US dollars"},
                    "subtotal":    {"type": "number", "description": "Line subtotal in US dollars (quantity x unit_price)"},
                },
                "required": ["description", "quantity", "unit_price", "subtotal"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["vendor_name", "invoice_number", "issue_date", "total_due", "line_items"],
    "additionalProperties": False,
}

with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/v2/extract",
        headers={"api-key": API_KEY},
        files={"pdf_file": ("document.pdf", f, "application/pdf")},
        data={"schema_data": json.dumps(schema)},
    )
response.raise_for_status()

job_id = response.json()["job_id"]
print(f"Job submitted: {job_id}")

max_attempts = 60  # roughly 5 minutes at 5 seconds per poll
attempts = 0
while True:
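    poll = requests.get(
        f"{BASE_URL}/extract/{job_id}",
        headers={"api-key": API_KEY},
    )
    poll.raise_for_status()  # surface HTTP errors (401, 404) instead of a KeyError on "status"
    result = poll.json()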
    print(f"Status: {result['status']}")
    if result["status"] == "completed":
        break
    if result["status"] == "failed":
        raise RuntimeError(result.get("error", "extract job failed"))
    attempts += 1
    if attempts >= max_attempts:
        raise TimeoutError("Extract job did not finish within 5 minutes")
    time.sleep(5)

with open("result.json", "w") as f:
    json.dump(result, f, indent=2)

print("Saved extracted fields to result.json")

Save this as script.mjs or set "type": "module" in your package.json. Requires Node.js 18 or newer for the global fetch, FormData, and Blob.

script.mjs

import fs from "node:fs";

const API_KEY = process.env.UNSILOED_API_KEY;
const BASE_URL = "https://prod.visionapi.unsiloed.ai";

const schema = {
  type: "object",
  properties: {
    vendor_name:    { type: "string", description: "Name of the company issuing the invoice (the seller)" },
    invoice_number: { type: "string", description: "Unique invoice identifier shown on the document" },
    issue_date:     { type: "string", description: "Date the invoice was issued" },
    total_due:      { type: "number", description: "Final total amount due in US dollars, including tax" },
    line_items: {
      type: "array",
      description: "One row per line item in the invoice table",
      items: {
        type: "object",
        properties: {
          description: { type: "string", description: "Description of the product or service" },
          quantity:    { type: "number", description: "Quantity of the item ordered" },
          unit_price:  { type: "number", description: "Price per unit in US dollars" },
          subtotal:    { type: "number", description: "Line subtotal in US dollars (quantity x unit_price)" },
        },
        required: ["description", "quantity", "unit_price", "subtotal"],
        additionalProperties: false,
      },
    },
  },
  required: ["vendor_name", "invoice_number", "issue_date", "total_due", "line_items"],
  additionalProperties: false,
};

const form = new FormData();
form.append("pdf_file", new Blob([fs.readFileSync("document.pdf")]), "document.pdf");
form.append("schema_data", JSON.stringify(schema));

const response = await fetch(`${BASE_URL}/v2/extract`, {
  method: "POST",
  headers: { "api-key": API_KEY },
  body: form,
});
if (!response.ok) throw new Error(`${response.status}: ${await response.text()}`);

const { job_id } = await response.json();
console.log(`Job submitted: ${job_id}`);

const maxAttempts = 60; // roughly 5 minutes at 5 seconds per poll
let attempts = 0;
let result;
while (true) {
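  const res = await fetch(`${BASE_URL}/extract/${job_id}`, {
    headers: { "api-key": API_KEY },
  });
  // Surface HTTP errors (401, 404) instead of reading an undefined status
  if (!res.ok) throw new Error(`${res.status}: ${await res.text()}`);
  result = await res.json();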
  console.log(`Status: ${result.status}`);
  if (result.status === "completed") break;
  if (result.status === "failed") throw new Error(result.error || "extract job failed");
  if (++attempts >= maxAttempts) throw new Error("Extract job did not finish within 5 minutes");
  await new Promise((r) => setTimeout(r, 5000));
}

fs.writeFileSync("result.json", JSON.stringify(result, null, 2));
console.log("Saved extracted fields to result.json");

# Write the schema to a file so we can pass it cleanly:
cat > schema.json <<'EOF'
{
  "type": "object",
  "properties": {
    "vendor_name":    { "type": "string", "description": "Name of the company issuing the invoice (the seller)" },
    "invoice_number": { "type": "string", "description": "Unique invoice identifier shown on the document" },
    "issue_date":     { "type": "string", "description": "Date the invoice was issued" },
    "total_due":      { "type": "number", "description": "Final total amount due in US dollars, including tax" },
    "line_items": {
      "type": "array",
      "description": "One row per line item in the invoice table",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string", "description": "Description of the product or service" },
          "quantity":    { "type": "number", "description": "Quantity of the item ordered" },
          "unit_price":  { "type": "number", "description": "Price per unit in US dollars" },
          "subtotal":    { "type": "number", "description": "Line subtotal in US dollars (quantity x unit_price)" }
        },
        "required": ["description", "quantity", "unit_price", "subtotal"],
        "additionalProperties": false
      }
    }
  },
  "required": ["vendor_name", "invoice_number", "issue_date", "total_due", "line_items"],
  "additionalProperties": false
}
EOF

# Submit the document with the schema and capture the job_id:
resp=$(curl -sX POST "https://prod.visionapi.unsiloed.ai/v2/extract" \
  -H "api-key: $UNSILOED_API_KEY" \
  -F "pdf_file=@document.pdf" \
  -F "schema_data=$(cat schema.json)")
JOB_ID=$(echo "$resp" | grep -o '"job_id"[[:space:]]*:[[:space:]]*"[^"]*"' | cut -d'"' -f4)
echo "Job submitted: $JOB_ID"

# Poll until the job finishes, with a 5-minute timeout:
attempts=0
max_attempts=60
while true; do
  resp=$(curl -sX GET "https://prod.visionapi.unsiloed.ai/extract/$JOB_ID" \
    -H "api-key: $UNSILOED_API_KEY")
  status=$(echo "$resp" | grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' | head -1 | cut -d'"' -f4)
  echo "Status: $status"
  [ "$status" = "completed" ] && break
  [ "$status" = "failed" ] && { echo "Job failed"; exit 1; }
  attempts=$((attempts + 1))
  [ "$attempts" -ge "$max_attempts" ] && { echo "Extract job did not finish within 5 minutes"; exit 1; }
  sleep 5
done

# Save the full response to disk:
echo "$resp" > result.json

Step 1: Set Up Your Environment

Before writing any code, we need three things: an API key, a document, and the runtime for our chosen language.

1.1 Get an Unsiloed AI API Key

To get API access, sign up on Unsiloed AI. Export your key as an environment variable named UNSILOED_API_KEY so it stays out of source control:

export UNSILOED_API_KEY="your-api-key"

1.2 Pick a Document to Extract Fields From

The /v2/extract endpoint supports PDF, DOCX, PPTX, JPG, PNG, and other formats. The walkthrough below assumes a PDF saved as document.pdf in your working directory. To use a different format, update the filename in the snippets to match your file. If you don’t have a document handy, download our sample invoice PDF (a one-page invoice from Northwind Office Supplies with five line items) and save it as document.pdf. The schema in this guide targets the vendor, invoice number, issue date, total, and the line item table on that invoice.

1.3 Install Dependencies

Python
JavaScript
cURL

You need Python 3.8 or newer. Install the requests package:

pip install requests

You need Node.js 18 or newer for the global fetch, FormData, and Blob. No external packages needed.

Step 2: Submit a Document With a Schema

The request bundles two fields: pdf_file for the document and schema_data for the JSON schema as a string. The schema is the interesting half. Anything we describe there, from a single total to a nested array of line items, comes back typed and scored in the exact shape we asked for. The endpoint returns a job_id we can poll. All requests go to https://prod.visionapi.unsiloed.ai with the API key in the api-key header.

2.1 Set Up the Script

Python
JavaScript
cURL

Create a file called extract_document.py and start with the imports and configuration:

extract_document.py

import json
import os
import time
import requests

API_KEY = os.environ["UNSILOED_API_KEY"]
BASE_URL = "https://prod.visionapi.unsiloed.ai"

API_KEY reads your key from the environment so it doesn’t get hard-coded into the file, and BASE_URL points at the Unsiloed AI production endpoint. Both appear in every request below.

Create a file called script.mjs and start with the imports and configuration:

script.mjs

import fs from "node:fs";

const API_KEY = process.env.UNSILOED_API_KEY;
const BASE_URL = "https://prod.visionapi.unsiloed.ai";

API_KEY reads your key from the environment so it doesn’t get hard-coded into the file, and BASE_URL points at the Unsiloed AI production endpoint. Both appear in every request below.

2.2 Define the Schema

The schema tells the API which fields to pull out and what shape they should take. The clearer the description on each field, the better the model locates and types each value. For our sample invoice, we want the vendor name, the invoice number, the issue date, the total due, and the five rows of the line item table.

Python
JavaScript
cURL

Continue the file by defining the schema as a Python dict:

extract_document.py

schema = {
    "type": "object",
    "properties": {
        "vendor_name": {
            "type": "string",
            "description": "Name of the company issuing the invoice (the seller)",
        },
        "invoice_number": {
            "type": "string",
            "description": "Unique invoice identifier shown on the document",
        },
        "issue_date": {
            "type": "string",
            "description": "Date the invoice was issued",
        },
        "total_due": {
            "type": "number",
            "description": "Final total amount due in US dollars, including tax",
        },
        "line_items": {
            "type": "array",
            "description": "One row per line item in the invoice table",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string", "description": "Description of the product or service"},
                    "quantity":    {"type": "number", "description": "Quantity of the item ordered"},
                    "unit_price":  {"type": "number", "description": "Price per unit in US dollars"},
                    "subtotal":    {"type": "number", "description": "Line subtotal in US dollars (quantity x unit_price)"},
                },
                "required": ["description", "quantity", "unit_price", "subtotal"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["vendor_name", "invoice_number", "issue_date", "total_due", "line_items"],
    "additionalProperties": False,
}

The schema is plain JSON Schema with strict-mode rules: additionalProperties: false at every object level (which prevents the model from inventing fields we didn’t ask for), and a required list naming the must-have fields. See the Schemas reference for the full ruleset.

Continue the file by defining the schema as a JavaScript object:

script.mjs

const schema = {
  type: "object",
  properties: {
    vendor_name:    { type: "string", description: "Name of the company issuing the invoice (the seller)" },
    invoice_number: { type: "string", description: "Unique invoice identifier shown on the document" },
    issue_date:     { type: "string", description: "Date the invoice was issued" },
    total_due:      { type: "number", description: "Final total amount due in US dollars, including tax" },
    line_items: {
      type: "array",
      description: "One row per line item in the invoice table",
      items: {
        type: "object",
        properties: {
          description: { type: "string", description: "Description of the product or service" },
          quantity:    { type: "number", description: "Quantity of the item ordered" },
          unit_price:  { type: "number", description: "Price per unit in US dollars" },
          subtotal:    { type: "number", description: "Line subtotal in US dollars (quantity x unit_price)" },
        },
        required: ["description", "quantity", "unit_price", "subtotal"],
        additionalProperties: false,
      },
    },
  },
  required: ["vendor_name", "invoice_number", "issue_date", "total_due", "line_items"],
  additionalProperties: false,
};

Write the schema to a file to pass it to curl as the value of the schema_data form field:

cat > schema.json <<'EOF'
{
  "type": "object",
  "properties": {
    "vendor_name":    { "type": "string", "description": "Name of the company issuing the invoice (the seller)" },
    "invoice_number": { "type": "string", "description": "Unique invoice identifier shown on the document" },
    "issue_date":     { "type": "string", "description": "Date the invoice was issued" },
    "total_due":      { "type": "number", "description": "Final total amount due in US dollars, including tax" },
    "line_items": {
      "type": "array",
      "description": "One row per line item in the invoice table",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string", "description": "Description of the product or service" },
          "quantity":    { "type": "number", "description": "Quantity of the item ordered" },
          "unit_price":  { "type": "number", "description": "Price per unit in US dollars" },
          "subtotal":    { "type": "number", "description": "Line subtotal in US dollars (quantity x unit_price)" }
        },
        "required": ["description", "quantity", "unit_price", "subtotal"],
        "additionalProperties": false
      }
    }
  },
  "required": ["vendor_name", "invoice_number", "issue_date", "total_due", "line_items"],
  "additionalProperties": false
}
EOF

The schema follows JSON Schema with strict-mode rules: additionalProperties: false at every object level, and a required list naming the must-have fields. See the Schemas reference for the full ruleset.
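Before submitting, it can help to check locally that a schema follows the two strict-mode rules named above. The sketch below is a hypothetical helper (find_strict_violations is our own name, not part of the Unsiloed AI API). It covers only those two rules and assumes every object level should carry a required list; the Schemas reference remains the authority on the full ruleset.

```python
def find_strict_violations(schema, path="$"):
    """Return human-readable problems found in a schema.

    Hypothetical local check covering two rules only: every object level
    sets additionalProperties to False and carries a required list whose
    names are all defined under properties.
    """
    problems = []
    schema_type = schema.get("type")
    if schema_type == "object":
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        properties = schema.get("properties", {})
        required = schema.get("required")
        if not isinstance(required, list):
            problems.append(f"{path}: missing required list")
        else:
            for name in required:
                if name not in properties:
                    problems.append(f"{path}: required field '{name}' is not defined")
        # Recurse into nested fields
        for name, child in properties.items():
            problems.extend(find_strict_violations(child, f"{path}.{name}"))
    elif schema_type == "array" and isinstance(schema.get("items"), dict):
        problems.extend(find_strict_violations(schema["items"], f"{path}[]"))
    return problems


good = {
    "type": "object",
    "properties": {"total_due": {"type": "number"}},
    "required": ["total_due"],
    "additionalProperties": False,
}
# The nested row object below forgets both strict-mode rules
bad = {
    "type": "object",
    "properties": {
        "line_items": {
            "type": "array",
            "items": {"type": "object", "properties": {"quantity": {"type": "number"}}},
        }
    },
    "required": ["line_items"],
    "additionalProperties": False,
}

print(find_strict_violations(good))  # []
for problem in find_strict_violations(bad):
    print(problem)
```

An empty list means the schema passed both checks; each entry otherwise names the object level that needs fixing.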

2.3 Upload the Document

Send the file and the schema as a multipart upload to /v2/extract. The endpoint expects the document under the form field name pdf_file and the schema under schema_data (as a JSON string).

Python
JavaScript
cURL

Next, upload the document and the schema together:

extract_document.py

with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/v2/extract",
        headers={"api-key": API_KEY},
        files={"pdf_file": ("document.pdf", f, "application/pdf")},
        data={"schema_data": json.dumps(schema)},
    )
response.raise_for_status()

The raise_for_status() call raises an HTTPError on any 4xx or 5xx response, so we don’t need to check .status_code ourselves. The json.dumps(schema) call serializes the dict because the endpoint expects schema_data as a string, not a nested form field.

Next, upload the document and the schema together:

script.mjs

const form = new FormData();
form.append("pdf_file", new Blob([fs.readFileSync("document.pdf")]), "document.pdf");
form.append("schema_data", JSON.stringify(schema));

const response = await fetch(`${BASE_URL}/v2/extract`, {
  method: "POST",
  headers: { "api-key": API_KEY },
  body: form,
});
if (!response.ok) throw new Error(`${response.status}: ${await response.text()}`);

fetch doesn’t reject on HTTP error statuses, so we check response.ok and throw the error ourselves. The JSON.stringify(schema) call serializes the schema because the endpoint expects schema_data as a string, not a nested form field.

Run:

curl -X POST "https://prod.visionapi.unsiloed.ai/v2/extract" \
  -H "api-key: $UNSILOED_API_KEY" \
  -F "pdf_file=@document.pdf" \
  -F "schema_data=$(cat schema.json)"

The response prints to stdout. We need the job_id field for the next step.

2.4 Capture the Job ID

Python
JavaScript
cURL

Then read and print the job_id:

extract_document.py

job_id = response.json()["job_id"]
print(f"Job submitted: {job_id}")

Run the script:

python extract_document.py

The output should be a single line like Job submitted: a90e48c4-f564-435e-9bf2-ab6eb5a0376d.

Then read and log the job_id:

script.mjs

const { job_id } = await response.json();
console.log(`Job submitted: ${job_id}`);

Run the script:

node script.mjs

The output should be a single line like Job submitted: a90e48c4-f564-435e-9bf2-ab6eb5a0376d.

The response body from the POST above looks like:

{
  "job_id": "a90e48c4-f564-435e-9bf2-ab6eb5a0376d",
  "status": "queued",
  "message": "PDF citation processing started",
  "quota_remaining": 7705
}

Copy the job_id value; you’ll paste it into the polling command in the next step.

Step 3: Poll for Results

The job runs asynchronously. We GET /extract/{job_id} repeatedly until the status is completed, then save the extracted fields to disk. A status of completed means the result is ready; failed means the job errored; any other value (queued, processing, and so on) means the job is still running.

3.1 Write the Polling Loop

Python
JavaScript
cURL

Next, drop in a polling loop. The max_attempts cap stops the loop if the job hangs:

extract_document.py

max_attempts = 60  # roughly 5 minutes at 5 seconds per poll
attempts = 0
while True:
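    poll = requests.get(
        f"{BASE_URL}/extract/{job_id}",
        headers={"api-key": API_KEY},
    )
    poll.raise_for_status()  # surface HTTP errors (401, 404) instead of a KeyError on "status"
    result = poll.json()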
    print(f"Status: {result['status']}")
    if result["status"] == "completed":
        break
    if result["status"] == "failed":
        raise RuntimeError(result.get("error", "extract job failed"))
    attempts += 1
    if attempts >= max_attempts:
        raise TimeoutError("Extract job did not finish within 5 minutes")
    time.sleep(5)

Next, drop in a polling loop. The maxAttempts cap stops the loop if the job hangs:

script.mjs

const maxAttempts = 60; // roughly 5 minutes at 5 seconds per poll
let attempts = 0;
let result;
while (true) {
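  const res = await fetch(`${BASE_URL}/extract/${job_id}`, {
    headers: { "api-key": API_KEY },
  });
  // Surface HTTP errors (401, 404) instead of reading an undefined status
  if (!res.ok) throw new Error(`${res.status}: ${await res.text()}`);
  result = await res.json();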
  console.log(`Status: ${result.status}`);
  if (result.status === "completed") break;
  if (result.status === "failed") throw new Error(result.error || "extract job failed");
  if (++attempts >= maxAttempts) throw new Error("Extract job did not finish within 5 minutes");
  await new Promise((r) => setTimeout(r, 5000));
}

Replace JOB_ID below with the value you captured from Step 2.4, then run this loop. It polls every 5 seconds and gives up after 5 minutes if the job hasn’t completed:

JOB_ID="paste-job-id-here"
attempts=0
max_attempts=60  # roughly 5 minutes at 5 seconds per poll

while true; do
  resp=$(curl -sX GET "https://prod.visionapi.unsiloed.ai/extract/$JOB_ID" \
    -H "api-key: $UNSILOED_API_KEY")
  status=$(echo "$resp" | grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' | head -1 | cut -d'"' -f4)
  echo "Status: $status"
  [ "$status" = "completed" ] && break
  [ "$status" = "failed" ] && { echo "Job failed"; exit 1; }
  attempts=$((attempts + 1))
  [ "$attempts" -ge "$max_attempts" ] && { echo "Extract job did not finish within 5 minutes"; exit 1; }
  sleep 5
done

The loop keeps the latest response body in $resp for the next step.

3.2 Save the Extracted Fields

Finally, persist the result to disk. The response is already structured JSON, so we write it straight to result.json.

Python
JavaScript
cURL

Finally, write the result to disk:

extract_document.py

with open("result.json", "w") as f:
    json.dump(result, f, indent=2)

print("Saved extracted fields to result.json")

Run the script:

python extract_document.py

You should see a few Status: processing lines, then Status: completed, then the summary line. The result.json file appears in the working directory.

Finally, write the result to disk:

script.mjs

fs.writeFileSync("result.json", JSON.stringify(result, null, 2));
console.log("Saved extracted fields to result.json");

Run the script:

node script.mjs

You should see a few Status: processing lines, then Status: completed, then the summary line. The result.json file appears in the working directory.

The polling loop in Step 3.1 left the full response in $resp. Write it to disk:

echo "$resp" > result.json

The result.json file now holds the full response. Extracted fields sit under result.{field_name}, each with a value and a score.

Error Responses

Failures fall into two buckets: HTTP errors raised before the job is queued, and a failed status on a job that started but couldn’t complete.

HTTP Errors

The /v2/extract endpoint returns JSON bodies on HTTP errors, with a single detail field describing the problem. The common cases:

401 Unauthorized: body is {"detail": "Invalid API key"}. The api-key header is missing or wrong.
400 Bad Request (missing file): body is {"detail": "Either pdf_file or file_url must be provided"}. The pdf_file form field is missing.
400 Bad Request (bad JSON): body is {"detail": "schema_data must be valid JSON"}. The schema_data field isn’t parseable JSON.
400 Bad Request (unreadable PDF): body starts with {"detail": "Error extracting data with citations: Failed to get PDF page count..."}. The uploaded file isn’t a valid PDF.
422 Unprocessable Entity: body lists the missing or malformed form fields. Usually returned when schema_data is absent.
404 Not Found: body is {"detail": "Job <id> not found"}. Returned by GET /extract/{job_id} when the job_id you polled doesn’t exist.
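To turn these bodies into a readable message, we can pull out the detail field and fall back to the raw text when the body isn't the expected JSON. This is a minimal sketch: describe_http_error is a hypothetical helper name, and it assumes nothing beyond the single detail field described above.

```python
import json


def describe_http_error(status_code, body_text):
    """Build a one-line message from an error response.

    Assumes the body is a JSON object with a `detail` field, as in the
    cases listed above; falls back to the raw text otherwise.
    """
    try:
        detail = json.loads(body_text).get("detail", body_text)
    except (ValueError, AttributeError):
        # Not JSON at all, or JSON that is not an object
        detail = body_text
    return f"HTTP {status_code}: {detail}"


print(describe_http_error(401, '{"detail": "Invalid API key"}'))  # HTTP 401: Invalid API key
print(describe_http_error(502, "Bad Gateway"))                    # HTTP 502: Bad Gateway
```

With requests, the two arguments would come from response.status_code and response.text.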

Failed Jobs

A job that was accepted but couldn’t be processed comes back with status: "failed" on a subsequent poll. The response shape mirrors a completed one, with an error field describing what went wrong:

{
  "job_id": "7b31a7d7-e810-4a0b-931e-fbed0879bab2",
  "status": "failed",
  "file_name": "document.pdf",
  "error": "Failed to extract structured data from document"
}

Response Shape

A completed response contains job metadata plus a result object with one entry per top-level field in your schema. Each entry has the extracted value and a score between 0 and 1. For arrays of objects, the array itself has a score, and every property inside each row carries its own score as well.

{
  "job_id": "cec5dcb5-53c6-47d5-afe7-28b2182171fb",
  "status": "completed",
  "file_name": "document.pdf",
  "file_url": "https://example-bucket.s3.amazonaws.com/...",
  "created_at": "2026-05-27T08:36:58.336535+00:00",
  "updated_at": "2026-05-27T08:37:19.436604+00:00",
  "metadata": {
    "page_count": 1,
    "order": ["vendor_name", "invoice_number", "issue_date", "total_due", "line_items"],
    "schema": { "...": "..." }
  },
  "result": {
    "vendor_name":    { "value": "Northwind Office Supplies", "score": 0.96 },
    "invoice_number": { "value": "INV-2026-00487",            "score": 0.97 },
    "issue_date":     { "value": "April 14, 2026",            "score": 0.98 },
    "total_due":      { "value": 3705.1,                      "score": 0.96 },
    "line_items": {
      "score": 0.98,
      "value": [
        {
          "description": { "value": "Ergonomic Mesh Office Chair", "score": 0.95, "citation": null },
          "quantity":    { "value": 4,                              "score": 0.96, "citation": null },
          "unit_price":  { "value": 289.0,                          "score": 0.97, "citation": null },
          "subtotal":    { "value": 1156.0,                         "score": 0.98, "citation": null }
        },
        "...four more rows..."
      ]
    }
  }
}
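When downstream code only needs the values, the value/score wrappers can be stripped recursively. The helper below is a sketch with a hypothetical name (plain_values). It treats any dict holding both a value and a score key as a wrapper, so it assumes the schema has no object that itself declares fields with those two names.

```python
def plain_values(node):
    """Recursively replace each {"value": ..., "score": ...} wrapper with its value."""
    if isinstance(node, dict):
        if "value" in node and "score" in node:
            return plain_values(node["value"])
        return {key: plain_values(child) for key, child in node.items()}
    if isinstance(node, list):
        return [plain_values(item) for item in node]
    return node


# A trimmed copy of the result object shown above
result = {
    "vendor_name": {"value": "Northwind Office Supplies", "score": 0.96},
    "total_due": {"value": 3705.1, "score": 0.96},
    "line_items": {
        "score": 0.98,
        "value": [
            {
                "description": {"value": "Ergonomic Mesh Office Chair", "score": 0.95, "citation": None},
                "quantity": {"value": 4, "score": 0.96, "citation": None},
            }
        ],
    },
}

print(plain_values(result))
```

The output is an ordinary nested dict with the same field names as the schema, ready to pass to code that doesn't care about confidence.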

The fields you’ll actually use depend on what you’re building. They fall into three broad categories.

For typed values and validation:

result.{field_name}.value: the extracted data, typed to match your schema (string, number, boolean, object, or array)
result.{field_name}.score: confidence score between 0 and 1, higher is better. Use it to flag uncertain values for human review.
result.{array_field}.value[].{property}.citation: reserved slot for source citations on array rows; null for now
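One way to act on the scores is to walk the result and collect every leaf that falls below a review threshold. The sketch below is illustrative only: low_confidence_fields is a hypothetical helper, the 0.9 threshold is an arbitrary assumption, and the two low scores in the sample data are made up to trigger the check. It reports leaf values and skips the score on the array container itself.

```python
def low_confidence_fields(node, threshold=0.9, path="result"):
    """Yield (path, value, score) for every scored leaf below the threshold."""
    if isinstance(node, dict):
        if "value" in node and "score" in node:
            value = node["value"]
            if isinstance(value, (dict, list)):
                # Container such as line_items: descend to the scored leaves inside
                yield from low_confidence_fields(value, threshold, path)
            elif node["score"] < threshold:
                yield path, value, node["score"]
        else:
            for key, child in node.items():
                yield from low_confidence_fields(child, threshold, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from low_confidence_fields(item, threshold, f"{path}[{index}]")


# Sample data: the 0.72 and 0.85 scores are invented for illustration
result = {
    "invoice_number": {"value": "INV-2026-00487", "score": 0.97},
    "issue_date": {"value": "April 14, 2026", "score": 0.72},
    "line_items": {
        "score": 0.98,
        "value": [
            {
                "quantity": {"value": 4, "score": 0.85, "citation": None},
                "unit_price": {"value": 289.0, "score": 0.97, "citation": None},
            }
        ],
    },
}

for path, value, score in low_confidence_fields(result):
    print(f"{path}: {value!r} (score {score})")
```

The paths are shortened for readability (result.line_items[0].quantity rather than the full result.line_items.value[0].quantity.value), and the right threshold depends on how costly a wrong value is in your workflow.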

For schema and ordering:

metadata.schema: an echo of the schema you submitted, useful for round-tripping or auditing
metadata.order: the original order of top-level fields in your schema, since JSON objects don’t preserve insertion order across all clients
metadata.page_count: number of pages in the uploaded document
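If a client hands back the result keys in a different order, metadata.order can restore the schema order. A minimal sketch, assuming the response has the shape shown above; ordered_fields is a hypothetical helper name.

```python
def ordered_fields(response):
    """Return (name, entry) pairs in the order given by metadata.order."""
    result = response["result"]
    order = response.get("metadata", {}).get("order", [])
    names = [name for name in order if name in result]
    # Append anything not mentioned in metadata.order so no field is dropped
    names += [name for name in result if name not in names]
    return [(name, result[name]) for name in names]


# The result keys are deliberately shuffled here
response = {
    "metadata": {"order": ["vendor_name", "invoice_number", "total_due"]},
    "result": {
        "total_due": {"value": 3705.1, "score": 0.96},
        "vendor_name": {"value": "Northwind Office Supplies", "score": 0.96},
        "invoice_number": {"value": "INV-2026-00487", "score": 0.97},
    },
}

for name, entry in ordered_fields(response):
    print(name, entry["value"])
```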

For job and audit tracking:

job_id: unique identifier for the extraction job
status: completed, failed, or an in-progress value (queued, processing)
file_name: name of the uploaded file
file_url: temporary signed S3 URL to the uploaded file
created_at, updated_at: ISO 8601 timestamps for submission and the most recent status change
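For a completed job, the gap between the two timestamps gives a rough measure of how long the job took from submission to its final status change. The sketch below uses the timestamps from the sample response and assumes the +00:00 offset format shown there.

```python
from datetime import datetime

# Timestamps copied from the sample response above
created_at = datetime.fromisoformat("2026-05-27T08:36:58.336535+00:00")
updated_at = datetime.fromisoformat("2026-05-27T08:37:19.436604+00:00")

elapsed = (updated_at - created_at).total_seconds()
print(f"Job took about {elapsed:.1f} seconds")  # Job took about 21.1 seconds
```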

Sample Output

Running the script against the sample invoice writes the JSON above to result.json. Every field comes back with its own confidence score, so flagging uncertain values becomes a per-field check rather than a re-read of the document. The fields extracted from the sample:

Field	Extracted value	Confidence
`vendor_name`	Northwind Office Supplies	96%
`invoice_number`	INV-2026-00487	97%
`issue_date`	April 14, 2026	98%
`total_due`	3705.10	96%
`line_items`	5 rows	98%

And the five rows of line_items, each a structured object in its own right:

Description	Quantity	Unit Price	Subtotal
Ergonomic Mesh Office Chair	4	289.00	1,156.00
Adjustable Standing Desk (60” x 30”)	2	549.00	1,098.00
LED Desk Lamp with USB Charging	6	42.50	255.00
Acoustic Panel, 24” Hexagon (Pack of 4)	3	78.00	234.00
Wireless Mechanical Keyboard	5	129.99	649.95

Every cell in the table has its own score in the underlying JSON, so downstream code can flag individual uncertain values without rejecting the whole row.
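Because the values come back typed, the table can also be sanity-checked arithmetically: each subtotal should equal quantity x unit_price. The sketch below is a hypothetical check (subtotal_mismatches is our own name) run on the five rows above, assuming each row has already been reduced to plain values.

```python
from decimal import Decimal


def subtotal_mismatches(rows):
    """Return descriptions of rows where quantity x unit_price != subtotal."""
    mismatches = []
    for row in rows:
        # Decimal(str(...)) avoids binary floating-point drift on cent values
        quantity = Decimal(str(row["quantity"]))
        unit_price = Decimal(str(row["unit_price"]))
        subtotal = Decimal(str(row["subtotal"]))
        if quantity * unit_price != subtotal:
            mismatches.append(row["description"])
    return mismatches


# Plain values from the sample invoice table
rows = [
    {"description": "Ergonomic Mesh Office Chair", "quantity": 4, "unit_price": 289.0, "subtotal": 1156.0},
    {"description": "Adjustable Standing Desk", "quantity": 2, "unit_price": 549.0, "subtotal": 1098.0},
    {"description": "LED Desk Lamp with USB Charging", "quantity": 6, "unit_price": 42.5, "subtotal": 255.0},
    {"description": "Acoustic Panel, 24 in Hexagon (Pack of 4)", "quantity": 3, "unit_price": 78.0, "subtotal": 234.0},
    {"description": "Wireless Mechanical Keyboard", "quantity": 5, "unit_price": 129.99, "subtotal": 649.95},
]

print(subtotal_mismatches(rows))  # []
```

A row that fails this check is a good candidate for human review even when its individual scores are high.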

Next Steps

For more on extraction, including schema rules, supported types, and the full response reference, see the Extract overview.

Schemas

JSON Schema rules, supported types, and worked examples for invoices and SEC filings.

Response Format

The canonical extraction response with a field-by-field reference.

API Reference

Browse the full request and response specs for /v2/extract.

FAQ

Check limits, supported formats, and answers to common questions.

