Element Types - Unsiloed AI

Every segment in a parsed document carries a segment_type field naming the layout region it came from. The parser recognizes the types listed below, divided into text elements (regions whose meaning lives in their characters and structure) and visual elements (regions whose meaning lives in their layout, image content, or rendered form). Two of them (KeyValuePair and Signature) appear only when you submit a request with layout_analysis=advanced_layout_detection. Segments share a common set of core fields: bbox, confidence, content, markdown, html, ocr, and location metadata (page_number, page_width, page_height). What changes by type is what those fields contain, and a couple of types omit specific fields entirely; the Picture and Signature samples below, for example, have no content field. The sections below show a real response sample for each type.
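As a quick orientation, the sketch below counts segments by segment_type and flags whether any of the two advanced-only types is present. It assumes the segments have already been loaded into a list of dicts; the function name summarize_types and the trimmed-down sample data are hypothetical and not part of the API.

```python
from collections import Counter

# Types the text above says appear only with
# layout_analysis=advanced_layout_detection.
ADVANCED_ONLY_TYPES = {"KeyValuePair", "Signature"}


def summarize_types(segments):
    """Count segments per segment_type and report whether any
    advanced-only type is present."""
    counts = Counter(seg["segment_type"] for seg in segments)
    has_advanced = any(t in ADVANCED_ONLY_TYPES for t in counts)
    return counts, has_advanced


# Hypothetical, trimmed-down segments for illustration only.
segments = [
    {"segment_type": "Title", "content": "BERKSHIRE HATHAWAY INC."},
    {"segment_type": "Text", "content": "First paragraph."},
    {"segment_type": "Text", "content": "Second paragraph."},
    {"segment_type": "Signature", "markdown": "## Image Description"},
]

counts, has_advanced = summarize_types(segments)
print(dict(counts), has_advanced)
```

If has_advanced is false for a document you expected to contain form fields or signatures, the layout_analysis setting used for the request is a reasonable first thing to check.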

Text Elements

These segments carry their meaning in text, so the markdown and html fields use semantic markup like headers, italics, list syntax, and footnote references to reflect each type.

Text

Regular paragraph and inline text. The content field carries the plain text, markdown is the same with line breaks preserved, and html marks each line break with <br />.

{
  "segment_id": "13d55851-d0fc-4999-a508-ab82d9a64443",
  "segment_type": "Text",
  "content": "The following table summarises regional sales performance for Q1\n2024.",
  "markdown": "The following table summarises regional sales performance for Q1\n2024.",
  "html": "The following table summarises regional sales performance for Q1<br />\n2024.",
  "bbox": { "left": 60.8, "top": 152.6, "width": 714.6, "height": 24.3 },
  "page_number": 1,
  "page_width": 1191.0,
  "page_height": 1684.0,
  "confidence": 0.99
}

Title

Document titles and main headings. Rendered as a top-level Markdown header (#) and <h1> in HTML, distinct from SectionHeader which uses ##/<h2>.

{
  "segment_id": "24063562-721c-4122-87f2-c376ac0296f0",
  "segment_type": "Title",
  "content": "BERKSHIRE HATHAWAY INC.",
  "markdown": "# BERKSHIRE HATHAWAY INC.",
  "html": "<h1>BERKSHIRE HATHAWAY INC.</h1>",
  "bbox": { "left": 308.5, "top": 625.8, "width": 761.0, "height": 70.5 },
  "page_number": 1,
  "page_width": 1224.0,
  "page_height": 1576.0,
  "confidence": 0.49
}

SectionHeader

Section titles and subheadings that define the document’s hierarchy. The parser renders these as ## in markdown and <h2> in html.

{
  "segment_id": "034a37e7-6e4b-45dd-802c-e648d6c16498",
  "segment_type": "SectionHeader",
  "content": "Q1 2024 Sales Report",
  "markdown": "## Q1 2024 Sales Report",
  "html": "<h2>Q1 2024 Sales Report</h2>",
  "bbox": { "left": 427.6, "top": 67.8, "width": 344.7, "height": 36.5 },
  "page_number": 1,
  "page_width": 1191.0,
  "page_height": 1684.0,
  "confidence": 0.35
}
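Because Title and SectionHeader map to fixed heading levels, a document outline can be collected from the segment list alone. This is a minimal sketch, assuming segments arrive in reading order as a list of dicts; build_outline and HEADING_LEVELS are hypothetical names, and the sample data is trimmed from the responses above.

```python
# Heading level per type, following the #/<h1> and ##/<h2> mapping
# described above.
HEADING_LEVELS = {"Title": 1, "SectionHeader": 2}


def build_outline(segments):
    """Return (level, text, page_number) tuples for heading segments."""
    outline = []
    for seg in segments:
        level = HEADING_LEVELS.get(seg.get("segment_type"))
        if level is not None:
            outline.append(
                (level, seg.get("content", ""), seg.get("page_number"))
            )
    return outline


# Trimmed-down sample segments (illustrative only).
segments = [
    {"segment_type": "Title", "content": "BERKSHIRE HATHAWAY INC.",
     "page_number": 1},
    {"segment_type": "Text", "content": "Body text.", "page_number": 1},
    {"segment_type": "SectionHeader", "content": "Q1 2024 Sales Report",
     "page_number": 1},
]

outline = build_outline(segments)
for level, text, page in outline:
    print("  " * (level - 1) + text, f"(page {page})")
```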

ListItem

Bulleted and numbered list entries. The markdown field renders the item with a leading dash, and html wraps the entry in <ul> (with a nested <ol> if the source list was numbered).

{
  "segment_id": "32217825-2ddc-4121-be70-2b3e23e2ab97",
  "segment_type": "ListItem",
  "content": "1. Operating Conditions — 1966",
  "markdown": "- 1. Operating Conditions — 1966",
  "html": "<ul><li><ol>\n<li>Operating Conditions — 1966</li>\n</ol></li></ul>",
  "bbox": { "left": 441.9, "top": 430.0, "width": 288.8, "height": 20.7 },
  "page_number": 3,
  "page_width": 1222.0,
  "page_height": 1576.0,
  "confidence": 0.95
}
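In the sample above, a numbered entry keeps its own "1." prefix after the leading dash, so the markdown reads "- 1. ...". If a renderer should treat such entries as ordered-list items, one option is to drop the dash. The sketch below is an assumption about what a consumer might want, not parser behavior; normalize_list_markdown is a hypothetical helper.

```python
import re

# Matches a leading dash followed by a numeric marker such as "1." or "2)".
_NUMBERED_AFTER_DASH = re.compile(r"^-\s+(\d+[.)]\s+)")


def normalize_list_markdown(markdown):
    """Turn '- 1. Item' into '1. Item'; leave plain bullets untouched."""
    return _NUMBERED_AFTER_DASH.sub(r"\1", markdown, count=1)


numbered = normalize_list_markdown("- 1. Operating Conditions")
bulleted = normalize_list_markdown("- Operating Conditions")
print(numbered)
print(bulleted)
```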

Caption

Text captions associated with images, figures, or tables. The markdown field wraps the caption in italics (_..._), and html wraps it in a <span class="caption"> for downstream styling.

{
  "segment_id": "63d646e8-0cb1-4325-8759-86625a51b0f9",
  "segment_type": "Caption",
  "content": "Figure 1: The Transformer - model\narchitecture.",
  "markdown": "_Figure 1: The Transformer - model\narchitecture._",
  "html": "<span class=\"caption\">Figure 1: The Transformer - model<br />\narchitecture.</span>",
  "bbox": { "left": 418.8, "top": 808.4, "width": 385.1, "height": 21.0 },
  "page_number": 3,
  "page_width": 1224.0,
  "page_height": 1584.0,
  "confidence": 1.0
}

Footnote

Footnote text and references. The markdown field uses Markdown footnote syntax ([^...]), and html wraps the body in a <span class="footnote">.

{
  "segment_id": "840069a9-fb5e-4dcb-ad37-459bd4ff29f1",
  "segment_type": "Footnote",
  "content": "∗Equal contribution. Listing order is random. Jakob proposed replacing RNNs with self-attention...",
  "markdown": "[^∗Equal contribution. Listing order is random. Jakob proposed replacing RNNs with self-attention...]",
  "html": "<span class=\"footnote\">∗Equal contribution. Listing order is random...</span>",
  "bbox": { "left": 214.4, "top": 1196.5, "width": 793.9, "height": 176.2 },
  "page_number": 1,
  "page_width": 1224.0,
  "page_height": 1584.0,
  "confidence": 1.0
}
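When only the footnote body is needed, the [^...] wrapper in the markdown field can be removed with plain string handling. A minimal sketch, assuming the whole markdown value is a single [^...] wrapper as in the sample above; footnote_body is a hypothetical helper.

```python
def footnote_body(markdown):
    """Strip a surrounding [^...] wrapper, if present."""
    text = markdown.strip()
    if text.startswith("[^") and text.endswith("]"):
        return text[2:-1]
    return text


body = footnote_body("[^∗Equal contribution. Listing order is random.]")
print(body)
```

The content field already holds the unwrapped text, so this is mainly useful when a pipeline stores only the markdown field.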

PageHeader

Header content at the top of a page, such as library stamps, document titles repeating across pages, or running headers. The markdown and html fields carry the raw text without semantic markup. These segments are often worth filtering out for clean RAG ingestion.

{
  "segment_id": "0f1fce17-da69-4698-b7a6-3abf954ee41e",
  "segment_type": "PageHeader",
  "content": "CLEVELAND PUBLIC LIBRARY BUSINESS INF. BUR.\nCORPORATION FILE",
  "markdown": "CLEVELAND PUBLIC LIBRARY BUSINESS INF. BUR.\nCORPORATION FILE",
  "html": "CLEVELAND PUBLIC LIBRARY BUSINESS INF. BUR.<br />\nCORPORATION FILE",
  "bbox": { "left": 998.3, "top": 2.6, "width": 193.5, "height": 76.7 },
  "page_number": 1,
  "page_width": 1224.0,
  "page_height": 1576.0,
  "confidence": 0.34
}

PageFooter

Footer content at the bottom of a page, typically page numbers, copyright notices, or document IDs. Like PageHeader, these segments are often filtered out before embedding.

{
  "segment_id": "997cbb53-0d58-49dc-b10e-e997ee14aafc",
  "segment_type": "PageFooter",
  "content": "5",
  "markdown": "5",
  "html": "5",
  "bbox": { "left": 611.0, "top": 1410.1, "width": 12.4, "height": 21.6 },
  "page_number": 7,
  "page_width": 1224.0,
  "page_height": 1576.0,
  "confidence": 0.94
}
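Filtering page headers and footers before embedding comes down to a check on segment_type. The sketch below assumes a list of segment dicts; PAGE_FURNITURE and drop_page_furniture are hypothetical names.

```python
# Types that usually repeat across pages and add noise to retrieval.
PAGE_FURNITURE = {"PageHeader", "PageFooter"}


def drop_page_furniture(segments):
    """Keep every segment except page headers and footers."""
    return [
        seg for seg in segments
        if seg.get("segment_type") not in PAGE_FURNITURE
    ]


# Trimmed-down sample segments (illustrative only).
segments = [
    {"segment_type": "PageHeader", "content": "CORPORATION FILE"},
    {"segment_type": "Text", "content": "Body text."},
    {"segment_type": "PageFooter", "content": "5"},
]

kept = drop_page_furniture(segments)
print([seg["segment_type"] for seg in kept])
```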

KeyValuePair

A labeled field in a form or document, like Passport No : or Invoice Date:. This type is returned only under layout_analysis=advanced_layout_detection. The label is captured in this segment; the value typically appears as a separate adjacent Text segment. The html field wraps the label in a <div class="key-value-pair"> so downstream code can style or pair it.

{
  "segment_id": "85fcc4f1-f637-4d57-a698-c4bc4bfb0d3e",
  "segment_type": "KeyValuePair",
  "content": "Passport No :",
  "markdown": "Passport No :",
  "html": "<div class=\"key-value-pair\">Passport No :</div>",
  "bbox": { "left": 95.5, "top": 217.9, "width": 110.4, "height": 22.2 },
  "page_number": 1,
  "page_width": 1191.0,
  "page_height": 1684.0,
  "confidence": 1.0
}
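Since the value usually arrives as a separate Text segment, pairing it with its label is left to the consumer. One possible heuristic, sketched below, picks the nearest Text segment on the same page that starts to the right of the label and overlaps it vertically. This is an assumption about form layout, not documented parser behavior; the helper names and the value segment are hypothetical.

```python
def vertical_overlap(a, b):
    """Height of the vertical overlap between two bbox dicts."""
    top = max(a["top"], b["top"])
    bottom = min(a["top"] + a["height"], b["top"] + b["height"])
    return max(0.0, bottom - top)


def find_value_for_label(label, segments):
    """Return the closest Text segment to the right of the label,
    or None if no candidate is found."""
    lb = label["bbox"]
    label_right = lb["left"] + lb["width"]
    candidates = []
    for seg in segments:
        if seg.get("segment_type") != "Text":
            continue
        if seg.get("page_number") != label.get("page_number"):
            continue
        sb = seg["bbox"]
        if sb["left"] >= label_right and vertical_overlap(lb, sb) > 0:
            candidates.append((sb["left"] - label_right, seg))
    if not candidates:
        return None
    return min(candidates, key=lambda pair: pair[0])[1]


label = {
    "segment_type": "KeyValuePair", "content": "Passport No :",
    "page_number": 1,
    "bbox": {"left": 95.5, "top": 217.9, "width": 110.4, "height": 22.2},
}
# Hypothetical neighbouring segments, invented for this example.
others = [
    {"segment_type": "Text", "content": "EXAMPLE-VALUE", "page_number": 1,
     "bbox": {"left": 215.0, "top": 218.0, "width": 120.0, "height": 21.0}},
    {"segment_type": "Text", "content": "Unrelated paragraph.",
     "page_number": 1,
     "bbox": {"left": 95.5, "top": 400.0, "width": 600.0, "height": 40.0}},
]

value = find_value_for_label(label, others)
print(label["content"], value["content"] if value else None)
```

Forms that place values below their labels would need a different rule, so treat the direction of the search as a parameter of your own layout.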

Visual Elements

These segments carry their meaning in visual content or layout. The markdown and html fields contain either rendered structured content (Markdown tables, LaTeX) or AI-generated descriptions for image regions.

Table

Tabular data with structured rows and columns. The markdown field carries the Markdown pipe-table syntax, html carries a full <table> with <thead> and <tbody>, and content is a flat plain-text approximation. The image field contains a signed URL to a cropped image of the table region, useful for verifying parses visually or feeding the original table to an image-input model.

{
  "segment_id": "4f4b54bc-793e-49cc-b0a3-113bbb5484be",
  "segment_type": "Table",
  "content": "Region Sales Rep Units Sold Revenue ($) Target ($) % of Target\nNorth Alice Brown 1,240 186,000 175,000 106%\n...",
  "markdown": "| Region | Sales Rep | Units Sold | Revenue ($) | Target ($) | % of Target |\n| --- | --- | --- | --- | --- | --- |\n| North | Alice Brown | 1,240 | 186,000 | 175,000 | 106% |\n| ... | ... | ... | ... | ... | ... |",
  "html": "<table>\n  <thead>\n    <tr><th>Region</th><th>Sales Rep</th><th>Units Sold</th>...</tr>\n  </thead>\n  <tbody>\n    <tr><td>North</td><td>Alice Brown</td>...</tr>\n    ...\n  </tbody>\n</table>",
  "image": "https://s3.us-east-1.amazonaws.com/...",
  "bbox": { "left": 54.4, "top": 208.5, "width": 1026.5, "height": 246.5 },
  "page_number": 1,
  "page_width": 1191.0,
  "page_height": 1684.0,
  "confidence": 0.99
}
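The pipe-table in the markdown field can be turned into row dictionaries with basic string splitting. The sketch below is deliberately simple: it assumes a header row followed by a separator row, and it does not handle escaped pipes inside cells. parse_pipe_table is a hypothetical helper.

```python
def parse_pipe_table(markdown):
    """Parse a Markdown pipe-table into a list of {header: cell} dicts."""
    lines = [ln.strip() for ln in markdown.splitlines() if ln.strip()]
    if len(lines) < 2:
        return []

    def split_row(line):
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = split_row(lines[0])
    rows = []
    for line in lines[2:]:  # lines[1] is the --- separator row
        rows.append(dict(zip(header, split_row(line))))
    return rows


table_markdown = (
    "| Region | Sales Rep | Units Sold | Revenue ($) | Target ($) "
    "| % of Target |\n"
    "| --- | --- | --- | --- | --- | --- |\n"
    "| North | Alice Brown | 1,240 | 186,000 | 175,000 | 106% |"
)

rows = parse_pipe_table(table_markdown)
print(rows[0]["Sales Rep"], rows[0]["% of Target"])
```

For anything beyond simple tables, parsing the html field with an HTML parser is likely the more robust route.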

Picture

Images, charts, illustrations, and diagrams. The image field contains a signed URL to the cropped picture itself. The markdown and html fields contain an AI-generated description of the image (not the image bytes), making the picture’s visual content searchable and embeddable as text alongside the rest of the document.

{
  "segment_id": "c60d89b1-373e-428d-9950-544e7c903b61",
  "segment_type": "Picture",
  "markdown": "# Image Description\n\nThe image shows a **large orange sombrero** against a **plain white background**. The hat has a tall, rounded crown and a very broad brim...",
  "html": "<h1>Image Description</h1><p>The image shows a <strong>large orange sombrero</strong>...</p>",
  "image": "https://s3.us-east-1.amazonaws.com/...",
  "bbox": { "left": 0.5, "top": -1.0, "width": 1748.5, "height": 1166.9 },
  "page_number": 2,
  "page_width": 1732.0,
  "page_height": 2262.0,
  "confidence": 0.93
}
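The sample above has a bbox that extends slightly past the page: top is -1.0, and left + width exceeds page_width. Code that crops or overlays regions may want to clamp the box first. The sketch below converts a bbox to page fractions in the range 0 to 1, assuming bbox and page dimensions use the same units; normalized_bbox is a hypothetical helper.

```python
def normalized_bbox(segment):
    """Return (x0, y0, x1, y1) as fractions of the page, clamped to [0, 1]."""
    box = segment["bbox"]
    page_w = segment["page_width"]
    page_h = segment["page_height"]

    def clamp(value):
        return min(1.0, max(0.0, value))

    x0 = clamp(box["left"] / page_w)
    y0 = clamp(box["top"] / page_h)
    x1 = clamp((box["left"] + box["width"]) / page_w)
    y1 = clamp((box["top"] + box["height"]) / page_h)
    return x0, y0, x1, y1


picture = {
    "segment_type": "Picture",
    "bbox": {"left": 0.5, "top": -1.0, "width": 1748.5, "height": 1166.9},
    "page_width": 1732.0,
    "page_height": 2262.0,
}

x0, y0, x1, y1 = normalized_bbox(picture)
print(x0, y0, x1, y1)
```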

Formula

Mathematical equations and expressions. This is the most distinctive type: the markdown and html fields contain LaTeX wrapped in $...$, ready to render with KaTeX, MathJax, or any other LaTeX-aware tool. The content field carries a plain-text OCR approximation of the equation, which is usually less reliable than the LaTeX representation.

{
  "segment_id": "b4ccd4cf-01ae-4e92-b881-3bb1c335e8b3",
  "segment_type": "Formula",
  "content": "V ) = softmax(QKT )V (1)\nAttention(Q, K, √\ndk",
  "markdown": "$\\mathrm{Attention}(Q, K, V) = \\mathrm{softmax}\\left(\\frac{QK^T}{\\sqrt{d_k}}\\right)V$",
  "html": "<p>$\\mathrm{Attention}(Q, K, V) = \\mathrm{softmax}\\left(\\frac{QK^T}{\\sqrt{d_k}}\\right)V$</p>",
  "bbox": { "left": 438.4, "top": 928.8, "width": 569.9, "height": 52.7 },
  "page_number": 4,
  "page_width": 1224.0,
  "page_height": 1584.0,
  "confidence": 1.0
}
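Some renderers expect bare LaTeX rather than a $-delimited string. A minimal sketch for extracting it from the markdown field follows; formula_latex is a hypothetical helper, and the $$...$$ branch is an assumption added for safety, since the sample above only shows single-dollar delimiters.

```python
def formula_latex(segment):
    """Return the LaTeX source of a Formula segment without $ delimiters."""
    md = segment.get("markdown", "").strip()
    # Assumption: display-style $$...$$ may also occur; handle it first.
    if len(md) >= 4 and md.startswith("$$") and md.endswith("$$"):
        return md[2:-2].strip()
    if len(md) >= 2 and md.startswith("$") and md.endswith("$"):
        return md[1:-1].strip()
    return md


formula = {
    "segment_type": "Formula",
    "markdown": r"$\mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$",
}

latex = formula_latex(formula)
print(latex)
```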

Signature

A handwritten signature region. This type is returned only under layout_analysis=advanced_layout_detection. Like Picture, the markdown and html fields contain an AI-generated description of what the handwriting looks like, useful as searchable text. Unlike Picture, a Signature segment carries no image URL, and it has no content field, only the description and bounding box.

{
  "segment_id": "b328b32d-1c37-41f2-a5f7-0366a870d238",
  "segment_type": "Signature",
  "markdown": "## Image Description\n\nThe image shows a handwritten word in dark ink on a light background.\n\n### Visible Text\n- **Dhote.**\n\n### Details\n- The handwriting is cursive and slightly slanted...",
  "html": "<h2>Image Description</h2>\n<p>The image shows a handwritten word in dark ink on a light background.</p>\n<h3>Visible Text</h3>\n<ul><li><strong>Dhote.</strong></li></ul>...",
  "bbox": { "left": 96.7, "top": 1398.4, "width": 84.2, "height": 50.8 },
  "page_number": 1,
  "page_width": 1191.0,
  "page_height": 1684.0,
  "confidence": 1.0
}
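Because some segments have no content field, code that collects text for search or embedding should not index into it directly. The sketch below picks a text source per segment: markdown for the types whose description or LaTeX lives there, content otherwise, with a fallback either way. The choice of types in PREFER_MARKDOWN is an assumption based on the samples on this page, and text_for_embedding is a hypothetical helper.

```python
# Types whose most useful text lives in markdown rather than content:
# Picture and Signature have no content in the samples, and Formula's
# content is described as less reliable than its LaTeX.
PREFER_MARKDOWN = {"Picture", "Signature", "Formula"}


def text_for_embedding(segment):
    """Pick the best available text field without raising on missing keys."""
    if segment.get("segment_type") in PREFER_MARKDOWN:
        return segment.get("markdown") or segment.get("content") or ""
    return segment.get("content") or segment.get("markdown") or ""


signature = {
    "segment_type": "Signature",
    "markdown": "## Image Description\n\nThe image shows a handwritten word.",
}
paragraph = {
    "segment_type": "Text",
    "content": "Plain paragraph.",
    "markdown": "Plain paragraph.",
}

sig_text = text_for_embedding(signature)
para_text = text_for_embedding(paragraph)
print(sig_text.splitlines()[0])
print(para_text)
```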

For the full segment shape and configuration options, see the Parse API reference.