segment_type field naming the layout region it came from. The parser recognizes the types listed below, divided into text elements (regions whose meaning lives in their characters and structure) and visual elements (regions whose meaning lives in their layout, image content, or rendered form). Two of them (KeyValuePair and Signature) only appear when you submit with layout_analysis=advanced_layout_detection.
All segments share the same core fields: bbox, confidence, content, markdown, html, ocr, and location metadata. What changes by type is what those fields contain, and a couple of types omit specific fields entirely. The sections below show a real response sample for each type.
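One way to carry these fields through client code is a typed dictionary. The sketch below assumes a shape; it is not the parser's published schema. It uses the field names listed above plus segment_type and image. It leaves bbox and ocr loosely typed because their structure isn't described here, and it omits location metadata because its key names aren't given. It also groups the type names into the text and visual families this page uses.

```python
from typing import Any, Optional, TypedDict

# Segment types named on this page, split into the same two families.
TEXT_TYPES = {
    "Text", "Title", "SectionHeader", "ListItem", "Caption",
    "Footnote", "PageHeader", "PageFooter", "KeyValuePair",
}
VISUAL_TYPES = {"Table", "Picture", "Formula", "Signature"}


class Segment(TypedDict, total=False):
    # Assumed JSON shape. total=False because some types omit fields
    # (for example, Signature has no content or image).
    segment_type: str
    bbox: Any                    # structure not described here
    confidence: Optional[float]  # assumed numeric
    content: str
    markdown: str
    html: str
    ocr: Any                     # structure not described here
    image: str                   # signed URL on Table and Picture segments


def element_family(segment: Segment) -> str:
    """Return 'text' or 'visual' for a known segment type."""
    kind = segment.get("segment_type", "")
    if kind in TEXT_TYPES:
        return "text"
    if kind in VISUAL_TYPES:
        return "visual"
    raise ValueError(f"unknown segment_type: {kind!r}")
```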
Text Elements
These segments carry their meaning in text, so the markdown and html fields use semantic markup like headers, italics, list syntax, and footnote references to reflect each type.
Text
Regular paragraph and inline text. The content field carries the plain text, markdown carries the same text with line breaks preserved, and html converts line breaks to <br/> tags.
Title
Document titles and main headings. Rendered as a top-level Markdown header (#) and <h1> in HTML, distinct from SectionHeader which uses ##/<h2>.
SectionHeader
Section titles and subheadings that define the document’s hierarchy. The parser renders these as ## in markdown and <h2> in html.
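Title and SectionHeader differ only in heading level. A client can therefore build a document outline directly from segment_type instead of re-parsing the # and ## markup. This is a minimal sketch that assumes segments arrive as dicts in reading order. The function name outline is hypothetical.

```python
from typing import Iterable, List, Tuple

# Heading depth per segment type, following the mapping described above:
# Title -> '#' / <h1>, SectionHeader -> '##' / <h2>.
HEADING_LEVELS = {"Title": 1, "SectionHeader": 2}


def outline(segments: Iterable[dict]) -> List[Tuple[int, str]]:
    """Collect (level, plain text) pairs for every heading segment, in order."""
    result: List[Tuple[int, str]] = []
    for seg in segments:
        level = HEADING_LEVELS.get(seg.get("segment_type"))
        if level is not None:
            result.append((level, seg.get("content", "").strip()))
    return result
```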
ListItem
Bulleted and numbered list entries. The markdown field renders the item with a leading dash, and html wraps the entry in <ul> (with a nested <ol> if the source list was numbered).
Caption
Text captions associated with images, figures, or tables. The markdown field wraps the caption in italics (_..._), and html wraps it in a <span class="caption"> for downstream styling.
Footnote
Footnote text and references. The markdown field uses Markdown footnote syntax ([^...]), and html wraps the body in a <span class="footnote">.
PageHeader
Header content at the top of a page, such as library stamps, document titles repeating across pages, or running headers. The markdown and html fields carry the raw text without semantic markup. Often worth filtering out for clean RAG ingestion.
PageFooter
Footer content at the bottom of a page, typically page numbers, copyright notices, or document IDs. Like PageHeader, often filtered out before embedding.
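Dropping page furniture can be done with a simple filter on segment_type. This sketch assumes a list of segment dicts and embeds the markdown field. The names chunks_for_embedding and PAGE_FURNITURE are illustrative and are not part of any SDK.

```python
from typing import Iterable, List

# Running page furniture that usually adds noise to embeddings.
PAGE_FURNITURE = {"PageHeader", "PageFooter"}


def chunks_for_embedding(segments: Iterable[dict]) -> List[str]:
    """Return the markdown of every non-empty segment except page headers/footers."""
    return [
        seg["markdown"]
        for seg in segments
        if seg.get("segment_type") not in PAGE_FURNITURE and seg.get("markdown")
    ]
```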
KeyValuePair
A labeled field in a form or document, like Passport No : or Invoice Date:. Only returned under layout_analysis=advanced_layout_detection. The label is captured in this segment; the value typically appears as a separate adjacent Text segment. The html field wraps the label in a <div class="key-value-pair"> so downstream code can style or pair it.
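The label and its value come back as separate segments, so reassembling form fields takes a pairing pass. The heuristic below assumes reading order and assumes that the value, when present, is the immediately following Text segment. Forms with multi-line values or side-by-side columns may need a match based on bounding boxes instead. pair_key_values is a hypothetical helper.

```python
from typing import Dict, List


def pair_key_values(segments: List[dict]) -> Dict[str, str]:
    """Heuristically pair each KeyValuePair label with the Text segment after it.

    Assumes segments are in reading order. Trailing colons are stripped from
    labels; a label with no following Text segment maps to "".
    """
    pairs: Dict[str, str] = {}
    for i, seg in enumerate(segments):
        if seg.get("segment_type") != "KeyValuePair":
            continue
        label = seg.get("content", "").strip().rstrip(":").strip()
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        if nxt is not None and nxt.get("segment_type") == "Text":
            pairs[label] = nxt.get("content", "").strip()
        else:
            pairs[label] = ""
    return pairs
```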
Visual Elements
These segments carry their meaning in visual content or layout. The markdown and html fields contain either rendered structured content (Markdown tables, LaTeX) or AI-generated descriptions for image regions.
Table
Tabular data with structured rows and columns. Themarkdown field carries the Markdown pipe-table syntax, html carries a full <table> with <thead> and <tbody>, and content is a flat plain-text approximation. The image field contains a signed URL to a cropped image of the table region, useful for verifying parses visually or feeding the original table to an image-input model.
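When rows are more useful than markup, a small parser over the markdown field handles simple tables. This sketch assumes a standard pipe table with one header row and a delimiter row. It does not handle escaped pipes. Pipe tables cannot express merged cells, so the html field may be the safer source for complex layouts.

```python
import re
from typing import List

# A delimiter-row cell looks like ---, :---, ---:, or :---:
_DELIMITER_CELL = re.compile(r"^:?-+:?$")


def parse_pipe_table(markdown: str) -> List[List[str]]:
    """Split a simple Markdown pipe table into rows of cell strings.

    The header row comes first and the delimiter row is dropped.
    Escaped pipes inside cells are not handled.
    """
    rows: List[List[str]] = []
    for line in markdown.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the optional leading/trailing pipe, then split on the rest.
        if line.startswith("|"):
            line = line[1:]
        if line.endswith("|"):
            line = line[:-1]
        cells = [cell.strip() for cell in line.split("|")]
        if all(_DELIMITER_CELL.match(cell) for cell in cells):
            continue  # skip the |---|---| row
        rows.append(cells)
    return rows
```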
Picture
Images, charts, illustrations, and diagrams. The image field contains a signed URL to the cropped picture itself. The markdown and html fields contain an AI-generated description of the image (not the image bytes), making the picture’s visual content searchable and embeddable as text alongside the rest of the document.
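The description lives in markdown and the crop lives in image. A Picture can therefore be indexed like any text chunk while keeping a pointer back to the pixels. This sketch uses a hypothetical record layout; the text and image_url keys are assumptions.

```python
from typing import Dict, Optional


def picture_record(segment: dict) -> Dict[str, Optional[str]]:
    """Build an embedding record for a Picture segment.

    The AI-generated description in 'markdown' becomes the text to embed;
    the signed URL in 'image' is kept alongside it for display or for
    passing the original crop to an image-input model later.
    """
    if segment.get("segment_type") != "Picture":
        raise ValueError("expected a Picture segment")
    return {
        "text": segment.get("markdown", ""),
        "image_url": segment.get("image"),
    }
```

Signed URLs typically expire. If you need the crop long after parsing, downloading it at ingestion time is likely safer than storing the URL alone.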
Formula
Mathematical equations and expressions. The most distinctive type: the markdown and html fields contain LaTeX wrapped in $...$, ready to render with KaTeX, MathJax, or any other LaTeX-aware tool. The content field carries a plain-text OCR approximation of the equation, which is usually less reliable than the LaTeX representation.
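Some renderers take bare TeX rather than delimited math. For example, KaTeX's renderToString expects the expression without surrounding dollar signs. This sketch strips the $...$ wrapper described above. It also strips $$...$$ defensively, which is an assumption, since only $...$ is documented here. latex_source is a hypothetical name.

```python
def latex_source(formula_markdown: str) -> str:
    """Strip a surrounding $...$ (or, defensively, $$...$$) from Formula markdown."""
    text = formula_markdown.strip()
    # Check the longer delimiter first so $$x$$ doesn't leave stray $ signs.
    for delim in ("$$", "$"):
        if (
            len(text) >= 2 * len(delim)
            and text.startswith(delim)
            and text.endswith(delim)
        ):
            return text[len(delim):-len(delim)].strip()
    return text  # no recognized wrapper; return as-is
```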
Signature
A handwritten signature region. Only returned under layout_analysis=advanced_layout_detection. Like Picture, the markdown and html fields contain an AI-generated description of what the handwriting looks like, useful as searchable text. Unlike Picture, a Signature segment carries no content field and no image URL, only the description and bounding box.
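Signature omits content and image, so code that indexes segment["content"] directly will fail on it. The following is a defensive text picker. It assumes that missing fields are either absent or null in the decoded JSON. Following this page, it prefers markdown for Picture, Signature, and Formula (whose LaTeX is usually more reliable than its content) and plain content elsewhere. searchable_text is a hypothetical name.

```python
# Types whose markdown is the better text source, per the descriptions above.
MARKDOWN_FIRST = {"Picture", "Signature", "Formula"}


def searchable_text(segment: dict) -> str:
    """Pick the text to index for any segment without assuming every field exists.

    .get() plus 'or' fallbacks cover both a missing key and an explicit null.
    """
    if segment.get("segment_type") in MARKDOWN_FIRST:
        return segment.get("markdown") or ""
    return segment.get("content") or segment.get("markdown") or ""
```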

