Backed by Y Combinator

Document layer for Enterprise AI

We turn PDFs, images, and spreadsheets into JSON and Markdown your LLMs and AI agents can reason over. Built on our proprietary dual-stream vision models.

BLACKSTONE REAL ESTATE ADVISORS
U.S. Residential Market Report | Q4 2024
U.S. Housing Market
Period: 2020–2024 | Coverage: 8 Major U.S. Metro Markets | Date: December 20, 2024

Executive Summary
U.S. residential real estate prices have risen sharply across all major metros between 2020 and 2024, driven by pandemic-era migration, persistently low housing inventory, and the long-term impact of remote work on buyer preferences. Austin recorded the steepest appreciation at +66%, while traditionally expensive markets like New York (+24%) and Los Angeles (+21%) saw more moderate gains. Mortgage rates above 7% have dampened transaction volumes significantly, but prices have remained resilient due to supply constraints. First-time buyer affordability has deteriorated to its worst level in over 30 years.

Key Observations
  • Austin's 66% price surge, the highest in this cohort, now shows early signs of cooling as remote-work demand normalizes and inventory slowly recovers.
  • Boston and Seattle maintain the shortest days-on-market (16 and 18 days respectively), reflecting structural undersupply relative to demand.

[Chart: Median Home Price by City, 2020 vs 2024. Cities: New York, Los Angeles, Miami, Chicago, Austin, Seattle, Boston, Denver. Axis: 0 to 800. Series: 2020, 2024.]

[Table: City-Level Market Indicators. Columns: City, Price 2020, Price 2024, Growth, Avg. Mkt, Price/Sqft, Market Status. Rows: New York, NY; Los Angeles, CA; Miami, FL; Chicago, IL; Austin, TX; Seattle, WA; Boston, MA. First row: New York, NY | $720k | $890k | +24% | 38 | $820 | Seller's Market.]
PARSING THE DOCUMENT...
TABLE · GRAPH · TABLE · HEADER · TEXT · FORM
REVENUE TREND
Quarter 1 $1.2M
Quarter 2 $1.35M
TRANSACTIONS
DATE | VENDOR | AMOUNT
02.14 | Apex | $4,200
02.16 | Nova | $1,180
02.18 | Orion | $980
DOCUMENT INFO
PROCESSED: Financial_report.pdf
TYPE: Statement
STATUS: Processed
ENTITIES
VENDOR | CONFIDENCE
Apex | 98%
Nova | 96%
Orion | 95%
EXPENSE BREAKDOWN
Logistics 38%
Materials 27%
Operations 21%
DOCUMENT SUMMARY
Quarterly report indicating steady growth across operational divisions with revenue increasing year-over-year.
Built by a team from
Mercedes-Benz
Honeywell
IIT Kharagpur
Massachusetts Institute of Technology
Y Combinator
[ PROBLEM STATEMENT ]

Enterprises run on documents, not databases.

[1] 80% of enterprise data is unstructured. Most of it sits in PDFs, scans, and spreadsheets your LLMs can't read.

[2] 6+ months is what AI teams spend stitching together parsers, OCR, and post-processing. Pipelines break the moment a layout changes.

[3] <10% of in-house document pipelines reach production. The rest stall in the pilot.

LLMs exploded the use cases for unstructured data. Agents underwrite claims, copilots draft credit memos, RAG retrieves across thousand-page filings. Generic OCR and DIY pipelines are too static. You need a more dynamic interface, so we rebuilt the stack with vision models that read documents the way humans do.

[ CORE CAPABILITIES ]

Three capabilities.
One document layer.

Parse, extract, and split. Use them standalone or chain them end-to-end. The same API runs a quick prototype and a production pipeline at scale.

financial_report_q4.pdf
REVENUE TREND
Q1 $1.2M
Q2 $1.35M
Q3 $1.4M
Q4 $1.67M

The quarterly report highlights consistent growth across all divisions...

SCANNING & PARSING...
Detected: 4 tables · 2 figures · 847 text tokens

Parse

Convert PDFs, scans, and images into LLM-ready Markdown. Vision models read text, tables, figures, and hierarchy in a single pass, preserving structure that OCR loses.

EXTRACTING...
[ RAW INPUT ]
Vendor: Apex Industries
Date: 02.14
Amount: $4,200
Invoice #: INV-0042
NET 30
[ EXTRACTED OUTPUT ]

{
  "vendor": "Apex Industries",
  "date": "2024-02-14",
  "amount": 4200,
  "currency": "USD",
  "invoice_id": "INV-0042",
  "payment_terms": "NET_30",
  "confidence": 0.98
}

Extract

Pull the fields you need into JSON. One schema, every layout, with domain-awareness that knows a freight charge isn't a line item.
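To make the raw-input and extracted-output example for this capability concrete, here is a minimal Python sketch of the normalization it implies. It is an illustration only, not Unsiloed's API: the `normalize_invoice` helper, the `Terms` key (the raw input shows `NET 30` without a label), the assumed year 2024 (the raw date `02.14` carries no year), and the mapping of `$` to `USD` are all assumptions made for this sketch.

```python
from datetime import date


def normalize_invoice(raw: dict, assumed_year: int = 2024) -> dict:
    """Turn raw key/value strings into typed fields (hypothetical helper)."""
    # "02.14" is read as month.day; the year is an assumption because the raw field omits it.
    month, day = (int(part) for part in raw["Date"].split("."))
    amount_text = raw["Amount"].strip()
    # Only the "$" prefix is recognized here; any other currency is left unknown.
    currency = "USD" if amount_text.startswith("$") else None
    amount = float(amount_text.lstrip("$").replace(",", ""))
    return {
        "vendor": raw["Vendor"].strip(),
        "date": date(assumed_year, month, day).isoformat(),
        "amount": int(amount) if amount.is_integer() else amount,
        "currency": currency,
        "invoice_id": raw["Invoice #"].strip(),
        "payment_terms": raw["Terms"].strip().upper().replace(" ", "_"),
    }


raw_fields = {
    "Vendor": "Apex Industries",
    "Date": "02.14",
    "Amount": "$4,200",
    "Invoice #": "INV-0042",
    "Terms": "NET 30",  # assumed label; the raw input shows the value only
}
record = normalize_invoice(raw_fields)
```

With these inputs, `record` carries the same typed values as the extracted output shown earlier, minus the confidence score, which a rule-based sketch like this cannot produce.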

Split

Break multi-document files into individual docs and long ones into retrievable chunks. Parent-child indexing keeps clauses with their preambles.

mixed_documents.pdf
p.01 INVOICE
p.02 CONTRACT
p.03 INVOICE
p.04 RECEIPT
p.05 CONTRACT
p.06 RECEIPT
[ INVOICES ] 2 files
invoice_q3_001.pdf
invoice_q3_002.pdf
[ CONTRACTS ] 2 files
contract_q3_001.pdf
contract_q3_002.pdf
[ RECEIPTS ] 2 files
receipt_q3_001.pdf
receipt_q3_002.pdf
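The split of `mixed_documents.pdf` can be pictured with a short Python sketch. It assumes each page already has a type label, treats consecutive pages with the same label as one document, and names the outputs after the files listed here. The `split_by_type` function and its `prefix` parameter are hypothetical names for this illustration, not part of any documented Unsiloed interface.

```python
from itertools import groupby


def split_by_type(page_labels: list[str], prefix: str = "q3") -> list[dict]:
    """Group runs of same-labeled pages into separate output documents."""
    counters: dict[str, int] = {}
    documents = []
    numbered_pages = enumerate(page_labels, start=1)
    # groupby yields one group per run of consecutive pages sharing a label.
    for label, run in groupby(numbered_pages, key=lambda item: item[1]):
        kind = label.lower()
        counters[kind] = counters.get(kind, 0) + 1
        documents.append({
            "type": kind,
            "file": f"{kind}_{prefix}_{counters[kind]:03d}.pdf",
            "pages": [number for number, _ in run],
        })
    return documents


labels = ["INVOICE", "CONTRACT", "INVOICE", "RECEIPT", "CONTRACT", "RECEIPT"]
documents = split_by_type(labels)
file_names = [doc["file"] for doc in documents]
```

Treating a run of same-labeled pages as one document is a simplification. Two different invoices on adjacent pages would be merged here, which is where a boundary-aware model would be needed.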
[ ARCHITECTURE DEEP DIVE ]

How the document layer is built.

Attention-guided heatmaps

Reads pages like a human, breaking them into typed regions: tables, figures, signatures, handwriting. Attention-guided heatmaps focus compute on pivot zones: numerical columns, merged cells, section headers. Chunking respects these pivots, so multi-page tables stay whole, split rows rejoin, and clauses keep their hierarchy.
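One way to picture "multi-page tables stay whole" is a merge pass over typed regions. The Python sketch below is a rule-based stand-in written for illustration: it joins a table fragment to the previous one when the column header matches and the pages are consecutive. The `Region` class and `merge_split_tables` function are hypothetical, and the real system is described as model-driven rather than rule-driven.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    kind: str                 # "table", "text", "figure", ...
    pages: list               # page numbers the region covers
    header: tuple = ()        # column names (tables only)
    rows: list = field(default_factory=list)


def merge_split_tables(regions: list) -> list:
    """Rejoin table fragments that continue across a page break."""
    chunks = []
    for region in regions:
        previous = chunks[-1] if chunks else None
        continues_table = (
            previous is not None
            and previous.kind == "table"
            and region.kind == "table"
            and region.header == previous.header
            and region.pages[0] == previous.pages[-1] + 1
        )
        if continues_table:
            previous.pages.extend(region.pages)
            previous.rows.extend(region.rows)
        else:
            # Copy the lists so the input regions are not mutated.
            chunks.append(Region(region.kind, list(region.pages),
                                 region.header, list(region.rows)))
    return chunks


regions = [
    Region("text", [1]),
    Region("table", [1], ("Vendor", "Amount"), [("Apex", 4200), ("Nova", 1180)]),
    Region("table", [2], ("Vendor", "Amount"), [("Orion", 980)]),
    Region("text", [2]),
]
chunks = merge_split_tables(regions)
```

Here the two table fragments collapse into one chunk covering pages 1 and 2, so a downstream retriever never sees half a table.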

Dual-stream vision model

Two streams process the document in parallel. A data stream captures tokens, numbers, and entities, while a layout stream captures image tokens, bounding boxes, alignment, and indentation hierarchy. A cross-attention layer lets the model reason over content and structure together.
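For readers unfamiliar with cross-attention, the following is a generic textbook sketch in plain Python, with no claim about how Unsiloed's model is actually built. Content tokens act as queries and layout tokens supply keys and values. The toy vectors are made up, and the learned projection matrices a real layer would have are omitted for brevity.

```python
import math


def softmax(scores: list) -> list:
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: list, keys: list, values: list) -> list:
    """Scaled dot-product attention: each query mixes the values by key similarity."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        mixed = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Toy inputs: two content tokens (data stream) and three layout tokens (layout stream).
data_tokens = [[1.0, 0.0], [0.0, 1.0]]
layout_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
layout_values = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]

fused = cross_attention(data_tokens, layout_keys, layout_values)
```

Each row of `fused` is a weighted blend of layout values, one per content token, which is the sense in which content gets to "look at" structure.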

Domain-specific decoder

The decoder learns each domain's native ontology, whether it's a legal contract, a financial report, a healthcare record, or a regulatory filing. We trained it on millions of real enterprise documents across these domains, not just synthetic data. Outputs are schema-conditioned with cross-field constraints, so totals match line items and references resolve.
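A cross-field constraint such as "totals match line items" can also be checked downstream. The Python sketch below is a post-hoc validator written for illustration, not a description of how the decoder enforces constraints. The `check_totals` function and the `line_items` and `total` field names are assumptions, and the amounts reuse the sample transactions shown at the top of this page.

```python
from decimal import Decimal


def check_totals(document: dict, tolerance: Decimal = Decimal("0.01")) -> list:
    """Return human-readable violations; an empty list means the fields agree."""
    problems = []
    line_sum = sum(
        (Decimal(str(item["amount"])) for item in document["line_items"]),
        Decimal("0"),
    )
    total = Decimal(str(document["total"]))
    if abs(line_sum - total) > tolerance:
        problems.append(f"total {total} does not match line items sum {line_sum}")
    return problems


consistent = {
    "line_items": [
        {"vendor": "Apex", "amount": 4200},
        {"vendor": "Nova", "amount": 1180},
        {"vendor": "Orion", "amount": 980},
    ],
    "total": 6360,
}
# Same line items, but with a total that disagrees with them.
inconsistent = {**consistent, "total": 6300}

ok_report = check_totals(consistent)
bad_report = check_totals(inconsistent)
```

`Decimal` is used instead of floats so that currency sums do not pick up binary rounding noise.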

[ HOW IT WORKS ]

From raw document to
LLM-ready data.

Connect any source.
S3, SharePoint, Drive, Snowflake, your DMS. We sit on top of where your data already lives.

Run parse, extract, or split.
Configure schemas, prompts, and confidence thresholds. Or chain all three.

Ship to production.
JSON, Markdown, or structured fields into your LLM, AI agents, vector DB, or warehouse.
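The confidence thresholds mentioned in the configuration step can be pictured as a simple routing rule: fields at or above the threshold flow through, and the rest are flagged for review. The Python sketch below is a hypothetical illustration. The `route_by_confidence` function, the field layout, and the 0.96 threshold are assumptions, while the vendor confidences mirror the sample entities shown at the top of this page.

```python
def route_by_confidence(fields: dict, threshold: float = 0.96) -> tuple:
    """Split extracted fields into auto-accepted values and ones needing review."""
    accepted, needs_review = {}, {}
    for name, extracted in fields.items():
        target = accepted if extracted["confidence"] >= threshold else needs_review
        target[name] = extracted["value"]
    return accepted, needs_review


extracted_fields = {
    "vendor_1": {"value": "Apex", "confidence": 0.98},
    "vendor_2": {"value": "Nova", "confidence": 0.96},
    "vendor_3": {"value": "Orion", "confidence": 0.95},
}
accepted, needs_review = route_by_confidence(extracted_fields)
```

Raising the threshold sends more fields to manual review and lowers the risk of a wrong value reaching production, so the right setting depends on how costly each kind of error is for the workflow.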

[ TESTIMONIALS ]

Hear what people say

The accuracy, especially on tables, is meaningfully better than anything we tested. We evaluated over 15 solutions and Unsiloed was the only one that worked reliably.

Head of AI, Fortune 150 Bank, NY

Unsiloed handled edge cases that consistently broke other systems, particularly the ones around nested tables and different formats.

CTO, Series A fintech

Mortgage servicing documents are extremely complex, but Unsiloed was the only system that parsed them with high accuracy. Confidence scoring and flagging reduced hours of manual review for our team.

CEO, Large Mortgage Servicer
[ FAQS ]

Frequently
Asked

  • Unsiloed processes a wide range of formats including PDFs, images, spreadsheets, and scanned documents. It can handle mixed layouts such as tables, charts, forms, and handwritten content within a single file.

  • Traditional OCR returns a flat stream of text. Unsiloed uses vision models that understand structure — tables stay as tables, sections stay grouped, and the output preserves the parent-child relationships your downstream LLMs need.

  • Clean, LLM-ready Markdown and structured JSON, with confidence scores per field. Fully schema-validated outputs are available for extraction tasks where you need a guaranteed shape.

  • Unsiloed reaches 97.4% table extraction accuracy on our internal benchmark, versus roughly 61% for industry alternatives. Accuracy varies with the complexity of the schema and is reported per field via confidence scores.

  • Yes. We support both managed and air-gapped deployments. The same API and outputs work in either mode, so the integration code stays identical.

  • Usage-based for managed deployments and seat-based for self-hosted. Talk to us about volume — we have custom pricing for high-volume document pipelines.

Ready to build the
software of the future?

Schedule a Personalized Demo to Explore All Features