The least glamorous bottleneck in enterprise AI is not the model. It is the 400-page PDF that someone scanned crooked in 2009, and the analyst who has to retype the table on page 217 into a spreadsheet before the model gets to see it.
Doctly AI, a 2025-vintage company in Saratoga, California, is selling a way out of that room. Its pitch is straightforward: feed it a complex PDF, get back Markdown, JSON, or CSV through an API, and skip the manual data entry [doctly.ai, 2026]. The company is angel-funded, with the round size undisclosed [Crunchbase, 2026].
The wedge is the ugly PDF
Doctly's product targets the documents that defeat conventional OCR: regulatory filings, legal contracts, financial reports, insurance forms, and medical records [doctly.ai, 2026]. These are the files where a misread decimal point is a compliance event, and where the layout (nested tables, footnotes, multi-column text, scanned inserts) is the actual problem.
The company also offers instant custom extractor generation from drag-and-drop uploads, which is the more interesting half of the bet [doctly.ai, 2026]. Generic parsing is a commodity race. A workflow where a non-engineer points at a sample document and gets back a working extractor is closer to a product than a primitive.
Cofounder Ali Sheikh, who lists himself as CEO, has written that the company arrived at PDF parsing somewhat sideways, having started with a retrieval system for regulatory documents and discovered that the parsing layer was the part nobody had solved well enough to build on [Medium, 2026] [X, 2026]. That origin story matters. It explains why the wedge is narrow and why the initial customer profile is regulated-industry developer teams rather than consumer document conversion.
A crowded shelf
The competitive set is the most honest part of the story. LlamaParse ships inside the LlamaIndex ecosystem, which gives it a distribution lane straight into a large share of the retrieval-augmented generation (RAG) projects on GitHub. Unstructured has raised serious money and built a reputation in enterprise document pipelines. Vectorize is pushing on the same surface from the vector-database side.
| Competitor | Distribution wedge |
|---|---|
| LlamaParse | Bundled with LlamaIndex, the default RAG framework |
| Unstructured | Enterprise document pipelines, well-funded |
| Vectorize | Comes in via the vector database layer |
| Doctly AI | Chat-to-API custom extractors for regulated docs |
Doctly's read appears to be that accuracy on genuinely hard documents, plus a faster path from sample to working extractor, is enough to peel off the buyers who care about a single misread number. That is a defensible thesis on a narrow segment. It is a harder thesis on the broad market, where "good enough" wins and free wins faster.
Where the bet could break
The risks are the ones any pre-traction infrastructure company carries, sharpened by the category.
- No public customers yet. The company has zero G2 reviews and no named deployments in public sources [G2, 2026]. In a developer-tools market where social proof drives the top of the funnel, the silence is real.
- Open-source pressure from below. Mistral, among others, ships strong OCR models that a competent team can wire up in an afternoon. Sheikh has acknowledged the comparison directly in public [Hacker News, 2026]. The wedge has to be the workflow, not the parse.
- Headcount and runway. Crunchbase reports 1 to 10 employees [Crunchbase, 2026]. An angel round buys a window, not a market.
None of these are disqualifying. They are the standard conditions of a 2025 infrastructure company that picked a real problem before it picked a moat.
The math the buyer is doing
The customer is running a quiet calculation in their head. A mid-size compliance team might process, say, 10,000 complex PDFs a year. At fifteen minutes of analyst time per document at a fully loaded $80 an hour, that is roughly 2,500 hours, or about $200,000 a year, spent moving numbers from a PDF into a spreadsheet. If a parsing API can take that down by 80 percent, the team saves about $160,000 a year; against a software cost of $30,000 to $50,000, the payback arrives in roughly one quarter. That is the conversation Doctly needs to be in.
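The buyer's arithmetic can be written out as a short sketch. The function and parameter names here are hypothetical, chosen only for illustration. The inputs are the article's own illustrative figures rather than Doctly pricing or customer data, and the software cost is assumed to be an annual fee.

```python
def payback_estimate(docs_per_year, minutes_per_doc, hourly_rate,
                     reduction, software_cost):
    """Return (hours, manual_cost, savings, payback_months) for one year.

    All inputs are illustrative assumptions, not vendor figures.
    """
    # Analyst hours spent on manual data entry per year
    hours = docs_per_year * minutes_per_doc / 60
    # Fully loaded cost of those hours
    manual_cost = hours * hourly_rate
    # Labor cost avoided if the parsing API removes `reduction` of the work
    savings = manual_cost * reduction
    # Months of savings needed to cover one year of software cost
    payback_months = software_cost / savings * 12
    return hours, manual_cost, savings, payback_months


# Low and high ends of the assumed software cost range
low = payback_estimate(10_000, 15, 80, 0.80, 30_000)
high = payback_estimate(10_000, 15, 80, 0.80, 50_000)

for label, result in (("low", low), ("high", high)):
    hours, manual_cost, savings, months = result
    print(f"{label}: {hours:.0f} h, ${manual_cost:,.0f} manual, "
          f"${savings:,.0f} saved, payback {months:.2f} months")
```

Under these assumptions the payback period comes out at 2.25 months for a $30,000 tool and 3.75 months for a $50,000 one, so the estimate is sensitive mainly to the reduction rate and the per-document time, both of which a buyer would want to measure on their own documents.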
The incumbent it has to beat to stay in that conversation is LlamaParse. LlamaParse is free to start, sits inside the framework most developers reach for first, and is good enough on most documents. Doctly's case has to be that on the documents where good enough costs you a regulator letter, the difference is worth a line item. If that case lands with even a handful of named compliance and legal teams in the next twelve months, the angel round will look cheap.
Sources
- [doctly.ai, 2026] Doctly AI product site | https://doctly.ai/
- [Crunchbase, 2026] Doctly AI Company Profile | https://www.crunchbase.com/organization/doctly-ai
- [Medium, 2026] How We Accidentally Built the AI-Powered PDF Parser, by Ali Sheikh | https://medium.com/@ali.sheikh_64228/how-we-accidentally-built-the-ai-powered-pdf-parser-we-never-knew-we-needed-the-doctly-story-af5e3f88dc8a
- [X, 2026] Ali Sheikh (@RLMLDL) | https://x.com/RLMLDL
- [Hacker News, 2026] Doctly cofounder comment on Mistral OCR | https://news.ycombinator.com/item?id=43283647
- [G2, 2026] Doctly AI seller page | https://www.g2.com/sellers/doctly-ai