ElevenLabs Is Putting a Lifelike Voice Behind Every API Call in 70 Languages

The London startup crossed $330M ARR and raised a $500M Series D from Sequoia, with an IPO targeted in two to three years.

By Bash Okafor

Published 2026-04-20T04:30:55.838Z

When ElevenLabs co-founder and CEO Mati Staniszewski says the company's mission is to "solve the Speech Turing Test" and build what he calls Emotional General Intelligence [ElevenLabs], it sounds like a research lab's manifesto. The financials suggest something else is happening underneath. ElevenLabs told TechCrunch it crossed $330 million in annualized revenue last year [TechCrunch, Jan 2026], up from roughly $120 million at the end of 2024 [Sacra]. That is a roughly 2.75x jump in twelve months, on a product that mostly bills through an API.
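The growth multiple follows directly from the two reported figures. A minimal Python sketch of the arithmetic, using only the $120 million and $330 million numbers cited above (the function and variable names are illustrative, not from any source):

```python
def growth_multiple(start_arr: float, end_arr: float) -> float:
    """Return end_arr / start_arr, the multiple by which revenue grew."""
    if start_arr <= 0:
        raise ValueError("start_arr must be positive")
    return end_arr / start_arr


# Figures in millions of dollars, as reported in the article.
ARR_END_2024 = 120.0  # "roughly $120 million" [Sacra]
ARR_2025 = 330.0      # "crossed $330 million" [TechCrunch, Jan 2026]

multiple = growth_multiple(ARR_END_2024, ARR_2025)
growth_pct = (multiple - 1.0) * 100.0

print(f"{multiple:.2f}x, or about {growth_pct:.0f}% year-over-year growth")
```

Both inputs are rounded, so the resulting multiple is approximate as well.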

The London-based company, founded in 2022 by Staniszewski and Piotr Dąbkowski, sells text-to-speech, voice cloning, dubbing, and a newer voice-agents stack that lets developers wire conversational AI into phone lines and apps [Crunchbase]. Its flagship Eleven v3 model covers more than 70 languages and a library of 5,000-plus voices [ElevenLabs]. The wedge is quality: in side-by-side listening tests, ElevenLabs' output has been the one most often mistaken for a human reader, which is why studios, publishers, and enterprise developers have written it into production pipelines.

The customer list reflects that. ElevenLabs counts The Walt Disney Studios, HarperCollins, Bertelsmann, Revolut, Deutsche Telekom, Nvidia, and Meta among its enterprise users [ElevenLabs; catalaize; mvp.vc]. Disney's relationship is deep enough that ElevenLabs went through the Disney Accelerator. The combination of a media-and-entertainment book of business (audiobooks, dubbing, voiceover) and a developer-platform book (voice agents, IVR replacement, in-app narration) is unusual. Most voice AI companies pick a lane.

The bet

Staniszewski has been blunt that the underlying audio models will commoditize over time [TechCrunch, Oct 2025]. That framing matters because it tells you where ElevenLabs thinks the moat is going: not the raw model weights, but the surrounding system. That means the voice library and licensing layer (including deals with rights holders for the AI music product launched in August 2025 [Bloomberg, Aug 2025]), the safety tooling (the AI Speech Classifier for detecting synthetic audio [ElevenLabs]), the latency and reliability of the inference stack, and the agent orchestration layer that turns a voice into a working call center.

It is a sensible read of where value accrues in an AI category once the model gap narrows. Cartesia, PlayHT, and Hume are all credible competitors on the model layer, and the open-source frontier is moving. ElevenLabs' answer is to be the place an enterprise buyer goes when they need a voice that is licensed, monitored, multilingual, low-latency, and supported by an actual sales and solutions team.

The capital backing this thesis

The funding is substantial. ElevenLabs closed a $180 million Series C in January 2025 at a reported $3.3 billion valuation [SiliconANGLE, Jan 2025], then raised a $500 million Series D in February 2026 led by Sequoia Capital at roughly $11 billion [Reuters, Feb 2026; TechCrunch, Feb 2026; Sifted, Feb 2026]. The cap table includes Andreessen Horowitz, ICONIQ Growth, NEA, Nvidia, and the angel pairing of Nat Friedman and Daniel Gross. Bloomberg reported in March 2026 that the company is positioning to be IPO-ready within two to three years [Bloomberg, Mar 2026].

Round | Year | Amount ($M)
Series A | 2023 | 19
Series B | 2024 | 80
Series C | 2025 | 180
Series D | 2026 | 500
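
The round sizes and reported valuations above imply a few derived figures. A short sketch, assuming the four rounds listed are the only ones counted (any other financing falls outside this tally) and using illustrative variable names:

```python
# Round sizes in millions of dollars, taken from the figures in the article.
rounds = {
    "Series A (2023)": 19,
    "Series B (2024)": 80,
    "Series C (2025)": 180,
    "Series D (2026)": 500,
}

# Reported valuations in billions of dollars.
SERIES_C_VALUATION_B = 3.3
SERIES_D_VALUATION_B = 11.0  # the article says "roughly $11 billion"

total_raised_m = sum(rounds.values())
valuation_step_up = SERIES_D_VALUATION_B / SERIES_C_VALUATION_B
series_d_share = rounds["Series D (2026)"] / total_raised_m

print(f"Total across listed rounds: ${total_raised_m}M")
print(f"Series C to Series D valuation step-up: {valuation_step_up:.1f}x")
print(f"Series D share of listed capital: {series_d_share:.0%}")
```

On these figures, the Series D alone accounts for well over half of the capital raised across the listed rounds.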

The market shape helps. Voice is one of the few AI categories with an obvious enterprise budget line already attached: dubbing studios, contact centers, IVR vendors, audiobook producers, and accessibility tooling all have existing spend that an ElevenLabs API call can displace. Voice agents in particular sit on top of a contact-center software market measured in the tens of billions of dollars annually. If ElevenLabs converts even a single-digit share of that into API revenue, the $330M ARR figure is a waypoint, not a ceiling.
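
To make the share argument concrete, here is a rough sketch. The article gives only "tens of billions" for the market, so the $30 billion figure below is a placeholder assumption, as are the function name and the share values chosen:

```python
def implied_revenue_b(market_size_b: float, share_pct: float) -> float:
    """Revenue in $B implied by capturing share_pct percent of a market."""
    return market_size_b * share_pct / 100.0


ASSUMED_MARKET_B = 30.0  # hypothetical; the article says only "tens of billions"
CURRENT_ARR_B = 0.33     # the $330M ARR figure, expressed in billions

for share_pct in (1, 3, 5, 9):
    revenue_b = implied_revenue_b(ASSUMED_MARKET_B, share_pct)
    ratio = revenue_b / CURRENT_ARR_B
    print(f"{share_pct}% share -> ${revenue_b:.2f}B ({ratio:.1f}x current ARR)")
```

Under this placeholder, a 1% share would land near today's ARR, while mid-single-digit shares would be several times larger. A different market-size assumption shifts every row proportionally.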

Team and traction

Staniszewski and Dąbkowski are childhood friends from Poland. Staniszewski was previously at Palantir, and Dąbkowski was a research engineer at Google DeepMind [Forbes]. The technical credibility shows up in the product cadence: Eleven v3, the conversational AI agents stack, the music service, and the Impact Program providing free licenses for accessibility cases like ALS voice restoration [ElevenLabs]. Headcount estimates from third-party trackers vary widely, with figures ranging from the mid-300s to 600 across PitchBook, TrueUp, and StartupHub.ai, which is consistent with a company hiring quickly across geographies.

The honest counterfactual

The bear case is the one Staniszewski himself named: model commoditization [TechCrunch, Oct 2025]. If open-weights voice models reach 90% of ElevenLabs' quality at a fraction of the cost, the price umbrella collapses and the API business compresses to margin on inference plus a thin services layer. There is also the safety overhang. Voice cloning from a one-minute sample [SiliconANGLE, Jan 2025] is a powerful capability and a continuous regulatory and reputational exposure. The bull answer is twofold. First, the enterprise contracts ElevenLabs has signed (Disney, HarperCollins, Deutsche Telekom) are not bought on model quality alone; they are bought on licensing, indemnification, content moderation, and uptime, all of which favor an incumbent with a sales motion. Second, ElevenLabs has shipped safety infrastructure (the AI Speech Classifier, the Impact Program governance) earlier than most peers [ElevenLabs], which gives it a credible posture in front of regulators considering synthetic-media rules.

What to watch

The next twelve months should clarify three things: whether the voice-agents product becomes a meaningful revenue line distinct from text-to-speech, whether the music service signs the kind of label and publisher deals that would make it defensible [Bloomberg, Aug 2025], and whether ARR growth holds its current slope as the comparison base gets harder. The IPO timeline Bloomberg reported [Bloomberg, Mar 2026] implies the company believes its 2026 and 2027 numbers will support public-market scrutiny.

Technical breakdown

ElevenLabs sells inference on proprietary speech models exposed through REST and WebSocket APIs and SDKs [ElevenLabs]. The current flagship, Eleven v3, supports 70-plus languages with prosody and multi-speaker control [ElevenLabs]. Voice cloning runs from a one-minute sample for a basic clone or 30 minutes for a professional replica [SiliconANGLE, Jan 2025]. The conversational agents product layers turn-taking, interruption handling, and tool use on top of the speech stack so developers can build phone-grade agents without stitching together a separate ASR, LLM, and TTS pipeline. Pricing is metered by characters and minutes with enterprise tiers for custom volume [ElevenLabs].
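
As a rough illustration of what calling a REST text-to-speech endpoint involves, the sketch below builds (but does not send) an HTTP request using only the Python standard library. The base URL, path, `xi-api-key` header, and `eleven_v3` model identifier are assumptions recalled from ElevenLabs' public documentation, not from this article, and should be checked against the current API reference. `build_tts_request` is a hypothetical helper name.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL; verify against the provider's API reference.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(
    api_key: str,
    voice_id: str,
    text: str,
    model_id: str = "eleven_v3",  # assumed identifier for the v3 model
) -> urllib.request.Request:
    """Build a POST request for a text-to-speech call without sending it."""
    # Percent-encode the voice ID so it is safe to place in the URL path.
    url = f"{API_BASE}/text-to-speech/{urllib.parse.quote(voice_id, safe='')}"
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "xi-api-key": api_key,  # assumed authentication header name
            "Content-Type": "application/json",
        },
    )


request = build_tts_request("YOUR_API_KEY", "YOUR_VOICE_ID", "Hello from a sketch.")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. If the assumptions above hold, the response body would carry the synthesized audio. The streaming WebSocket path is not shown here.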

What could go wrong at scale

Three risks compound. Inference economics: voice agents at human-grade latency are expensive to run, and gross margins on real-time conversational workloads can be materially worse than batch text-to-speech, which would pressure the unit economics implied by the $330M ARR figure [TechCrunch, Jan 2026]. Concentration: a handful of large media and telecom contracts likely contribute outsized revenue, and any one of them in-housing a model would dent growth. Regulation: any high-profile misuse of voice cloning, even by a third party using a competitor's tool, could prompt rules that raise compliance costs across the category and slow the enterprise sales cycle ElevenLabs depends on. None of those are fatal. All of them are reasons the path from $330M ARR to a defensible public company is harder than the funding round headlines suggest.

Read on Startuply.vc