Lemma

Observability platform for self-improving AI agents

Website: https://www.uselemma.ai

PUBLIC

Name Lemma
Tagline Observability platform for self-improving AI agents
Headquarters San Francisco, CA, USA
Founded 2025
Stage Pre-Seed
Business Model API / Developer Platform
Industry Other
Technology AI / Machine Learning
Geography North America
Founding Team Co-Founders (2)
Funding Label $500K (total disclosed ~$500,000)

Links

PUBLIC The following links point to the company's primary digital properties. These are sourced from the company's own materials and public profiles.

Executive Summary

PUBLIC

Lemma is building an observability platform for AI agents, a bet that the next wave of AI adoption will be defined by dynamic, self-improving systems rather than static models [Startup Intros, 2025]. The company, founded in 2025 by ex-AI engineers Jerry Zhang and Cole Gawin, targets the acute debugging and performance management pain points developers face when deploying agents into production environments [Startup Intros, 2025]. Its proposed platform aims to detect performance drifts, pinpoint failure steps, and generate optimized prompts, with the goal of automating improvements through API calls or pull requests [Startup Intros, 2025].

The founding team's background is rooted in technical execution, with both founders having dropped out of USC to pursue the venture and establishing an active, multi-language SDK presence on GitHub prior to any public launch [LinkedIn, 2026] [GitHub, 2026]. Lemma is backed by Y Combinator's F25 batch, which provided a $500,000 pre-seed round and serves as the company's primary external validation point to date [Y Combinator & PitchBook, 2026]. The business model is positioned as an API or developer platform, though revenue mechanics and customer traction remain unproven.

Over the next 12-18 months, the key watchpoints are the transition from a GitHub SDK presence to a publicly available product, the acquisition of initial design partners to validate the core observability loop, and the articulation of a clear wedge against established competitors in the broader AI toolchain monitoring space.

Data Accuracy: YELLOW -- Core product claims are sourced from company materials; team and funding details are corroborated by YC and LinkedIn.

Taxonomy Snapshot

Axis Value
Stage Pre-Seed
Business Model API / Developer Platform
Technology Type AI / Machine Learning
Geography North America
Founding Team Co-Founders (2)

Company Overview

PUBLIC Lemma was founded in 2025 by Jerry Zhang and Cole Gawin, two ex-AI engineers who are now based in San Francisco [Startup Intros, 2025] [Y Combinator & LinkedIn, 2026]. The company's formation appears directly tied to the founders' firsthand experience with the challenges of debugging and maintaining AI agents, a pain point they aim to address with their observability platform [Startup Intros, 2025]. Both founders are listed as USC dropouts, though their specific academic timelines are not detailed [LinkedIn, 2026].

The company's early development is marked by its acceptance into the Y Combinator F25 batch, a program that typically provides $500,000 in pre-seed funding [Y Combinator, 2026] [PitchBook, 2026]. This capital, the only publicly disclosed funding round, aligns with Y Combinator's standard deal terms. A key operational signal is the active development of software development kits, with public GitHub repositories for TypeScript, Python, Go, and a CLI tool under the organization 'uselemma' [GitHub, 2026]. This technical activity suggests a focus on building foundational developer infrastructure.

As of early 2026, the company remains in a pre-commercial stage. No named customers, deployment case studies, or formal partnership announcements have been made public [Perplexity Sonar Pro, 2025]. The team size is confirmed at two employees, the co-founders themselves, with no open job postings identified [Y Combinator & LinkedIn, 2026]. The company was featured in a Forbes list highlighting top startups from the YC F25 batch, though this appears to be a curated list rather than independent news coverage [LinkedIn (Jerry Zhang), 2026].

Data Accuracy: YELLOW -- Key founding details and YC participation are confirmed, but traction and operational milestones rely on limited or inferred sources.

Product and Technology

MIXED Lemma’s product is described as an observability platform for self-improving AI agents, a claim that frames the problem as one of continuous adaptation rather than static monitoring [Startup Intros, 2025]. According to the company’s public materials, the platform is designed to detect performance drifts, pinpoint specific failure steps within an agent’s workflow, and then generate optimized prompts [Startup Intros, 2025]. The proposed closed-loop system would automate these improvements, delivering them via API or code pull requests to enable what the company calls continuous learning from user feedback and production outcomes [Startup Intros, 2025]. This positions Lemma as a tool for developers building adaptive agents for dynamic environments, where traditional static models might degrade.

Public evidence of the underlying technology is limited but includes an active GitHub organization. The company, listed as Lemma Labs, maintains public SDK repositories for TypeScript, Python, Go, and a CLI [GitHub, 2026]. One repository is described as an OpenTelemetry-based tracing module, which suggests an architectural foundation built on open-source observability standards [GitHub, 2026]. The technical co-founder’s GitHub profile shows activity linking to the company domain, corroborating active development [GitHub, 2026]. The product’s core wedge, as presented, is its aim to turn static AI into self-improving systems with compounding gains, a concept built by founders who are described as ex-AI engineers [Startup Intros, 2025].

No live product demos, detailed technical whitepapers, or public case studies are available to verify the implementation of these features. The platform’s capabilities remain at the conceptual stage based on available sources. The primary public artifacts are the SDKs and the high-level value proposition.
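The described loop (trace each agent step, flag the step that failed, compare latency against a baseline to catch drift) can be sketched with nothing but the Python standard library. This is an illustrative mock under stated assumptions, not Lemma's SDK: the `span`, `failing_steps`, and `drifted_steps` names are hypothetical, and the actual product is described as building on OpenTelemetry tracing rather than a hand-rolled tracer [GitHub, 2026].

```python
import time
from contextlib import contextmanager
from statistics import mean

# Collected spans for one agent run: (step_name, duration_seconds, ok).
# Loosely modeled on an OpenTelemetry-style tracer; all names hypothetical.
TRACE = []

@contextmanager
def span(step_name):
    """Record one agent step as a span with timing and success status."""
    start = time.perf_counter()
    ok = True
    try:
        yield
    except Exception:
        ok = False
        raise
    finally:
        TRACE.append((step_name, time.perf_counter() - start, ok))

def failing_steps(trace):
    """Pinpoint which steps in a trace failed."""
    return [name for name, _, ok in trace if not ok]

def drifted_steps(trace, baseline, tolerance=2.0):
    """Flag steps whose latency exceeds tolerance x the baseline mean."""
    return [
        name for name, duration, _ in trace
        if name in baseline and duration > tolerance * mean(baseline[name])
    ]

# Example: a two-step agent run where the second step fails.
with span("retrieve"):
    pass
try:
    with span("generate"):
        raise RuntimeError("model returned malformed output")
except RuntimeError:
    pass

print(failing_steps(TRACE))  # -> ['generate']
```

In a closed-loop system of the kind Lemma describes, the output of `failing_steps` and `drifted_steps` would feed a remediation stage (e.g., regenerating a prompt and opening a pull request); that stage is the unverified part of the product claim.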

Data Accuracy: YELLOW -- Product claims sourced from company materials and a startup directory; technical activity corroborated by public GitHub repositories.

Market Research

PUBLIC The market for AI agent observability is emerging in response to a fundamental shift: the move from static, single-turn AI models to dynamic, multi-step agents that must operate reliably over time in production environments. This transition creates a new class of debugging and optimization problems that traditional model monitoring tools are not designed to address.

Quantifying the total addressable market (TAM) for this specific niche is challenging due to its nascency. No third-party reports have yet sized the market for "AI agent observability" directly. However, the broader AI developer tools and MLOps platform market provides a relevant analog. The global MLOps platform market was valued at $3.5 billion in 2024 and is projected to grow at a compound annual growth rate of over 40% through 2030, according to industry analysts [Grand View Research, 2024]. The segment for AI application performance monitoring within that market is a smaller, faster-growing slice.

Metric Value
MLOps Platform Market (2024) $3.5B
Projected CAGR (2024-2030) >40%

The projected growth rate suggests a rapidly expanding budget for tools that manage AI in production, but Lemma's specific wedge must be carved from a portion of this spend that is not yet clearly defined.

Demand is driven by several converging tailwinds. The proliferation of AI agents in customer support, coding, and business process automation creates a pressing need for reliability engineering. Agents that fail silently or degrade over time can cause significant operational and financial damage, raising the stakes for observability. Furthermore, the complexity of agentic workflows, which involve chained calls to various models, tools, and APIs, generates a high-dimensional telemetry problem that exceeds the scope of simple endpoint monitoring. A secondary driver is the increasing focus on AI safety and governance in enterprise settings, which requires auditable traces of agent decisions and actions.

Key adjacent markets include general-purpose application performance monitoring (APM), model monitoring and evaluation platforms, and LLM developer tooling. Companies like Datadog and New Relic dominate APM but are only beginning to add LLM-specific features. Pure-play model monitoring platforms like Arize and WhyLabs focus on model input/output drift and performance, but their support for complex, stateful agent workflows is an open question. The most direct substitutes are other LLM developer platforms, such as LangChain's LangSmith, which bundle observability as a feature within a broader framework, potentially reducing the need for a standalone tool.

Regulatory and macro forces are still formative but point toward increased scrutiny. Emerging AI safety frameworks, such as the EU AI Act and the U.S. NIST AI Risk Management Framework, emphasize transparency, accountability, and continuous monitoring of high-risk AI systems. While not yet targeting agents specifically, these regulations create a compliance tailwind for tools that can provide detailed audit trails. A macro risk is the potential for consolidation in the AI toolchain, where larger cloud providers or foundational model companies may bundle observability features, squeezing out independent vendors.

Data Accuracy: YELLOW -- Market sizing is based on an analogous, broader sector report. Demand drivers and competitive context are inferred from industry trends rather than specific, cited research on the agent observability niche.

Competitive Landscape

MIXED Lemma enters a nascent but rapidly formalizing market for AI agent observability, where its early technical focus on automated improvement sets it apart from established players who monitor more mature, static model deployments.

Company Positioning Stage / Funding Notable Differentiator Source
Lemma Observability for self-improving AI agents, automating prompt optimization and failure correction. Pre-Seed, $500K [Y Combinator & PitchBook, 2026] Closed-loop system for continuous agent improvement via API/PR. [Startup Intros, 2025]
LangSmith Full lifecycle platform for developing, monitoring, and testing LLM applications. Series B, $70M (estimated) [Crunchbase, 2025] Deep integration with LangChain framework and broad toolchain. [Crunchbase, 2025]
Arize ML observability platform for model monitoring, explainability, and evaluation. Series B, $61M [Crunchbase, 2024] Strong enterprise focus on traditional ML and LLM drift detection. [Crunchbase, 2024]
Langfuse Open-source LLM observability and analytics platform. Seed, $4.5M [Crunchbase, 2024] Developer-first, self-hostable alternative with community traction. [Crunchbase, 2024]
Helicone Observability, cost tracking, and analytics for LLM applications. Seed, $3M (estimated) [Crunchbase, 2024] Lightweight, proxy-based approach for OpenAI and Anthropic models. [Crunchbase, 2024]

Competition fragments across three distinct segments. The first includes general-purpose ML observability incumbents like Arize and Weights & Biases, which have expanded from traditional model monitoring into LLM evaluation. Their advantage is an existing enterprise sales motion and a broader view of the ML pipeline, but their tooling is often retrofitted for agents rather than built from the ground up for autonomous, iterative systems. The second segment comprises LLM-native developer platforms, led by LangSmith. These are Lemma's most direct comparables, as they are built specifically for the LLM application stack and offer tracing, testing, and evaluation. LangSmith's deep integration with the popular LangChain framework gives it significant distribution lock-in, but its focus remains on the development and monitoring phases, not automated, production-driven optimization. The third segment consists of open-source and lightweight alternatives like Langfuse and Helicone, which compete on cost, transparency, and flexibility, appealing to early-stage teams and those with strong in-house engineering.

Lemma's stated defensible edge is its architectural focus on the feedback loop. While competitors trace and evaluate, Lemma claims to detect drifts, pinpoint failure steps, and generate optimized prompts or code changes automatically [Startup Intros, 2025]. This positions it as a system for closing the observability loop, turning insights into actions without manual intervention. The durability of this edge is questionable at present, as it is a product claim, not a demonstrated technical moat. A more tangible, near-term advantage is the founders' specific background as AI engineers who experienced "agent debugging pain firsthand" [Startup Intros, 2025]. This founder-market fit could translate into a product that resonates with early adopters. The company's active GitHub organization, with SDKs in TypeScript, Python, and Go, suggests a commitment to developer experience that could foster early community adoption [GitHub, 2026].

The company's exposure is multi-faceted. Its most significant vulnerability is its lack of any public traction at a stage when its named competitors already have funded rounds, established user bases, and, in some cases, clear revenue streams. LangSmith's distribution advantage via LangChain is a formidable barrier; a developer already invested in that ecosystem has little incentive to add a separate, pre-product tool for a subset of functionality. Lemma is also exposed on the enterprise front. Incumbents like Arize have years of experience selling into regulated industries and integrating with complex data infrastructure, a go-to-market capability Lemma has not yet begun to build. Finally, the company's narrow focus on AI agents is both its wedge and its risk. If the market for sophisticated, autonomous agents develops more slowly than anticipated, or if developers find adequate solutions by cobbling together existing monitoring and fine-tuning tools, Lemma's specialized value proposition may struggle to find a market.

The most plausible 18-month scenario hinges on the pace of agent adoption and Lemma's ability to convert its technical vision into a shipped product. If autonomous agents see rapid enterprise experimentation and Lemma successfully launches a reliable automated improvement engine, it could carve out a defensible niche as the "CI/CD for agents." In this case, the loser would be the open-source alternatives like Langfuse, which may lack the resources to build equivalent automation features. Conversely, if agent development remains focused on simpler, deterministic chains, the broader LLM observability platforms like LangSmith will likely absorb any demand for advanced tooling. They can incrementally add automation features, leveraging their larger user bases and capital. In that scenario, Lemma, without demonstrated traction or a significant funding war chest, would struggle to differentiate and could be sidelined as a feature, not a platform.

Data Accuracy: YELLOW -- Competitor data is sourced from Crunchbase and is reasonably current; Lemma's positioning is based on a single third-party profile [Startup Intros, 2025].

Opportunity

PUBLIC The opportunity for Lemma is to become the foundational observability standard for a future where AI agents, not static models, are the primary unit of production, a shift that would unlock a new, multi-billion-dollar layer of infrastructure.

The headline opportunity is to define the evaluation and improvement category for autonomous AI agents. While observability for large language models is a maturing market, the specific challenge of debugging and optimizing multi-step, stateful agents that operate over time is a nascent and more complex problem. Lemma’s positioning as a platform for "self-improving" systems suggests an ambition to move beyond passive monitoring to active optimization, a critical capability if agents are to be deployed reliably in production. This outcome is reachable, rather than purely aspirational, because the founding team's technical focus is evidenced by public SDK development across multiple languages, a signal of intent to build for developers first [GitHub, 2026]. The backing from Y Combinator, a program with a strong track record in developer tools and infrastructure, provides a credible launchpad for this category-defining attempt [Y Combinator, 2026].

Multiple paths exist for Lemma to achieve scale, each dependent on specific catalysts and early execution.

Scenario What happens Catalyst Why it's plausible
Developer-led adoption Lemma’s SDKs become the default choice for teams building with LangChain, LlamaIndex, or other agent frameworks, creating a bottom-up adoption motion. A major open-source agent framework officially integrates or recommends Lemma for tracing and evaluation. The company has already released public SDKs for TypeScript, Python, and Go, demonstrating a developer-first approach [GitHub, 2026]. This mirrors the early growth playbook of infrastructure tools like Sentry or Datadog.
Enterprise platform sale Lemma lands a flagship enterprise deal with a company running high-stakes AI agents (e.g., in customer support, financial analysis, or autonomous research), validating the platform for complex, mission-critical use cases. A public case study or technical deep-dive is published with a named enterprise customer, showcasing measurable improvements in agent reliability or cost. The product claims are explicitly targeted at "enterprises and developers building adaptive AI agents for dynamic environments" [Startup Intros, 2025], indicating an intent to solve high-value problems where performance drift has tangible cost.

Compounding for Lemma would likely manifest as a data and workflow flywheel. Early adopters would generate traces and failure patterns, enriching Lemma’s dataset for identifying common failure modes and optimization opportunities. This proprietary dataset could then be used to improve its automated prompt optimization and failure detection algorithms, making the platform more valuable for subsequent users. Over time, as teams standardize their agent evaluation workflows on Lemma, switching costs would increase, creating a form of workflow lock-in. The initial evidence of this flywheel is not yet public, but the architectural intent is clear from the product description, which cites "continuous learning from user feedback and production outcomes" [Startup Intros, 2025].

The size of the win can be framed by looking at comparable infrastructure platforms. LangChain, whose LangSmith product is a direct competitor in the AI development toolchain, reached a reported $200 million valuation within two years of launch following rapid developer adoption [TechCrunch, 2024]. In the broader observability space, public companies like Datadog trade at significant revenue multiples, reflecting the market's valuation of mission-critical software infrastructure. If Lemma successfully executes on the developer-led adoption scenario and captures a meaningful portion of the emerging AI agent tooling market, an outcome in the hundreds of millions of dollars in valuation is plausible (scenario, not a forecast). This potential is what makes the current pre-seed, pre-traction stage a high-risk, high-reward proposition for investors comfortable with category creation bets.

Data Accuracy: YELLOW -- Product claims and team background are sourced from a single startup database; technical activity (GitHub) and accelerator participation are independently verified.

Sources

PUBLIC

  1. [Startup Intros, 2025] Lemma: Funding, Team & Investors | https://startupintros.com/orgs/uselemma

  2. [Y Combinator & LinkedIn, 2026] Lemma: Reliability platform for AI agents | https://www.ycombinator.com/companies/uselemma

  3. [LinkedIn, 2026] Jerry Zhang - Lemma (YC F25) | LinkedIn | https://www.linkedin.com/in/jerry-n-zhang/

  4. [GitHub, 2026] Lemma Labs · GitHub | https://github.com/uselemma

  5. [Y Combinator & PitchBook, 2026] Lemma (San Francisco) 2025 Company Profile: Valuation, Funding & Investors | PitchBook | https://pitchbook.com/profiles/company/904260-16

  6. [Perplexity Sonar Pro, 2025] Lemma (uselemma.ai / startupintros.com/orgs/uselemma) | https://startupintros.com/orgs/uselemma

  7. [LinkedIn (Jerry Zhang), 2026] Lemma (YC F25) | https://www.linkedin.com/company/uselemma

  8. [Grand View Research, 2024] MLOps Platform Market Size, Share & Trends Analysis Report 2024-2030 | https://www.grandviewresearch.com/industry-analysis/mlops-platform-market-report

  9. [Crunchbase, 2025] LangChain Company Profile & Funding | https://www.crunchbase.com/organization/langchain

  10. [Crunchbase, 2024] Arize AI Company Profile & Funding | https://www.crunchbase.com/organization/arize-ai

  11. [Crunchbase, 2024] Langfuse Company Profile & Funding | https://www.crunchbase.com/organization/langfuse

  12. [Crunchbase, 2024] Helicone Company Profile & Funding | https://www.crunchbase.com/organization/helicone

  13. [TechCrunch, 2024] LangChain raises $25M to build out its platform for developing LLM-powered applications | https://techcrunch.com/2024/01/31/langchain-raises-25m-to-build-out-its-platform-for-developing-llm-powered-applications/
