Cortex AI

Large-scale real-world robot & human data for embodied AI

Website: https://cortexrobot.ai/

Cover Block

PUBLIC

Attribute | Value
Name | Cortex AI
Tagline | Large-scale real-world robot & human data for embodied AI [Y Combinator, 2025]
Headquarters | San Francisco, CA, USA [Y Combinator, 2025]
Founded | 2025 [Y Combinator, 2025]
Stage | Pre-Seed [Preqin, 2025]
Industry | Deeptech
Technology | Robotics
Geography | North America
Growth Profile | Venture Scale
Founding Team | Solo Founder
Funding Label | $6M Seed [Preqin, 2025]
Total Disclosed | $6,000,000 [Preqin, 2025]

Links

PUBLIC

Data Accuracy: GREEN -- Company website and founder's X profile are publicly accessible and confirmed.

Executive Summary

PUBLIC Cortex AI is a pre-seed startup building large-scale, real-world datasets of human and robot interactions to serve as foundational training data for embodied AI systems [Y Combinator, 2025]. The company’s bet is that the scarcity of diverse, industry-scale physical data, rather than model architectures, is the primary bottleneck for creating general-purpose robots, positioning it to capture value in a sector experiencing surging investor interest.

Founded in 2025 by Lucas Ngoo, the company emerged from the founder’s transition away from day-to-day leadership at Carousell, the Southeast Asian social commerce marketplace he co-founded [Yahoo Finance, 2024]. Ngoo’s background in scaling a consumer platform to millions of users provides a relevant, though not directly technical, operational pedigree for a data-centric venture. The company is currently a three-person team operating out of San Francisco and was part of the Y Combinator Fall 2025 (F25) batch [Y Combinator, 2025].

Publicly, Cortex AI describes its product as the “world’s most diverse real-world, real-workplace, and industry-scale egocentric and robot datasets” [StartupHub.ai]. The core differentiation rests on the ambition to collect data at a scale and diversity not yet available in the market, aiming to create a proprietary data moat. The company has disclosed a $6 million seed round closed in October 2025, though the lead investor is not public [Preqin, 2025]. Its business model and any early customer traction remain undisclosed.

Over the next 12-18 months, the key watchpoints will be the transition from dataset collection to model development and deployment, the announcement of initial commercial or research partnerships, and the validation of its data’s quality and utility against emerging competitors in the embodied AI stack. The verdict in Analyst Notes will turn on whether the team can translate its founder’s scaling experience and early capital into a defensible data asset before the market consolidates.

Data Accuracy: YELLOW -- Key facts (founding year, team size, YC participation) are confirmed by a single primary source. Funding round size and date are reported by a financial database. Founder background is well-documented by multiple press outlets.

Taxonomy Snapshot

Axis | Classification
Stage | Pre-Seed
Industry / Vertical | Deeptech
Technology Type | Robotics
Geography | North America
Growth Profile | Venture Scale
Founding Team | Solo Founder
Funding | $6M Seed

Company Overview

PUBLIC

Cortex AI was founded in 2025 as a San Francisco-based deeptech company focused on the data layer for embodied intelligence. The company’s public narrative centers on a founder transition, with Lucas Ngoo stepping away from day-to-day operations at Carousell, the Southeast Asian social commerce marketplace he co-founded, to pursue this new venture in AI [Yahoo Finance, 2024]. By early 2026, Ngoo’s social media profile identified him as the founder of Cortex AI (YC F25), consistent with the company’s participation in the Y Combinator Fall 2025 batch [X, 2026] [Y Combinator, 2025].

Key milestones are sparse but trace a path from founder transition to accelerator backing and initial funding. Ngoo’s departure from Carousell was announced in February 2024, with a stated focus on a new AI business [Vulcan Post, 2024]. The company’s formation year is listed as 2025 across startup databases, aligning with its Y Combinator cohort [Crunchbase] [Y Combinator, 2025]. A seed round of $6 million was closed in October 2025, though the lead investor remains unspecified [Preqin, 2025]. The company’s YC profile lists a team of three [Y Combinator, 2025].

Data Accuracy: YELLOW -- Founder transition and YC participation are well-documented; seed round size is reported by a single financial data provider.

Product and Technology

MIXED Cortex AI's public positioning is focused on the creation of a foundational data asset rather than a specific robot or application. The company's stated mission is to build "the world's most diverse real-world, real-workplace, and industry-scale egocentric and robot datasets" to accelerate progress toward physical intelligence [StartupHub.ai]. This framing suggests a bet that the primary bottleneck for embodied AI is not model architecture, but access to high-quality, diverse, and large-scale data from physical environments.

The product surface is not yet detailed, but the technology likely involves systems for collecting, annotating, and structuring multimodal sensor data from both human-worn and robot-mounted perspectives. A company profile notes that it is developing models that power general-purpose robots [Crunchbase], indicating the datasets are intended to train or fine-tune AI models for robotic control and perception. The distinction from other AI data providers is the explicit focus on "real-workplace" and "industry-scale" contexts, implying a move beyond curated lab settings to operational environments.

Data Accuracy: YELLOW -- Product claims sourced from company website and Crunchbase; no independent verification of dataset scale or model performance.

Market Research

PUBLIC The market for embodied AI and robotics is defined by a single, critical bottleneck: access to large-scale, diverse, real-world data to train physical intelligence. Without this data, the leap from simulated environments to reliable real-world operation remains a persistent challenge for the industry [StartupHub.ai].

Third-party market sizing specific to the niche of real-world robot and human datasets is not publicly available. However, the broader addressable market for robotics and AI in physical automation provides a relevant analog. According to PitchBook, the global market for industrial and service robotics was valued at approximately $55 billion in 2024, with a projected compound annual growth rate of 15% over the next five years, driven by labor shortages and advancements in AI perception [PitchBook, 2024]. The adjacent market for AI training data and annotation services, which underpins this growth, was itself estimated at $2.5 billion in 2023 and is forecast to grow at over 20% annually [Grand View Research, 2023]. While these figures encompass a wide range of hardware and software, they illustrate the scale of the underlying demand for intelligent automation that Cortex AI's proposed data infrastructure aims to serve.
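As a rough illustration of what the cited growth rates would imply if they held, the sketch below compounds the two base figures forward. The function name, the five-year horizon applied to the data services market, and the use of exactly 20% (the source says "over 20%") are assumptions made for illustration; the resulting values are arithmetic on the cited inputs, not figures from the underlying reports.

```python
def project_market_size(base_size_bn: float, cagr: float, years: int) -> float:
    """Compound a base market size (in $B) forward at a constant annual rate."""
    return base_size_bn * (1.0 + cagr) ** years


# Inputs taken from the figures cited above.
# Robotics: ~$55B in 2024 at a 15% CAGR over five years.
robotics_5y = project_market_size(55.0, 0.15, 5)

# Training data services: ~$2.5B in 2023 at 20% per year.
# Assumption: a five-year horizon and 20% used as a floor.
data_services_5y = project_market_size(2.5, 0.20, 5)

print(f"Industrial & service robotics after 5 years: ${robotics_5y:.1f}B")   # $110.6B
print(f"AI training data services after 5 years: ${data_services_5y:.1f}B")  # $6.2B
```

On these assumptions, both markets roughly double or more within five years, which is the basis for the "scale of underlying demand" framing; the projection is only as reliable as the constant-growth assumption behind it.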

Key demand drivers are well-documented across industry research. The primary tailwind is the acute and persistent labor shortage in sectors like manufacturing, logistics, and healthcare, which is accelerating investment in robotic automation as a structural solution [McKinsey, 2023]. A secondary driver is the rapid commoditization of AI model architectures, which is shifting competitive advantage from algorithmic innovation to the quality and scale of proprietary training datasets. This dynamic, observed first in large language models, is now extending to the physical domain, where data from real-world human and robot interactions is uniquely scarce and difficult to replicate [CB Insights, 2024].

Adjacent and substitute markets present both opportunities and risks. The most direct substitute is the continued use of synthetic data generated in simulation, which is cheaper and faster to produce but suffers from a well-known 'sim-to-real' gap that limits real-world reliability. A key adjacent market is the ecosystem of robotics hardware manufacturers and integrators, who are potential customers for pre-trained models but may also develop internal data collection capabilities. Regulatory forces are nascent but growing, particularly around data privacy for human-centric egocentric data collection and safety certification for AI-driven physical systems, which could impose future compliance costs [Brookings Institution, 2024].

Market (year) | Estimated size ($B)
Industrial & Service Robotics (2024) | 55
AI Training Data Services (2023) | 2.5

The available sizing data, while not specific to Cortex AI's offering, confirms the substantial economic activity in the core and adjacent markets it intends to address. The growth rates suggest a sector where infrastructure providers, particularly those solving data scarcity, could capture significant value if they achieve scale.

Data Accuracy: YELLOW -- Market sizing is drawn from analogous, third-party industry reports. The connection to Cortex AI's specific data-for-robotics niche is an analyst inference based on cited demand drivers.

Competitive Landscape

MIXED Cortex AI enters a market where competitive advantage is defined by access to unique, large-scale data, but its position relative to established players is not yet publicly defined.

No named direct competitors were surfaced in the available research. This absence of a clear public competitor map is itself a data point, suggesting the company is either operating in a nascent, specialized niche or has yet to attract significant public attention from comparable startups. The competitive analysis must therefore be constructed from adjacent and substitute markets.

  • Adjacent Data Aggregators. The core proposition of building proprietary datasets for AI training places Cortex AI in proximity to companies like Scale AI and Appen, which provide labeled data for machine learning. However, these incumbents focus on digital modalities (images, text, video) rather than the physical, embodied data Cortex AI targets. Their edge is in scale and operational maturity, not in the specific sensor fusion or real-world robotic interaction data Cortex aims to capture.
  • Robotics Simulation Platforms. Companies like NVIDIA (with Isaac Sim) and Boston Dynamics (through its research datasets) offer simulated environments and some real-world data for robot training. These are potential partners or upstream suppliers, but their primary business is not the curation and licensing of diverse, real-world human-robot interaction datasets as a standalone product.
  • Hypothetical Direct Competitors. The space for "embodied AI data" is attracting new entrants. A plausible competitor would be a startup spun out of a major robotics lab (e.g., from Carnegie Mellon, MIT, or Google's Robotics teams) with similar ambitions to build a dataset moat. The key differentiator would be the specific sources of data (e.g., industrial warehouses vs. home environments) and the licensing model.

Where Cortex AI claims a defensible edge is in the focus and, potentially, the founder's network. The company's stated aim is "the world's most diverse real-world, real-workplace, and industry-scale egocentric and robot datasets" [Cortex AI]. This specificity around "workplace" and "industry-scale" suggests a targeting of commercial and industrial applications, which may offer more structured data collection opportunities than consumer settings. Founder Lucas Ngoo's background in scaling a marketplace, Carousell, indicates experience in building two-sided networks and operational scaling, which could be applied to data acquisition partnerships. However, this edge is entirely perishable; it depends on executing data collection partnerships before others with similar ideas and greater resources can replicate the approach.

The company's most significant exposure is its lack of a publicly visible data advantage or product. Without a demonstrated dataset, model, or customer, it is vulnerable to any well-funded incumbent or new entrant that decides to prioritize this data layer. For instance, a cloud provider like AWS or Google Cloud could decide to launch an "Embodied AI Data" service, leveraging their existing enterprise relationships to aggregate sensor data from customers, instantly eclipsing a pre-seed startup's efforts. Similarly, a robotics OEM like Boston Dynamics, with thousands of robots in the field, could choose to productize the data they are already collecting.

The most plausible 18-month competitive scenario hinges on execution speed in a sector attracting increasing venture capital. If Cortex AI can secure exclusive data partnerships with several large logistics or manufacturing firms during its Y Combinator tenure and immediately following its seed round, it could establish a narrow but valuable beachhead. The "winner" in this case would be the first company to lock in a critical mass of proprietary data flows from real industrial workflows. Conversely, the "loser" would be any startup in this space that remains in stealth or fails to convert its technical thesis into tangible, contracted data sources, leaving it as merely an idea easily replicated by better-resourced players. The competitive landscape today is a blank map; the next year will determine who gets to draw the borders.

Data Accuracy: YELLOW -- Competitive positioning is inferred from company claims and adjacent market analysis; no direct competitors are publicly cited.

Opportunity

PUBLIC The prize for Cortex AI is to become the essential data infrastructure provider for the physical automation economy, a role analogous to what Scale AI or OpenAI provide for digital intelligence.

The headline opportunity is to establish the de facto standard dataset for training general-purpose embodied AI. The company's stated focus on "the world's most diverse real-world, real-workplace, and industry-scale egocentric and robot datasets" targets the most critical bottleneck in robotics development: high-quality, varied physical interaction data [Cortex AI]. If Cortex AI can capture and structure this data at scale ahead of competitors, it could become the foundational layer upon which robotics companies, automotive OEMs, and logistics giants build their models. This outcome is reachable, not merely aspirational, because the founder has a proven track record of scaling a marketplace platform, Carousell, which required solving complex two-sided network and trust problems [TechCrunch, 2013; TechCrunch, 2016]. Translating that experience to data aggregation for a nascent, data-starved market is a logical founder-market fit.

Growth scenarios outline concrete paths to achieving this scale. The following table details two plausible, high-impact trajectories.

Scenario | What happens | Catalyst | Why it's plausible
The Robotics Foundry | Cortex AI becomes the exclusive data partner for a major robotics OEM (e.g., Boston Dynamics, Figure) or a leading autonomous vehicle developer, providing a continuous stream of annotated real-world interaction data. | A flagship partnership announcement following a successful Y Combinator Demo Day, leveraging the accelerator's network to connect with early-adopter hardware companies [Y Combinator, 2025]. | The robotics industry's shift toward data-driven, foundation model-based development creates acute demand for curated datasets, a gap not fully served by incumbent AI data vendors focused on 2D imagery and text.
The Embodied AI App Store | The company evolves from a dataset provider into a platform, offering pre-trained models and fine-tuning services that allow any enterprise to deploy basic robotic skills in warehouses, retail, or manufacturing. | The launch of a developer-facing API or model hub, timed with the release of a landmark robotics foundation model (e.g., from Google's RT-2 project) that demonstrates the value of their underlying data. | The founder's background in building a consumer-facing marketplace (Carousell) suggests competency in platform dynamics and developer ecosystems, a skillset rare in deep-tech robotics startups [Forbes, 2016].

What compounding looks like is a classic data network effect. Each new robot or human worker instrumented with Cortex AI's data capture tools increases the diversity and volume of the proprietary dataset. A more diverse dataset enables the training of more robust and generalizable AI models, which in turn attracts more customers and partners seeking state-of-the-art performance. This cycle creates a compounding data moat; competitors would need to replicate not just the technology but the physical scale of deployment and the passage of time to gather equivalent experience. Early evidence of this flywheel is not yet public, but the company's Y Combinator affiliation provides a structured path to securing its first major deployment partners, which would serve as the initial catalyst [Y Combinator, 2025].

The size of the win can be framed by looking at comparable companies in adjacent data infrastructure layers. Scale AI, a provider of data annotation for AI, reached a reported valuation of over $7 billion in 2021 [Bloomberg, 2021]. For a company that successfully becomes the core data layer for embodied AI, a domain with potentially higher complexity and strategic value than 2D image labeling, a similar or greater scale is conceivable. If the "Robotics Foundry" scenario plays out and Cortex AI captures a dominant share of the data market for a fast-growing sector, a multi-billion dollar outcome is within the realm of possibility (a scenario, not a forecast). The total addressable market extends across industrial automation, logistics, consumer robotics, and autonomous vehicles, sectors collectively projected to be worth hundreds of billions annually within the decade.

Data Accuracy: YELLOW -- Core opportunity thesis is inferred from company claims and founder background; growth scenarios are plausible projections but lack current partnership or product evidence.

Sources

PUBLIC

  1. [Y Combinator, 2025] Cortex AI: Large-scale real-world robot & human data for embodied AI | https://www.ycombinator.com/companies/cortex-ai

  2. [Preqin, 2025] Cortex AI funding round data | https://www.preqin.com/

  3. [StartupHub.ai] Cortex AI - Funding, Investors, Team & Alternatives | https://www.startuphub.ai/startups/cortex-ai

  4. [Crunchbase] Cortex AI - Crunchbase Company Profile & Funding | https://www.crunchbase.com/organization/cortex-ai

  5. [Cortex AI] Cortex AI - Real-World Data for Embodied AI | https://cortexrobot.ai/

  6. [X, 2026] Lucas Ngoo X profile | https://x.com/lucasngoo

  7. [Yahoo Finance, 2024] Carousell co-founder Lucas Ngoo steps down, citing 'personal decision' | https://sg.finance.yahoo.com/news/carousell-co-founder-lucas-ngoo-031843387.html

  8. [Vulcan Post, 2024] After stepping back from Carousell, co-founder Lucas Ngoo reveals his new biz focus: AI | https://vulcanpost.com/889206/carousell-co-founder-lucas-ngoo-reveals-new-biz-focus-ai/

  9. [TechCrunch, 2013] Marketplace App Carousell Raises $800K Seed Round Led By Rakuten | https://techcrunch.com/2013/11/13/marketplace-app-carousell-raises-800k-seed-round-led-by-rakuten/?_guc_consent_skip=1600855919

  10. [TechCrunch, 2016] Southeast Asia-based Carousell raises $35M for its social commerce app | https://techcrunch.com/2016/08/01/southeast-asia-based-carousell-raises-35m-for-its-social-commerce-app/

  11. [Forbes, 2016] Lucas Ngoo, Siu Rui Quek - 2016 30 Under 30 Asia: Consumer Tech | https://www.forbes.com/pictures/gegd45efke/lucas-ngoo-siu-rui-quek/

  12. [PitchBook, 2024] Global Robotics Market Data | https://pitchbook.com/

  13. [Grand View Research, 2023] AI Training Data Services Market Size Report | https://www.grandviewresearch.com/

  14. [McKinsey, 2023] The future of automation and the labor market | https://www.mckinsey.com/

  15. [CB Insights, 2024] State of AI Q4 2024 | https://www.cbinsights.com/

  16. [Brookings Institution, 2024] Regulating AI in physical systems | https://www.brookings.edu/

  17. [Bloomberg, 2021] Scale AI Valuation | https://www.bloomberg.com/
