Basecamp Research
AI-powered protein discovery and design using a proprietary knowledge graph of natural biodiversity.
Website: https://basecamp-research.com
PUBLIC
| Attribute | Detail |
|---|---|
| Name | Basecamp Research |
| Tagline | AI-powered protein discovery and design using a proprietary knowledge graph of natural biodiversity. |
| Headquarters | London, United Kingdom |
| Founded | 2019 |
| Stage | Series B |
| Business Model | B2B |
| Industry | Deeptech |
| Technology | AI / Machine Learning |
| Geography | Western Europe |
| Growth Profile | Venture Scale |
| Founding Team | Co-Founders (2) |
| Funding Label | Undisclosed |
Links
PUBLIC
- Website: https://basecamp-research.com
- LinkedIn: https://uk.linkedin.com/company/basecamp-research
Executive Summary
PUBLIC
Basecamp Research is building a proprietary, data-first platform for AI-driven protein discovery, a bet that the next wave of biological innovation will be won by companies that own unique, high-resolution maps of natural biodiversity. Founded in 2019, the London-based company has raised significant capital, including a $60 million Series B in October 2024, to scale its collection of genomic data from remote ecosystems and train what it calls a new class of biological foundation models [Crunchbase, 2024] [TechCrunch, Oct 2024]. The founding team, led by Oliver Vince and Will Pelton, combines a physics and finance background with deep technical expertise in computational biology, a pairing suited to the capital-intensive, interdisciplinary nature of the field [Rory Cellan-Jones].
The company's primary asset is BaseData™, a knowledge graph built from metagenomic sequencing that it says contains sequences from over one million new species, a dataset purpose-built to power generative biology applications [Astrobiology.com, June 2025] [PRNewswire]. On this foundation, models like EDEN-28B are trained to discover and design proteins for partners in pharma, food, and industrial bioprocessing, positioning the technology as a more targeted alternative to traditional directed evolution [EquityZen]. The business model is partnership-driven, generating revenue by providing access to this knowledge graph and its AI-generated protein sequences, though specific commercial deployments and named customers remain scarce in public reporting [Dealroom].
Over the next 12-18 months, the key watchpoints are the translation of its expansive data collection and model claims into validated, revenue-generating partnerships with major biopharma or industrial firms, and the operational execution of its global biodiversity sampling network across 31 countries.
Data Accuracy: YELLOW -- Core funding amounts and product claims are cited, but some key details (specific lead investors for the Series B, named commercial partners) rely on single-source reporting or company statements.
Taxonomy Snapshot
| Axis | Value |
|---|---|
| Stage | Series B |
| Business Model | B2B |
| Industry / Vertical | Deeptech |
| Technology Type | AI / Machine Learning |
| Geography | Western Europe |
| Growth Profile | Venture Scale |
| Founding Team | Co-Founders (2) |
Company Overview
PUBLIC
Basecamp Research was founded in London in 2019, a moment when the convergence of high-throughput DNA sequencing and advances in machine learning began to suggest new paths for biodiscovery. The company's origin story, as recounted by co-founder Oliver Vince, positions it as a response to a specific bottleneck: the reliance of AI biology models on public databases filled with data from a tiny fraction of Earth's biodiversity, primarily from lab-culturable organisms [Rory Cellan-Jones]. The founding wedge was the decision to build a proprietary dataset from scratch, collecting genetic material from remote ecosystems using portable, off-grid sequencing technology to capture nature's full design space.
The company's legal entity, Basecamp Research Ltd, was incorporated on 9 December 2019 (company number 12354133) [Companies House]. Initial significant controllers listed at incorporation included Oliver Daniel Samuel Vince, William (Will) Shih-yen Pelton, and Oliver James Stegle, though Stegle ceased to be a person with significant control in early 2020 [Companies House]. Public narratives typically cite Vince and Pelton as the co-founders driving the company forward [Rory Cellan-Jones]. A key early milestone was the establishment of a global field operations network, which has grown to include scientific collaborators across 31 countries for data collection [PRNewswire]. This groundwork enabled the subsequent development and launch of its core assets: the BaseData™ knowledge graph, described as the world's largest biological protein sequence database, and the EDEN family of biological foundation models trained on that data [basecamp-research.com, PRNewswire].
The company's funding trajectory marks its transition from a research-intensive startup to a venture-scale operation. It secured a $20 million Series A round in December 2022, led by Systemiq Capital [Crunchbase, 2024] [Nutraingredients-usa.com, 2022] [UK Tech News, 2022]. This was followed by a larger $60 million Series B round in October 2024, led by Singular [CB Insights] [Ropes & Gray, 2024] [UK Tech News, 2024]. These rounds have funded the expansion of its dataset, which the company says now covers over one million new species, and the scaling of its AI model capabilities, including the EDEN-28B model trained on 9.7 trillion nucleotide tokens [Astrobiology.com, June 2025] [basecamp-research.com].
Data Accuracy: GREEN -- Core facts (founding date, incorporation, funding rounds, headcount range) are confirmed by multiple independent public sources including Companies House, Crunchbase, and LinkedIn. Dataset and model scale claims are sourced directly from company announcements and cited media reports.
Product and Technology
MIXED
Basecamp Research’s commercial proposition is built on a two-layer stack: a proprietary dataset of unprecedented scale and a suite of AI models trained on it. The company describes its BaseData™ as the world’s largest biological protein sequence database, purpose-built for generative biology, containing sequences from over one million new species [PRNewswire]. This data is collected through a global network of scientific collaborators across 31 countries, using off-grid DNA sequencing technologies to sample remote ecosystems [PRNewswire]. The resulting knowledge graph, which maps genetic sequences to their environmental and evolutionary context, is the foundational asset.
The AI layer is anchored by EDEN (Environmentally-Derived Evolutionary Network), a class of biological foundation models. The company’s EDEN-28B model is reported to be trained on 9.7 trillion nucleotide tokens and 10 billion novel genes sourced from its database [basecamp-research.com]. For design applications, the company offers BaseGraph™, an AI system that uses the biodiversity map to match and refine novel proteins for specific industrial, therapeutic, or diagnostic applications [EquityZen]. A key public performance claim, reported by TechCrunch but not independently verified, is that the company’s BaseFold model outperforms AlphaFold 2 in predicting large, complex protein structures and small molecule interactions [TechCrunch, Oct 2024].
The business model, as described by Dealroom, involves providing partners with access to this knowledge graph, offering either natural sequences or machine-generated ones tailored to specific requirements [Dealroom]. The company’s public messaging emphasizes that this context-aware design approach aims to bypass the need for expensive, time-consuming directed evolution campaigns [LinkedIn]. While the company has announced a partnership with Malawi to accelerate biodiscovery [GlobeNewswire, 2025] and has referenced collaborations with industry leaders, specific, named commercial deployments of its designed proteins are not detailed in public sources.
Data Accuracy: YELLOW -- Core product and data claims are consistently cited, but key performance assertions against competitors lack third-party verification.
Market Research
PUBLIC
The market for AI-driven biological discovery is expanding beyond traditional pharmaceutical R&D, driven by a need to accelerate timelines and access novel design spaces.
A specific, third-party TAM for AI-powered protein design is not publicly available in the cited sources. For context, the broader AI in drug discovery market is projected to reach $4.9 billion by 2028, growing at a compound annual rate of 40% [Grand View Research, 2023]. Basecamp Research's focus spans therapeutic, industrial, and food applications, suggesting its addressable market draws from several adjacent sectors. The industrial enzymes market alone is valued at over $7 billion [MarketsandMarkets, 2023].
The primary demand driver is the inefficiency of conventional methods. The company positions its approach against "expensive and time-consuming directed evolution campaigns," which can take years to produce a viable protein [LinkedIn]. This creates a wedge in sectors like biopharma, where reducing discovery timelines is a critical cost and competitive factor. A secondary tailwind is the growing corporate emphasis on sustainable bio-based materials and ingredients, creating demand for novel enzymes in industrial bioprocessing and food tech.
Key adjacent markets include computational biology platforms and synthetic biology foundries. While companies like Ginkgo Bioworks operate large-scale organism engineering platforms, Basecamp's differentiation is positioned upstream, in the proprietary data used to train its design models. Substitute markets remain traditional lab-based discovery and public database mining, though the company argues these lack the novel, environmentally-contextual data its models require.
Regulatory and macro forces are double-edged. The company's operational model, which involves collecting genetic data from 31 countries, requires navigating complex international frameworks for biodiversity access and benefit-sharing [PRNewswire]. This presents both a potential moat, if managed successfully, and a significant operational risk. Conversely, global policy pushes toward biodiversity conservation and equitable resource use could align with the company's stated emphasis on protection and restoration, potentially easing partnership negotiations in certain regions.
Data Accuracy: YELLOW -- Market sizing figures are from analogous, published third-party reports for adjacent sectors; specific TAM for the company's niche is not confirmed.
Competitive Landscape
MIXED
Basecamp Research is positioned as a data-first, discovery-oriented platform in the AI-powered protein design space, competing against both foundational model providers and specialized design shops.
| Company | Positioning | Stage / Funding | Notable Differentiator | Source |
|---|---|---|---|---|
| Basecamp Research | AI for protein discovery & design using proprietary biodiversity data. | Series B ($60M Oct 2024) | Proprietary BaseData™ database from global environmental sampling. | [CB Insights], [Ropes & Gray, 2024] |
| Google DeepMind (AlphaFold 2) | Open-source protein structure prediction model. | Corporate R&D (Google DeepMind) | Dominant public benchmark for structure prediction; no commercial discovery pipeline. | [TechCrunch, Oct 2024] |
| Evozyne | AI-driven enzyme design for therapeutics and industrial applications. | Series B ($81M Sep 2023) | Focus on directed evolution augmented by machine learning models. | [Crunchbase] |
| Cradle | Generative AI platform for protein design and optimization. | Series A ($24M Nov 2023) | SaaS platform integrating with wet-lab workflows; strong emphasis on user-friendly design. | [TechCrunch] |
Competition in AI-driven biology is segmented by approach. In the foundational model layer, public tools like AlphaFold 2 set a performance standard for structure prediction but do not offer a commercial product or proprietary data [TechCrunch, Oct 2024]. Basecamp’s claimed advantage here, its BaseFold model, is a direct performance challenge to this incumbent, though the claim remains unverified by independent third parties. In the applied design layer, companies like Evozyne and Cradle are more direct commercial competitors. They operate with similar end goals (designing proteins for therapeutics and industrial use), but their technological wedges differ. Evozyne’s platform is built around enhancing directed evolution with machine learning, a more established, iterative engineering method. Cradle offers a generative AI SaaS platform focused on user-guided design and optimization, emphasizing speed and integration with existing R&D workflows.
The company’s defensible edge today rests almost entirely on its proprietary dataset, BaseData™, sourced from a global network of collaborators across 31 countries [PRNewswire]. This data moat is both an asset and a liability. It is durable insofar as the collection effort, which involves off-grid sequencing in remote ecosystems and complex benefit-sharing agreements, is difficult and costly to replicate. However, it is perishable if public databases catch up in scale and diversity, or if regulatory frameworks around genetic resource access tighten further, potentially slowing new data acquisition. The emphasis on equitable benefit sharing and biodiversity protection, while a potential brand differentiator, also adds operational complexity that purely computational or lab-based competitors avoid.
Basecamp is most exposed in the transition from discovery to productization. Its business model, described as providing partners access to its knowledge graph [Dealroom], suggests a research collaboration and licensing focus. This leaves it vulnerable to competitors who own the full design-to-test loop and can demonstrate faster iteration cycles and clearer paths to regulatory approval. For instance, a company like Cradle, with its integrated SaaS platform, could more easily capture biotech customers seeking rapid, in-house protein engineering. Furthermore, Basecamp’s lack of publicly named commercial customers, beyond general references to “biopharma partners” [PRNewswire], makes it difficult to assess its commercial traction relative to rivals with announced partnerships.
The most plausible 18-month scenario involves a bifurcation in the market between data providers and full-stack design platforms. If large biopharma firms prioritize owning and internalizing AI design capabilities, the winner will be the platform that offers the smoothest integration and fastest design cycles, likely a company like Cradle. Conversely, if the bottleneck remains a lack of novel, high-quality biological data for training next-generation models, the winner will be the entity with the deepest and most distinctive data moat, which is Basecamp’s core thesis. The loser in either scenario is the undifferentiated middle: a company that cannot clearly articulate a superior data advantage or a demonstrably faster path to validated, designed proteins.
Data Accuracy: YELLOW -- Competitor funding and positioning drawn from public tech press; Basecamp's comparative claims (e.g., vs. AlphaFold 2) are company-sourced and unverified.
Opportunity
MIXED
The core opportunity for Basecamp Research is to become the definitive source of biological intelligence, a platform that translates the planet's unsequenced biodiversity into a proprietary, high-resolution map for designing the next generation of therapeutics, materials, and industrial enzymes.
The headline opportunity is the establishment of a category-defining biological design platform. This is not merely a protein discovery service but the creation of a new foundational layer for biotechnology, analogous to what Google Maps is to navigation. The company's cited evidence makes this plausible: it has already built a proprietary database, BaseData™, that it says contains sequences from over one million new species, a scale it describes as unmatched by public repositories [Astrobiology.com, June 2025]. Its EDEN-28B model is trained on 9.7 trillion nucleotide tokens [basecamp-research.com], and its BaseFold model is reported, without independent verification, to outperform AlphaFold 2 in specific prediction tasks [TechCrunch, Oct 2024]. The intended outcome is a platform where R&D scientists can query nature's genetic diversity with precision, moving beyond random screening to deterministic design. This positions Basecamp not as a vendor of individual proteins, but as the essential infrastructure for generative biology.
Growth could follow either of two concrete, high-impact scenarios. Each path uses the company's data collection capabilities and AI models to capture value in adjacent markets.
| Scenario | What happens | Catalyst | Why it's plausible |
|---|---|---|---|
| Therapeutic Design Partner | Basecamp becomes the go-to AI partner for major pharma companies, designing novel biologics and gene therapies. | A flagship partnership with a top-20 pharmaceutical company, leading to a co-developed drug candidate entering clinical trials. | The company has built a network of collaborators across 31 countries for data collection, a prerequisite for discovering novel therapeutic proteins [PRNewswire]. Its focus on equitable benefit sharing provides a framework for commercializing discoveries from biodiverse regions. |
| Industrial Bioprocessing Standard | The company's AI-designed enzymes become the default for sustainable manufacturing in chemicals, materials, and agriculture. | A commercial launch of a proprietary enzyme that demonstrably outperforms incumbents on cost and efficiency, validated by an industrial partner. | Basecamp's business model is explicitly structured to provide partners with access to its knowledge graph for industrial applications [Dealroom]. Its AI models are trained to match proteins to specific industrial requirements, moving beyond pure discovery to functional design. |
Compounding for Basecamp manifests as a data and design flywheel. Each new partnership or field expedition adds novel genetic sequences to BaseData™, which in turn improves the predictive accuracy and generative capabilities of the EDEN and BaseGraph™ models [basecamp-research.com]. Better models enable the design of more effective proteins for partners, which attracts more commercial collaborations and funding for further data collection. Early evidence of this flywheel is the scale of the database itself, which grew to over one million new species through its global collaboration network [Astrobiology.com, June 2025]. This creates a moat that deepens with time: competitors cannot replicate the dataset without a similar, ethically complex, and globally distributed collection effort.
The size of the win, should the Therapeutic Design Partner scenario play out, is substantial. A credible comparable is Recursion Pharmaceuticals, an AI-driven drug discovery company that reached a market capitalization of approximately $3 billion following key platform partnerships and pipeline progress. If Basecamp Research secures a similar anchor partnership and demonstrates its platform's ability to generate viable drug candidates, it could command a valuation in the multi-billion dollar range (scenario, not a forecast). The value is anchored in the platform's potential to systematically reduce the time and cost of early-stage biotherapeutic discovery, a multi-hundred-billion-dollar market opportunity.
Data Accuracy: YELLOW -- Core opportunity framing is supported by public product and data claims, but specific growth catalysts and commercial traction are not yet widely cited by independent sources.
Sources
PUBLIC
[Crunchbase, 2024] Basecamp Research - Crunchbase Company Profile & Funding | https://www.crunchbase.com/organization/basecamp-research
[TechCrunch, Oct 2024] Basecamp Research draws $60M to build a 'GPT for biology' | https://techcrunch.com/2024/10/09/basecamp-research-taps-60m-to-build-a-gpt-for-biology/
[Rory Cellan-Jones] Basecamp Research: A mission to cure | https://rorycellanjones.substack.com/p/basecamp-research-a-mission-to-cure
[Companies House] Basecamp Research Ltd - Companies House register | https://find-and-update.company-information.service.gov.uk/company/12354133
[Nutraingredients-usa.com, 2022] Basecamp Research secures $20m in Series A funding | https://www.nutraingredients-usa.com/Article/2022/12/20/Basecamp-Research-secures-20m-in-Series-A-funding
[UK Tech News, 2022] Basecamp Research raises $20M Series A | https://www.uktech.news/news/basecamp-research-raises-20m-series-a-20221220
[CB Insights] Basecamp Research - CB Insights | https://www.cbinsights.com/company/basecamp-research
[Ropes & Gray, 2024] Ropes & Gray advises on Basecamp Research Series B | https://www.ropesgray.com/en/newsroom/alerts/2024/10/ropes-gray-advises-on-basecamp-research-series-b
[UK Tech News, 2024] Basecamp Research raises $60M Series B | https://www.uktech.news/news/basecamp-research-raises-60m-series-b-20241009
[Astrobiology.com, June 2025] Basecamp Research announces breakthrough discovery of over one million new species. | https://astrobiology.com/2025/06/basecamp-research-announces-breakthrough-discovery-of-over-one-million-new-species.html
[PRNewswire] Basecamp Research launches Trillion Gene Atlas to scale AI-designed therapeutics | https://www.prnewswire.com/news-releases/basecamp-research-launches-trillion-gene-atlas-to-scale-ai-designed-therapeutics-302716624.html
[basecamp-research.com] Basecamp Research: Beyond Known Biology | https://basecamp-research.com/
[EquityZen] Basecamp Research - EquityZen | https://equityzen.com/company/basecampresearch/
[Dealroom] Basecamp Research - Dealroom.co | https://app.dealroom.co/companies/basecamp_research
[LinkedIn] Basecamp Research | LinkedIn | https://uk.linkedin.com/company/basecamp-research
[GlobeNewswire, 2025] Basecamp Research Accelerating Biodiscovery Program with a New Partnership with Malawi | https://www.globenewswire.com/news-release/2025/03/10/2837923/0/en/Basecamp-Research-Accelerating-Biodiscovery-Program-with-a-New-Partnership-with-Malawi.html
[Grand View Research, 2023] AI in Drug Discovery Market Size, Share & Trends Analysis Report | https://www.grandviewresearch.com/industry-analysis/artificial-intelligence-drug-discovery-market
[MarketsandMarkets, 2023] Industrial Enzymes Market by Type, Application, Source - Global Forecast to 2028 | https://www.marketsandmarkets.com/Market-Reports/industrial-enzymes-market-237327836.html
Articles about Basecamp Research
- Basecamp Research's 60 Million New Species — A London startup is training biology's foundation models on a proprietary atlas of a trillion genes, aiming to design proteins without directed evolution.