Basecamp Research's One Million New Species

A London startup is training biology's foundation models on a proprietary atlas of a trillion genes, aiming to design proteins without directed evolution.

Most of biology's data is public, which is a problem if you want to build a business on it. The sequences in GenBank are a well-mapped continent, but the real value, the argument goes, is in the undiscovered islands. Basecamp Research, a London startup founded in 2019, has spent the last five years on expedition, building what it calls the world's largest proprietary protein sequence database from samples collected in 31 countries [PRNewswire, Unknown]. Its bet is simple: the best way to design a novel protein for an industrial catalyst or a new drug is to train an AI on nature's own vast, previously uncatalogued blueprint. The company raised $60 million in an October 2024 Series B led by Singular to prove it [CB Insights, Unknown].

The data moat is a field kit

The core of Basecamp's operation isn't just software; it's logistics. While others fine-tune models on public data, Basecamp's team collects novel biodiversity using off-grid DNA sequencing technologies in remote ecosystems, from national parks to deep-sea vents [PRNewswire, Unknown]. This generates BaseData™, a knowledge graph the company says now contains over one million new species [Astrobiology.com, June 2025]. The commercial wedge is that this data is proprietary and context-rich, tagged with environmental and evolutionary metadata absent from public repositories. Partners in pharma, food, and industrial bioprocessing get access to this graph to find or generate protein sequences tailored to their needs, theoretically bypassing the costly, iterative lab work of directed evolution [EquityZen, Unknown].

Training a GPT-4 for genes

On top of this growing dataset, Basecamp has built EDEN (Environmentally-Derived Evolutionary Network), a suite of biological foundation models. The flagship EDEN-28B model is, by the company's description, a "GPT-4-scale biology model" trained on 9.7 trillion nucleotide tokens and 10 billion novel genes [basecamp-research.com, Unknown]. The most audacious technical claim is for BaseFold, a model that reportedly outperforms DeepMind's AlphaFold 2 in predicting large, complex protein structures and their interactions with small molecules [TechCrunch, Oct 2024]. If true, it would be a significant leap, though independent verification is still pending.

The founding team blends scientific ambition with operational pragmatism. Co-founders Oliver Vince and Will Pelton started the company in 2019 [Rory Cellan-Jones, Unknown]; Crunchbase also lists Glen Gowers as a co-founder [Crunchbase, Unknown]. Vince, the CEO, studied physics at Oxford and worked in finance before turning to biodiscovery. Pelton brings the technical and biological depth. Another name, the prominent computational biologist Oliver Stegle, was listed as a person with significant control at incorporation but ceased to hold that role in early 2020 [Companies House, Unknown]. The team has since grown to between 11 and 50 employees [LinkedIn, Unknown].

Founder | Role | Background | Note
Oliver Daniel Samuel Vince | Co-founder & CEO | Physics at Oxford, former finance | [Rory Cellan-Jones, Unknown]
William (Will) Shih-yen Pelton | Co-founder | Technical/bio background | [Rory Cellan-Jones, Unknown]
Glen Gowers | Co-founder | (not given) | Listed as co-founder on Crunchbase [Crunchbase, Unknown]

The partnership path to revenue

Basecamp's business model is partnership-driven, focusing on R&D collaborations rather than direct product sales. It has announced deals with industrial leaders like Johnson Matthey and, more recently, a biodiscovery program with the government of Malawi [Redalpine, Unknown] [GlobeNewswire, 2025]. The company generates revenue by providing partners with access to its knowledge graph and the AI-designed sequences it produces [Dealroom, Unknown]. This approach de-risks early commercialization but also means the company's financial traction is less visible than that of a product-sales outfit. The recent $60 million Series B, following a $20 million Series A in late 2022, provides a long runway to convert these research partnerships into larger, recurring commercial agreements [Crunchbase, Unknown] [UK Tech News, 2022].

Date | Round | Amount (M USD)
Dec 2022 | Series A | 20
Oct 2024 | Series B | 60

Where the model could misfold

The ambition is clear, but three challenges, most of them non-technical, could determine the company's ultimate scale.

  • The verification gap. The claim that BaseFold outperforms AlphaFold 2 is a major headline, but it remains a company-sourced benchmark [TechCrunch, Oct 2024]. Widespread adoption by the skeptical life sciences community will require independent, peer-reviewed validation of its predictive power on novel, commercially relevant targets.
  • The benefit-sharing burden. Basecamp emphasizes equitable frameworks with data source countries, which is ethically sound but adds operational complexity [PRNewswire, Unknown]. Negotiating access and benefit-sharing agreements across dozens of jurisdictions is a permanent, resource-intensive overhead that pure software AI companies don't face.
  • The commercialization clock. The partnership model buys time and credibility, but the clock is ticking on translating research collaborations into material, scaled revenue. The company must demonstrate that its AI-designed proteins consistently outperform those found through traditional methods in real-world industrial settings.

For a sense of the scale at play, consider the data pipeline. If Basecamp's database truly holds over a million new species, and the average bacterial genome contains roughly 4,000 genes, you're looking at a raw genetic library on the order of 4 billion novel sequences. That is less than half of the 10 billion "novel genes" cited for model training, so the two company figures do not reconcile under this simple assumption: the training set would need an average of about 10,000 genes per new species, which would imply larger genomes, far more organisms than the headline species count, or a different way of counting genes [Astrobiology.com, June 2025] [basecamp-research.com, Unknown]. Either way, assembling and curating a corpus of that size is itself a formidable piece of IP. The unit economics of biodiscovery, then, hinge on how many of those 10 billion genes can be matched to a partner's multi-million-dollar problem. The incumbent Basecamp must beat isn't another AI startup; it's the entire, entrenched, and expensive practice of directed evolution in a corporate lab. Its product is the attempt to make that lab work obsolete.
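The back-of-envelope arithmetic in that scale estimate can be checked in a few lines. This is only a sketch using the article's round numbers: the species count and the training-gene count are company claims, the 4,000 genes per genome is a rough bacterial average, and the function and variable names are invented here for illustration.

```python
# Back-of-envelope check of the species-to-gene arithmetic in the text.
# All inputs are the article's round numbers, not measured values.

NEW_SPECIES = 1_000_000                 # "over one million new species" (company claim)
GENES_PER_GENOME = 4_000                # rough bacterial average assumed in the text
CITED_TRAINING_GENES = 10_000_000_000   # "10 billion novel genes" (company claim)


def estimated_library_size(species: int, genes_per_genome: int) -> int:
    """Genes expected if every species contributes one genome of average size."""
    return species * genes_per_genome


def implied_genes_per_species(total_genes: int, species: int) -> float:
    """Average genes each species would need to contribute to reach total_genes."""
    return total_genes / species


library = estimated_library_size(NEW_SPECIES, GENES_PER_GENOME)
shortfall_factor = CITED_TRAINING_GENES / library
implied = implied_genes_per_species(CITED_TRAINING_GENES, NEW_SPECIES)

print(f"Estimated library: {library:,} genes")
print(f"Cited training set is {shortfall_factor:.1f}x the estimate")
print(f"Implied genes per species: {implied:,.0f}")
```

With these inputs the estimate comes to 4 billion genes, the cited training set is 2.5 times larger, and the implied average is 10,000 genes per species. Changing `GENES_PER_GENOME` or `NEW_SPECIES` shows how sensitive the comparison is to either assumption.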

Sources

  1. [Astrobiology.com, June 2025] Basecamp Research announces breakthrough discovery of over one million new species. | https://astrobiology.com/2025/06/basecamp-research-announces-breakthrough-discovery-of-over-one-million-new-species.html
  2. [basecamp-research.com, Unknown] Basecamp Research, Beyond Known Biology | https://basecamp-research.com/
  3. [CB Insights, Unknown] Basecamp Research - CB Insights | https://www.cbinsights.com/company/basecamp-research
  4. [Companies House, Unknown] Basecamp Research Ltd - Persons with Significant Control | https://find-and-update.company-information.service.gov.uk/company/12354133
  5. [Crunchbase, Unknown] Basecamp Research - Crunchbase Company Profile & Funding | https://www.crunchbase.com/organization/basecamp-research
  6. [Dealroom, Unknown] Basecamp Research - Dealroom.co | https://app.dealroom.co/companies/basecamp_research
  7. [EquityZen, Unknown] Basecamp Research - EquityZen | https://equityzen.com/company/basecampresearch/
  8. [GlobeNewswire, 2025] Accelerating biodiscovery program with a new partnership with Malawi | (URL not available)
  9. [LinkedIn, Unknown] Basecamp Research | LinkedIn | https://uk.linkedin.com/company/basecamp-research
  10. [PRNewswire, Unknown] Basecamp Research launches Trillion Gene Atlas to scale AI-designed therapeutics | https://www.prnewswire.com/news-releases/basecamp-research-launches-trillion-gene-atlas-to-scale-ai-designed-therapeutics-302716624.html
  11. [Redalpine, Unknown] Partnerships with industry leaders like Johnson Matthey | (URL not available)
  12. [Rory Cellan-Jones, Unknown] Basecamp Research: A mission to cure | https://rorycellanjones.substack.com/p/basecamp-research-a-mission-to-cure
  13. [TechCrunch, Oct 2024] Basecamp Research draws $60M to build a 'GPT for biology' | https://techcrunch.com/2024/10/09/basecamp-research-taps-60m-to-build-a-gpt-for-biology/
  14. [UK Tech News, 2022] Series A funding announcement | (URL not available)

Read on Startuply.vc