Nous Research Wants Every Idle GPU on the Internet Training the Next Open Model

With $70M from Paradigm and OSS Capital, the New York lab is betting decentralized training can keep open-source AI competitive.

The first time you watch a Nous Research demo, the detail that lingers is not the model output. It is the topology. A 15-billion-parameter language model, training across machines scattered around the public internet, completing 11,000 steps without falling apart [aibase.com]. For anyone who has spent time inside an AI lab, where training runs are normally cloistered inside a single hyperscaler's fiber, the picture is a small act of heresy.

That heresy is the company. Nous Research, founded in New York in 2023 by Jeffrey Quesnelle and Karan Malhotra, is trying to prove that frontier-grade language models can be trained without a frontier-grade data center [Startup Intros]. In April 2025, Paradigm led a $50 million Series A at a reported $1 billion token valuation, the kind of number that telegraphs how seriously crypto-native investors are taking the idea that AI's compute bottleneck is also a coordination problem [Fortune, April 2025]. Total disclosed funding now sits near $70 million, including a $20 million seed led by OSS Capital in June 2024 and an earlier $5.2 million round led by Distributed Global in January 2024 [Crunchbase]; [CypherHunter, Jan 2024].

The bet

Nous sells, in the conventional sense, very little. It ships open-source models, open training infrastructure, and an emerging agent product, and treats the community of developers and researchers around them as the distribution channel. The company describes itself as a leader in the American open-source AI movement, training open-source language models with a human-centric design philosophy [Nous Research]. The technical wedge is a project called DisTrO, short for Distributed Training Over-the-Internet, which compresses the bandwidth required to coordinate gradient updates across geographically separated machines [Andreessen Horowitz]. On top of DisTrO sits Psyche, an open infrastructure layer that lets underutilized hardware contribute to a single training run, with code released on GitHub as of January 2025 [Nous Research]; [X/Twitter, Jan 2025].

If the names sound like a graduate seminar, the practical claim is simpler: a model the size of Meta's mid-tier Llama variants can, in principle, be trained by a swarm rather than a supercomputer. The December 2024 Psyche test run, in which a 15B-parameter model completed 11,000 verified training steps across a global network of machines, is the company's existence proof [aibase.com].

Why it could be big

The tailwinds are unusually legible. Nvidia H100s remain rationed, hyperscaler capex is concentrated among five buyers, and the open-source model tier, from Llama to Mistral to DeepSeek, has narrowed the quality gap with closed frontier labs faster than most forecasters expected a year ago. Paradigm's thesis, articulated in its Fortune interview around the round, is that decentralized training is the natural complement to that shift: if the weights are open, the compute that produced them should be open too [Fortune, April 2025]. OSS Capital, North Island Ventures, Distributed Global, and Delphi Digital round out a cap table that is unmistakably crypto-literate, which matters because Psyche's incentive design borrows from that world's playbook for coordinating strangers around shared infrastructure.

The upside, if execution holds, is a credible third pole in AI infrastructure. Not OpenAI's API, not Meta's release cadence, but a public-internet training fabric that any sufficiently motivated group, a university consortium, a sovereign, a developer collective, can use to produce a model whose provenance they actually control.

The team and traction

Quesnelle is co-founder and CTO, with a research track record that includes the YaRN paper on efficient context-window extension for large language models, co-authored with Bowen Peng [jeffq.com]; [LinkedIn]. Peng is Nous's Chief Scientist and a core contributor to DisTrO [YouTube]; [Andreessen Horowitz]. Malhotra serves as Director [LinkedIn]. The lab originated as a volunteer project, and several of its earliest contributors moved into full-time roles after the seed rounds closed [LinkedIn], a pattern that mirrors how Hugging Face and Stability built their initial benches.

The product surface has widened in 2025. Beyond Psyche, Nous released the Hermes Agent, an open-source agent framework that learns user projects, builds reusable skills, and is compatible with multiple model providers [GitHub]; [hermes-agent.nousresearch.com]. The Hermes line of fine-tuned models has been one of the more downloaded open-weights families on Hugging Face over the past 18 months, a reputational asset that gives the decentralized-training pitch a built-in audience.

Funding round | Amount ($M)
Series A (Apr 2025) | 50
Seed (Jun 2024) | 20
Seed (Jan 2024) | 5.2

The honest counterfactual

What bears say: decentralized training has been proposed for years, and the wall-clock cost of coordinating gradients across the public internet has historically been worse than just renting an H100 cluster for a week. A $1 billion valuation on roughly $70 million raised, with a token structure attached, sets a high bar for the company to prove that Psyche scales past 15B parameters into the 70B-and-up range where open-source competition actually lives [Fortune, April 2025].

What bulls answer: the December 2024 Psyche run is the first public evidence that the bandwidth problem is tractable at meaningful scale, and the underlying DisTrO research has been documented in collaboration with a16z's open-source team, giving outside researchers a way to inspect the claims [Andreessen Horowitz]; [aibase.com]. The economics still need to be demonstrated end to end, but the technical premise is no longer purely theoretical.
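The bears' wall-clock objection is easy to put rough numbers on. The sketch below is a back-of-envelope calculation, not a description of how DisTrO or Psyche work: it assumes a naive scheme in which each machine ships one full gradient per step at 2 bytes per parameter, and the three link speeds are illustrative guesses rather than measurements. The function names are hypothetical.

```python
# Back-of-envelope: time to move one full, uncompressed gradient over a link.
# Assumptions (not from Nous): 2 bytes per parameter, one full gradient
# exchanged per step, and the illustrative link speeds listed below.

def gradient_bytes(params: float, bytes_per_param: int = 2) -> float:
    """Size of one full gradient in bytes."""
    return params * bytes_per_param


def transfer_seconds(size_bytes: float, link_gbps: float) -> float:
    """Seconds to send size_bytes over a link of link_gbps gigabits per second."""
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


PARAMS_15B = 15e9  # the 15B-parameter scale cited in the article
size = gradient_bytes(PARAMS_15B)

# Hypothetical link speeds chosen only to show the spread.
links = [
    ("home broadband, 0.1 Gbps", 0.1),
    ("good fiber, 1 Gbps", 1.0),
    ("data-center fabric, 400 Gbps", 400.0),
]

print(f"gradient size: {size / 1e9:.0f} GB")
for label, gbps in links:
    print(f"{label}: {transfer_seconds(size, gbps):,.1f} s per exchange")
```

Under those assumptions, a single uncompressed exchange for a 15B-parameter model is 30 GB. That takes tens of minutes on home broadband, a few minutes on a 1 Gbps line, and under a second inside a data center, which is the gap that bandwidth-compression research has to close.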

What to watch

The next twelve months will be defined by scale. A Psyche training run at 70B parameters or above, completed over the public internet with verifiable provenance, would be the milestone that converts the thesis into a category. Watch for a flagship Hermes model trained entirely on Psyche infrastructure, for the first non-Nous research group to use the stack in production, and for whatever token mechanics Paradigm's structured round eventually surfaces. If any two of those land, the conversation about where open-source models come from changes.

The cultural question Nous is implicitly answering: in an era when the frontier of intelligence is being built inside five companies' data centers, who is allowed to train a model, and who gets to call it theirs?

Read on Startuply.vc