In a fab queue at TSMC, a 4nm wafer is being patterned for a chip that does exactly one thing. It cannot run a convolutional vision model from 2017. It will not accelerate a recommender system. It runs transformers, the architecture behind GPT, Claude, and most of what people now call AI, and according to its makers it runs them about 20 times faster than an Nvidia H100 [TechFundingNews]. The chip is called Sohu, and the company behind it, Etched.ai, has convinced investors to put roughly $620 million behind the proposition that the entire AI industry has converged on a single mathematical shape worth burning into silicon [Yahoo Finance].
Etched was founded in 2022 by Chris Zhu, Gavin Uberti, and Robert Wachen, three Harvard dropouts who decided the GPU was the wrong abstraction for the workload that now dominates global compute spending [TechFundingNews]. Their wedge is narrow on purpose. A general-purpose GPU spends transistors on flexibility: control logic, scheduling, support for dozens of operator types. Sohu spends those transistors on transformer math instead, paired with 144 GB of HBM3 per chip [source 16] and built on TSMC's 4nm process [source 8]. If the workload of the next decade really is attention layers stacked on attention layers, that trade pays off. If it is not, Sohu is an expensive paperweight. Uberti and his co-founders have been refreshingly direct that this is the bet.
The market shape is what makes the bet defensible. Once a model is deployed, the recurring spend is inference, not training, and inference is where specialization pays. A hyperscaler running a single frontier model across billions of queries cares about tokens per second per dollar and tokens per second per watt, full stop. Nvidia's H100 and B200 are extraordinary general-purpose machines, which is precisely why they cannot be the cheapest way to run one specific architecture at scale. That gap is the opening Etched is walking through, and it is the same logic that led Google to build the TPU and Amazon to build Inferentia and Trainium as in-house silicon. Etched's pitch is that the merchant market deserves the same option.
Investors have agreed in size. Stripes led a $500 million round at a reported $5 billion valuation [Yahoo Finance][TechFundingNews], following an earlier $120 million Series A in June 2024 [Crunchbase News]. The company has disclosed a partnership with TSMC [source 8] and has reportedly secured early customers who have put deposits down on hardware [source 8]. Rambus has publicly described its collaboration with Etched on memory interfaces [Rambus]. None of this guarantees a working product at volume, but the supply chain footprint is real.
Funding round | Amount ($M)
Series A (June 2024) | 120
Later round | 120
Stripes-led round | 500
The team has filled in around the founders in ways that matter for a company that now has to ship physical silicon. Mark Ross, the former CTO of Cypress Semiconductor, brings the kind of tenure that ASIC tape-outs require [source 7]. Uberti, the CEO, previously worked at OctoML and Xnor.ai, both compiler-and-inference shops that are directly relevant to the software stack Sohu will need [source 7]. Wachen serves as COO [source 15]. On the commercial side, Chase Holmes joined from Databricks and MosaicML [source 15], and Alexandra Boyle, previously CCO at Symphony, leads financial services [source 15][source 4]. That is a recognizable shape for a deeptech company preparing to sell into hyperscalers and large enterprises rather than hobbyists.
What the bears say is straightforward and worth taking seriously. Nvidia is not a static target: Blackwell is shipping, the CUDA moat compounds with every model trained against it, and the company has the cash to cut prices on inference SKUs the moment a credible challenger lands a marquee customer. Custom silicon from Google, Amazon, and Meta also crowds the same thesis from above. And transformer dominance, while real today, is a research bet as much as an engineering one: state-space models and diffusion architectures chip away at the assumption that attention is forever, and mixture-of-experts variants keep changing what a transformer looks like in production.
What the bulls answer is that Etched does not need to beat Nvidia across the board. It needs to be meaningfully cheaper per token on the handful of model families that will run at planetary scale, and the disclosed performance claim of roughly 20x an H100 on transformer inference [TechFundingNews], if it survives contact with customer benchmarks, is a wide enough margin to absorb a lot of Nvidia price cuts.
A back-of-the-envelope pass on the unit economics is useful here. An H100 draws about 700 watts at load and lists around $30,000. Suppose Sohu lands at a similar power envelope and a similar per-chip price, and delivers even five times the transformer-inference throughput rather than the headline twenty. A hyperscaler running a 100,000-accelerator inference fleet draws about 70 megawatts at the chips alone, which at $0.08 per kWh is roughly $49 million a year in electricity before cooling and host overhead. At five times the throughput per chip, it would need only 20,000 Sohus to do the same work. That is roughly $39 million a year in power saved and, more to the point, about $2.4 billion in capex avoided over a refresh cycle: a $600 million chip purchase in place of a $3 billion one. Numbers in this neighborhood are why a $5 billion valuation for a pre-revenue ASIC company is not as exotic as it sounds. The same math is why Nvidia will defend the segment hard.
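The arithmetic can be rerun from the paragraph's own inputs. The sketch below is illustrative only: the function name `fleet_economics` and its parameter names are hypothetical, and it assumes every accelerator runs at full load all year and counts only the chips' own draw (no cooling, host servers, or networking), so its power figures are a floor rather than a full data-center bill.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours; assumes continuous full load


def fleet_economics(
    baseline_chips=100_000,   # size of the existing H100-class fleet
    watts_per_chip=700.0,     # assumed identical for both chips
    price_per_chip=30_000.0,  # assumed identical for both chips
    usd_per_kwh=0.08,
    speedup=5.0,              # throughput multiple of the new chip
):
    """Compare a baseline fleet with a smaller fleet of faster chips."""
    # Chips needed to match the baseline fleet's total throughput.
    new_chips = math.ceil(baseline_chips / speedup)

    def annual_power_usd(chips):
        # Chip-level electricity only: kW drawn, times hours, times tariff.
        kwh = chips * watts_per_chip / 1000.0 * HOURS_PER_YEAR
        return kwh * usd_per_kwh

    baseline_power = annual_power_usd(baseline_chips)
    new_power = annual_power_usd(new_chips)

    return {
        "new_chips": new_chips,
        "baseline_power_usd": baseline_power,
        "power_saved_usd": baseline_power - new_power,
        "new_capex_usd": new_chips * price_per_chip,
        "capex_avoided_usd": (baseline_chips - new_chips) * price_per_chip,
    }


if __name__ == "__main__":
    for name, value in fleet_economics().items():
        print(f"{name}: {value:,.0f}")
```

With the defaults, the chip-level power bill for the 100,000-accelerator fleet comes to roughly $49 million a year, of which about $39 million is saved at a fivefold speedup, against $2.4 billion of avoided capex. Passing `speedup=20.0` applies the headline claim instead.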
What to watch over the next twelve months is concrete. First, independent third-party benchmarks of Sohu against Blackwell on a published model, ideally Llama or a comparable open-weights transformer. Second, a named launch customer beyond the reserved-deposit cohort. Third, the software story: a usable compiler and serving stack is what turns a fast chip into a product, and it is where most ASIC startups have historically stumbled. Fourth, the next funding round, which at current burn for a company doing 4nm tape-outs is probably a question of when rather than whether.
The company Etched has to beat is Nvidia, specifically the H100 and B200 inference deployments inside the five or six buyers that account for most frontier-model serving. Nvidia will not lose that business on charisma. It will lose it, if at all, on a spreadsheet, one where Sohu's joules per token are small enough that even a CUDA-loyal infrastructure lead has to take the meeting.