Paperzilla Is Building a Daily Research Digest for the Agentic Era

Mark Pors, the WatchMouse co-founder, is back with a command-line tool that filters arXiv, medRxiv, and bioRxiv into one personalized feed.

Most working scientists open their laptops to the same problem: too many papers, too little signal. arXiv alone now ingests thousands of preprints a week, and medRxiv and bioRxiv add to the pile. Paperzilla, a small outfit based in The Hague, thinks the answer is not a prettier search box. It is a daily digest, delivered through a command-line tool, that scans the major preprint servers and returns one ranked list per researcher.

The company describes its method as hybrid keyword and semantic matching with AI reranking, applied to sources including arXiv, medRxiv, and bioRxiv, with the goal of producing one personalized digest per day [paperzilla.ai/about]. The interface that has drawn the most attention so far is the Paperzilla CLI, called pz, which lets a user list projects, fetch research feeds, and filter papers by priority, date, or count, with output piped as JSON or exported as an Atom feed [clawskills.sh]. The terminal-first interface is a deliberate choice. Researchers who already live in a terminal can wire Paperzilla into their existing scripts, notebooks, and reading queues without learning a new web app.
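For readers unfamiliar with the term, hybrid matching generally means blending an exact-term score with a similarity score and ranking on the mix. The sketch below illustrates that general idea only; it is not Paperzilla's implementation, which the company has not published in detail. The function names, the 50/50 weighting, and the character-trigram similarity (a toy stand-in for a real embedding model) are all assumptions made for illustration, and the AI reranking step is left out.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split text into lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query, doc):
    """Fraction of distinct query terms that appear verbatim in the document."""
    q = set(tokens(query))
    d = set(tokens(doc))
    return len(q & d) / len(q) if q else 0.0


def trigram_vector(text):
    """Toy stand-in for an embedding: counts of character trigrams.

    A production system would use a learned embedding model here (assumption).
    """
    s = " ".join(tokens(text))
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_rank(query, titles, alpha=0.5, top_k=3):
    """Rank titles by alpha * keyword score + (1 - alpha) * similarity score.

    Returns the top_k (score, title) pairs, best first. A model-based
    reranker would normally reorder this shortlist; that step is omitted.
    """
    query_vec = trigram_vector(query)
    scored = []
    for title in titles:
        similarity = cosine(query_vec, trigram_vector(title))
        score = alpha * keyword_score(query, title) + (1 - alpha) * similarity
        scored.append((score, title))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical titles, invented for this example.
    sample = [
        "Galaxy cluster mass estimates",
        "Folding dynamics of small proteins",
        "Protein folding with diffusion models",
    ]
    for score, title in hybrid_rank("protein folding", sample):
        print(f"{score:.2f}  {title}")
```

The point of the blend is that the second title still surfaces even though it says "proteins" rather than "protein": the exact-term score misses the plural, and the similarity score picks it up.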

The bet

The wedge is the working scientist or research engineer who treats paper triage as infrastructure, not as a browsing activity. Paperzilla's pitch is that aggressive filtering plus a single daily digest beats the inbox-style alerts that arXiv and Google Scholar have offered for years. The CLI extends that pitch to the agentic era: a feed exported as JSON or Atom is a feed an LLM agent can consume, summarize, or act on. Founder Mark Pors describes his current focus as building research context for the agentic era [RocketReach]. In plain English, that means making the world's preprint flow legible to both humans and the AI assistants those humans increasingly delegate to.
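To make the "machine-readable feed" point concrete, here is a minimal sketch of how a script or agent might turn an Atom feed into structured records using only Python's standard library. The sample feed is invented for illustration and is not real Paperzilla output; which fields the actual export carries is an assumption. Only the Atom namespace and element names come from the Atom standard itself.

```python
import json
import xml.etree.ElementTree as ET

# Namespace defined by the Atom standard (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content, invented for this example.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example digest</title>
  <entry>
    <title>Example paper A</title>
    <link href="https://example.org/a"/>
    <updated>2025-01-01T00:00:00Z</updated>
    <summary>Placeholder abstract A.</summary>
  </entry>
  <entry>
    <title>Example paper B</title>
    <link href="https://example.org/b"/>
    <updated>2025-01-02T00:00:00Z</updated>
    <summary>Placeholder abstract B.</summary>
  </entry>
</feed>"""


def parse_entries(atom_xml):
    """Return the feed's entries as a list of dicts, newest first."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "url": link.get("href", "") if link is not None else "",
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
            "summary": entry.findtext("atom:summary", default="", namespaces=ATOM_NS),
        })
    # Timestamps in one consistent ISO 8601 format sort correctly as strings.
    entries.sort(key=lambda item: item["updated"], reverse=True)
    return entries


if __name__ == "__main__":
    # JSON is a convenient shape to hand to a downstream summarizer or agent.
    print(json.dumps(parse_entries(SAMPLE_FEED), indent=2))
```

Nothing in the sketch is specific to one vendor, which is the appeal of the format: any tool that emits standard Atom can be read the same way.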

The product surface today is small and concrete. A command-line tool browses a curated research feed, manages projects, and tracks new papers [GitHub]. The web property at paperzilla.ai hosts the digest engine and documentation [Paperzilla]. The company has also published a retrieval dataset, paperzilla-rag-retrieval-250, on Hugging Face, which suggests the team is building and evaluating its own retrieval stack rather than relying entirely on a third-party search index [Hugging Face].

Why it could matter

The market for research discovery has been dominated for two decades by general-purpose tools (Google Scholar, Semantic Scholar) and by publisher-side alerting. The recent wave of LLM-native readers and agents has reopened the category. If a working researcher is going to hand part of their literature review to a model, the upstream question is which papers the model even sees. A high-recall, high-precision filter that runs daily and exports machine-readable output is a reasonable place to plant a flag. The total addressable user base, the global population of active researchers and R&D engineers, is in the millions, and the willingness to pay for time saved on triage is well established by incumbents like ReadCube and Paperpile.

There is also a structural tailwind. Preprint volume on arXiv, medRxiv, and bioRxiv has grown sharply over the last five years, and the share of AI and life-sciences work that breaks first as a preprint, rather than in a journal, keeps rising. Tools that sit on top of preprint servers and do aggressive ranking become more valuable as the underlying flow grows.

The team

Paperzilla is led by Mark Pors, founder and CEO [RocketReach]. Pors was previously co-founder and CTO of WatchMouse, a website monitoring company acquired by CA Technologies in 2011 [TechCrunch, 2010] [Stone Soup Coworking Space]. He is based in The Hague and was educated at Delft University of Technology [LinkedIn]. His public engineering footprint is in Python, React, and React Native, with recent work focused on machine learning with LLMs applied to Paperzilla [GitHub]. A prior exit in infrastructure software is a useful credential for a founder now selling a tool to technical buyers who care about uptime and feed reliability.

What the bears say, what the bulls answer

The most credible pushback concerns competition and distribution. A solo-founder company shipping a CLI faces a distribution problem that polished, venture-backed web apps do not. The bull answer, supported by the cited evidence, is that the CLI is a feature, not a bug: it targets a specific, sticky user (the terminal-native researcher or research engineer) and produces output that plugs directly into the agentic workflows those users are already building. Owning the daily digest for that user, and shipping a retrieval dataset and Atom feed that other tools can consume, is a defensible wedge even in a crowded category.

What to watch

Three things over the next twelve months will tell the story. First, whether Paperzilla converts CLI users into paying subscribers, and at what price point; the docs portal and product surface suggest a paid tier is the natural next step [Paperzilla]. Second, whether the retrieval dataset on Hugging Face evolves into a public benchmark or stays an internal evaluation set [Hugging Face]; an open benchmark would be a credibility move in a field where retrieval quality is hard to compare. Third, whether Pors raises outside capital. A WatchMouse alumnus building in AI tooling out of the Netherlands is the kind of profile European seed funds actively court, and a priced round would change the resourcing picture quickly.

The interesting question for readers: in a category where the incumbents own the search box and the new entrants own the chat interface, can a daily digest delivered through a terminal become the default upstream layer for how researchers, and their agents, decide what to read next?

Cash Quintero covers fintech, payments, and emerging-market capital flows for Startuply.
