Paperzilla's Command-Line Tool and Benchmark Anchor a Bet on the Research Data Layer

The solo-founded Dutch startup aims to filter the torrent of academic preprints into a structured feed for diligence and discovery.

The average researcher faces a firehose of over 8,000 new preprints a month. Paperzilla, a Den Haag-based startup, is betting that number is not a problem but a market. Its proposition is a data layer for research, turning scattered papers, datasets, and alerts into structured, AI-analyzed intelligence [beta.paperzilla.ai, retrieved 2026].

The wedge: from preprint torrent to structured feed

Paperzilla's initial product surfaces are a command-line tool for browsing curated feeds and a web interface for uploading papers to receive instant AI analysis [GitHub, retrieved 2026]. The analysis includes summaries, quality ratings, and identified weaknesses. The platform monitors major preprint servers such as arXiv, bioRxiv, and medRxiv, aiming to filter the growing volume of papers into usable intelligence [AgentCommunity.org]. Founder Mark Pors has framed the tool as valuable for private equity work, suggesting an early wedge into investment diligence, where technical research is a critical yet overwhelming input [X, retrieved 2026].

The technical foundation

The company's public work points to a focus on the retrieval infrastructure underpinning AI analysis rather than on a simple chat interface. On Hugging Face, Paperzilla has released a multi-annotator dataset for scientific paper retrieval, dubbed the 'Paperzilla RAG Retrieval Benchmark' [Hugging Face, 2024]. The company lists the technologies it uses as RAG, semantic search, Structured RAG, and evals [Hugging Face]. This suggests a bet that differentiation will come from superior data structuring and retrieval accuracy, not just the language model on top.

Competitor | Primary Focus | Key Differentiation
Elicit / Consensus | AI-powered literature search & summarization | Broad question-answering across published literature
Scite | Smart citations & reliability checking | Focus on citation context and paper credibility
Semantic Scholar | Academic search engine (AI2) | Large-scale scholarly graph and free public access
Paperzilla | Continuous feeds & structured analysis | Personalized feeds, command-line tool, retrieval benchmark

The solo founder and the early-stage reality

The company is a solo venture by Mark Pors, who was previously CTO and co-founder of WatchMouse, a website monitoring service active in the 2000s [TechCrunch, 2010]. Paperzilla was incorporated in Delaware in February 2026 and is estimated to have one employee [Bizapedia, retrieved 2026] [RocketReach]. There are no publicly verifiable funding rounds or named investors. The competitive set is crowded with well-funded entities like Semantic Scholar (backed by AI2) and popular tools like Elicit. Paperzilla's path will require moving beyond a technical demo to defined customer segments and a clear pricing model, neither of which is public yet.

The immediate risks for a pre-seed, solo-founded startup in this space are straightforward.

  • Commercial traction. The product's public positioning is generic ('research discovery platform'), with no named customers, case studies, or visible pricing [AgentCommunity.org].
  • Funding runway. With no disclosed rounds, the operational horizon is unclear. Scaling the data ingestion and model refinement required for a robust 'research data layer' is capital-intensive.
  • Product definition. The tool could serve academics, biotech analysts, or venture capitalists. Without a focused beachhead, it risks building a powerful tool for an audience that doesn't pay.

For now, the company's tangible assets are its benchmark dataset and its functional CLI tool: a foundation, but not yet a business. The question for any watching investor is whether Pors can convert his prior experience in managing complex systems into a sales motion that lands the first enterprise contract [LinkedIn]. Can a command-line tool and a retrieval benchmark in Den Haag become the pipe that feeds the next generation of research-driven deals?

Sources

  1. [beta.paperzilla.ai, retrieved 2026] Paperzilla homepage | https://beta.paperzilla.ai/
  2. [GitHub, retrieved 2026] Paperzilla CLI tool repository | https://github.com/paperzilla-ai
  3. [AgentCommunity.org] Paperzilla profile | https://agentcommunity.org/m/paperzilla
  4. [X, retrieved 2026] Mark Pors post on private equity use case | https://x.com/pors/status/2021490714828591165
  5. [Hugging Face, 2024] Paperzilla RAG Retrieval Benchmark dataset | https://huggingface.co/datasets/paperzilla/paperzilla-rag-retrieval-250
  6. [Hugging Face] Paperzilla profile listing technologies | https://huggingface.co/paperzilla
  7. [TechCrunch, 2010] WatchMouse launches GeoBrand | https://techcrunch.com/2010/05/04/watchmouse-launches-geobrand-ppc-brand-abuse-monitoring-tool/
  8. [Bizapedia, retrieved 2026] PAPERZILLA INC. incorporation details | https://www.bizapedia.com/de/paperzilla-inc.html
  9. [RocketReach] Paperzilla company information | https://rocketreach.co/paperzilla-profile_b6a8bcd1c86737b4
  10. [LinkedIn] Mark Pors LinkedIn profile | https://www.linkedin.com/in/markpors/

Read on Startuply.vc