Paperzilla

AI-driven research discovery and analysis tool for academic and preprint literature.

Cover Block

PUBLIC

Attribute	Detail
Name	Paperzilla
Tagline	AI-driven research discovery and analysis tool for academic and preprint literature.
Headquarters	Den Haag, Netherlands
Founded	2026
Stage	Pre-Seed
Business Model	SaaS
Industry	Deeptech
Technology	AI / Machine Learning
Geography	Western Europe
Growth Profile	Venture Scale
Founding Team	Solo Founder (Mark Pors)
Funding Label	Pre-Seed

Executive Summary

PUBLIC Paperzilla is an early-stage Dutch startup applying AI to filter and analyze the high-volume stream of academic and preprint literature, a wedge into the research diligence workflows of investors and analysts. Founded in 2026 by Mark Pors, the company aims to turn scattered papers and alerts into continuously updated, AI-summarized feeds, addressing a clear information overload problem in technical fields [beta.paperzilla.ai, retrieved 2026]. The founder's background as CTO and co-founder of WatchMouse, a software monitoring company active in the 2000s, provides a track record of building and managing complex systems, though his direct experience in the academic research or AI tooling market is not publicly detailed [TechCrunch, 2010]. The core product, still in beta, combines a command-line tool for feed management with a web platform offering instant analysis, quality ratings, and weakness identification for uploaded papers, differentiating itself through a focus on structured retrieval and its own publicly released benchmark dataset for scientific paper retrieval [GitHub, retrieved 2026] [Hugging Face, 2024]. No public funding rounds, named investors, or customer deployments are yet confirmed, positioning the company firmly in pre-seed development [Crunchbase]. The primary questions for the next 12-18 months center on validating a monetizable customer segment beyond individual researchers, securing initial capital to build out a commercial team, and demonstrating that its AI-driven analysis can reliably surface insights valuable enough to command a SaaS subscription.

Data Accuracy: YELLOW -- Core product claims and founder identity are confirmed; funding and traction are unverified.

Taxonomy Snapshot

Axis	Classification
Stage	Pre-Seed
Business Model	SaaS
Industry / Vertical	Deeptech
Technology Type	AI / Machine Learning
Geography	Western Europe
Growth Profile	Venture Scale
Founding Team	Solo Founder
Funding	Pre-Seed

Company Overview

PUBLIC

Paperzilla is a Delaware C-Corp formed in February 2026, operating from Den Haag, Netherlands, as a solo-founder venture [Bizapedia, 2026] [Crunchbase]. Its public timeline begins with the release of its core dataset, the Paperzilla RAG Retrieval Benchmark, on Hugging Face in 2024, before the company's formal incorporation in February 2026 [Hugging Face, 2024]. This suggests a product-first, open-source-adjacent development path typical of technical founders. The company's beta platform and command-line tool became publicly accessible in 2026, marking its transition from a research project to a commercial entity [beta.paperzilla.ai, 2026] [GitHub, 2026].

Founder Mark Pors brings a prior entrepreneurial track record from his role as CTO and co-founder of WatchMouse, a web monitoring service active in the 2000s that secured coverage in TechCrunch [TechCrunch, 2010]. His background in building and managing complex software systems is cited as relevant experience for the current venture [LinkedIn]. The company's early public positioning frames its tool as valuable for private equity diligence, indicating an initial target user beyond pure academia [X, 2026].

Current public data estimates the team size at one employee, consistent with a pre-seed, founder-led operation [RocketReach]. There are no announced funding rounds, named investors, or hiring initiatives in the public record.

Data Accuracy: YELLOW -- Company formation and founder background are confirmed; team size and funding status are based on a single source or absence of contradictory evidence.

Product and Technology

MIXED

The core product is a research discovery platform that aims to filter high-volume preprint streams into a curated, analytical feed. According to the company's beta site, Paperzilla provides "relevant, continuously updated research paper feeds" and offers "instant AI analysis including summaries, quality ratings, and identified weaknesses" for uploaded papers [beta.paperzilla.ai, retrieved 2026]. The platform monitors sources including arXiv, bioRxiv, medRxiv, and ChemRxiv, turning scattered papers, datasets, and alerts into a unified intelligence layer [beta.paperzilla.ai, retrieved 2026].

Technologically, the system is built around retrieval-augmented generation (RAG) and semantic search. The company's public work includes a multi-annotator dataset for scientific paper retrieval, the 'Paperzilla RAG Retrieval Benchmark,' hosted on Hugging Face [Hugging Face, 2024]. A command-line interface is also available for browsing curated feeds and managing projects [GitHub, retrieved 2026]. The product is listed in the LobeHub skills marketplace as an AI scientific agent skill, suggesting an early focus on developer and power-user adoption [LobeHub, undated].
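For readers less familiar with semantic search, the ranking step it relies on can be illustrated with a minimal sketch. Nothing here reflects Paperzilla's actual implementation, which is not publicly documented: the paper identifiers, the three-dimensional vectors, and the function names are all hypothetical, and a production system would obtain its vectors from a learned text encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_papers(query_vec, papers, top_k=2):
    """Return the top_k paper ids, most similar to the query first."""
    scored = sorted(
        papers.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [paper_id for paper_id, _ in scored[:top_k]]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
PAPERS = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.1, 0.8, 0.1],
    "paper_c": [0.7, 0.3, 0.0],
}
QUERY = [1.0, 0.0, 0.0]

top = rank_papers(QUERY, PAPERS)
print(top)  # ['paper_a', 'paper_c']
```

In a RAG pipeline, the top-ranked papers would then be passed to a language model as context for summarization or question answering.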

Data Accuracy: YELLOW -- Core product claims are confirmed by the company's own website and technical profiles, but customer deployments and detailed technical architecture are not publicly verified.

Market Research

PUBLIC The market for tools that can filter and analyze the growing torrent of academic research is being shaped by a fundamental shift in how knowledge is produced and consumed. The primary driver is the exponential growth of scientific literature, particularly preprints, which accelerates the need for automated triage and synthesis. Paperzilla positions itself within this emerging category of AI-driven research discovery, a niche that sits at the intersection of academic publishing, enterprise knowledge management, and professional services.

Quantifying the total addressable market for such a specific tool is challenging, as it is a new category without established market sizing reports. Analysts can look to analogous markets for a sense of scale. The global market for academic publishing and information services, which includes traditional journal subscriptions and databases like Elsevier's Scopus or Clarivate's Web of Science, is a multi-billion dollar industry. More directly, the market for AI in the life sciences, which includes drug discovery and literature analysis platforms, was valued at over $1.3 billion in 2023 and is projected to grow at a compound annual rate above 28% through 2030 [Grand View Research, 2024]. While not a perfect match, this figure underscores the significant spending on AI to extract value from dense scientific literature.
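To make the scale of that adjacent market concrete, the cited figures can be compounded forward. The sketch below is a rough illustration only: it treats $1.3 billion and 28% as exact values (the source gives them as lower bounds) and assumes seven compounding years from 2023 to 2030, so the result is an approximate floor, not a forecast.

```python
def project_market(base_value, annual_rate, years):
    """Compound a base value forward at a constant annual growth rate."""
    return base_value * (1 + annual_rate) ** years


BASE_2023_USD_BN = 1.3   # "over $1.3 billion in 2023", treated as exact here
CAGR = 0.28              # "above 28%", treated as exact here
YEARS = 2030 - 2023      # assumption: seven compounding periods

projected = project_market(BASE_2023_USD_BN, CAGR, YEARS)
print(f"Implied 2030 market size: ${projected:.1f}B")  # Implied 2030 market size: $7.3B
```

Under these assumptions, the adjacent AI-in-life-sciences market would exceed roughly $7 billion by 2030, of which literature analysis is only one component.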

Key demand tailwinds are well-documented. The volume of preprints on servers like arXiv and bioRxiv has grown dramatically, creating an information overload problem for researchers, analysts, and investors [Nature, 2022]. Concurrently, the maturation of large language models and retrieval-augmented generation (RAG) techniques has made automated summarization and question-answering over large document sets a practical reality. This technological convergence enables new workflows. A third driver is the increasing need for cross-disciplinary research and competitive intelligence, where professionals outside a specific field must quickly assess technical literature for diligence, as hinted at in Paperzilla's positioning for private equity activity [LinkedIn, 2026].

The competitive landscape includes not only dedicated research tools but also adjacent and substitute markets. These include general-purpose AI search and writing assistants like Perplexity, which can query academic sources, and enterprise knowledge management platforms that integrate external research feeds. The regulatory environment is generally favorable, though it is subject to the same evolving discussions around AI ethics, data provenance, and copyright that affect all generative AI applications. A specific macro force is the sustained global investment in research and development, particularly in biotech and climate tech, which expands the pool of new literature and the number of professionals who need to engage with it.

Data Accuracy: YELLOW -- Market sizing is inferred from analogous, adjacent sectors. Demand drivers are corroborated by third-party industry analysis.

Competitive Landscape

MIXED

Paperzilla enters a market defined by established academic search engines and a newer wave of AI-native analysis tools, positioning itself as a platform for continuous discovery and automated critique rather than static retrieval.

Company	Positioning	Stage / Funding	Notable Differentiator	Source
Paperzilla	AI-driven discovery & analysis for preprints; continuous feeds with quality ratings and weakness identification.	Pre-Seed. No public funding rounds confirmed.	Focus on automated, critical analysis ("identified weaknesses") and a curated, always-updated feed. Integrated CLI tool.	[beta.paperzilla.ai, 2026]; [GitHub, 2026]
Elicit	AI research assistant using language models to find and summarize papers, answer questions.	Seed & Series A backed by investors including O'Shaughnessy Ventures and FundersClub.	Strong brand recognition in the AI-for-science community; focus on using language models for direct Q&A over papers.	[Elicit website]; [Crunchbase]
Consensus	AI-powered search engine for scientific research, extracting findings and assessing consensus.	Seed funding from investors including Flybridge Capital Partners.	Specializes in meta-analysis, quantifying consensus across studies on a given question.	[Consensus website]; [Crunchbase]
Scite	Smart citations tool that shows how publications have been cited (supported, contrasted, mentioned).	Venture-backed (Series A).	Unique dataset of citation statements; focuses on verifying and contextualizing citation context, not discovery.	[Scite website]; [Crunchbase]
Semantic Scholar	Free, AI-powered academic search engine from the Allen Institute for AI (AI2).	Non-profit project with institutional backing from AI2.	Massive scale, broad corpus, and deep integration of AI features like TLDRs and influential citations.	[Semantic Scholar website]

Competition in AI-aided research tools splits into two primary layers: discovery and analysis. The discovery layer is anchored by large-scale, general-purpose search engines like Semantic Scholar and Google Scholar, which offer breadth but limited triage. The analysis layer features more specialized tools like Elicit and Consensus, which apply LLMs to summarize and extract claims from a user-provided set of papers. Paperzilla's stated aim to provide "continuously updated research paper feeds" suggests an attempt to own the upstream curation and monitoring process, a workflow that sits between these two layers. Adjacent substitutes include enterprise search platforms with academic modules (like Perplexity's Pro search) and authoring tools like Paperpal, which integrate literature review features but are not built for ongoing surveillance of preprint servers.

Where Paperzilla shows a potential early edge is in its specific technical focus on structured retrieval and evaluation. The company has released a multi-annotator dataset for scientific paper retrieval, the 'Paperzilla RAG Retrieval Benchmark' on Hugging Face [Hugging Face, 2024]. This indicates an investment in building and validating the retrieval core of its product, a component that directly impacts the relevance of its feeds. This edge is currently perishable, however, as it resides in a public dataset and methodology that competitors could replicate or improve upon. A more durable advantage would require the accumulation of proprietary user interaction data that continuously trains its ranking and analysis models, a flywheel that has not yet been publicly demonstrated.
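A multi-annotator retrieval benchmark is typically used by collapsing the annotators' judgments into one relevance label per paper and then scoring a ranked result list against those labels. The sketch below illustrates that general technique with majority voting and recall@k. The data layout, paper ids, and vote values are hypothetical and are not taken from the Paperzilla dataset, whose schema is not described in the sources cited here.

```python
from collections import Counter


def majority_label(votes):
    """Collapse binary relevance votes into one label; ties count as not relevant."""
    counts = Counter(votes)
    return 1 if counts[1] > counts[0] else 0


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant papers that appear in the top k of the ranking."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for paper_id in ranked_ids[:k] if paper_id in relevant_ids)
    return hits / len(relevant_ids)


# Hypothetical votes from three annotators per paper (1 = relevant, 0 = not).
ANNOTATIONS = {
    "p1": [1, 1, 0],
    "p2": [0, 0, 1],
    "p3": [1, 1, 1],
    "p4": [0, 0, 0],
}
# Hypothetical output of a retrieval system, best match first.
RANKING = ["p3", "p2", "p1", "p4"]

relevant = {pid for pid, votes in ANNOTATIONS.items() if majority_label(votes) == 1}
score = recall_at_k(RANKING, relevant, k=2)
print(score)  # 0.5
```

Tracking a metric like this across model versions is one way a retrieval core can be validated, and it is also why a public benchmark is easy for competitors to reuse.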

The company's most significant exposure is to the distribution and brand strength of incumbents. Semantic Scholar, as a free service from a well-funded research institute, sets a high bar for baseline utility and accessibility. Elicit and Consensus have secured venture funding and established user bases, giving them runway to expand their feature sets into Paperzilla's proposed territory of continuous monitoring and critical appraisal. Furthermore, Paperzilla's solo-founder structure and the absence of any publicly announced capital raise as of this report limit its capacity for aggressive customer acquisition or rapid product iteration compared to funded rivals. Its go-to-market strategy, hinted at through the founder's post targeting private equity diligence [X, 2026], remains narrow and unproven against broader academic and corporate markets.

The most plausible 18-month scenario hinges on the adoption of its curated feed paradigm. If Paperzilla can successfully convert early users in niche verticals (like technical due diligence) into paying customers and use their usage to refine its retrieval system, it could establish a defensible position as the tool for monitored, critical insight on fast-moving preprint streams. In this case, a tool like Scite, which focuses on citation context but does not own a discovery feed, would also stand to win as a potential complementary partner. Conversely, if user growth stalls and the product remains a feature-rich but undifferentiated aggregator, the loser would be Paperzilla itself, as larger platforms with more resources, like Semantic Scholar or even enhanced versions of Elicit, could replicate the feed functionality and absorb its target users.

Data Accuracy: YELLOW -- Competitor profiles and funding stages are drawn from public company sources and Crunchbase, but Paperzilla's own positioning is based solely on its website and founder statements without third-party validation of market traction.

Opportunity

PUBLIC The core opportunity for Paperzilla is to become the primary data ingestion and triage layer for professionals who need to make decisions based on the overwhelming, high-velocity stream of academic and preprint research.

The headline opportunity is to establish the company as the default research data layer for agents and diligence workflows, particularly in private equity and venture capital. The founder's own framing positions the tool as "an amazing tool for private equity activity" [X, retrieved 2026], suggesting a direct wedge into a customer segment with a high willingness to pay for intelligence that informs investment decisions. This outcome is reachable because the product's foundational capabilities (continuous feeds from major preprint servers, AI-powered summarization and quality assessment, and a command-line interface for programmatic access) are already built and publicly accessible [beta.paperzilla.ai, retrieved 2026] [GitHub, retrieved 2026]. The company has also developed its own dataset for benchmarking retrieval performance, indicating an early focus on technical quality that could serve as a defensible starting point [Hugging Face, 2024].

Multiple concrete paths exist for Paperzilla to scale from this initial wedge. The following scenarios outline plausible, high-impact growth trajectories.

Scenario	What happens	Catalyst	Why it's plausible
Become the embedded research API	The platform's feeds and analysis tools are licensed as an API, becoming a core component of other AI agents, research platforms, and enterprise software.	A formal API launch and a partnership with a major AI agent platform (e.g., LobeHub) or research tool.	The product is already listed as a skill in the LobeHub AI tools marketplace, demonstrating integration potential [LobeHub]. The technical architecture, built on RAG and semantic search, is inherently API-friendly [Hugging Face].
Land-and-expand in institutional research	Paperzilla secures a flagship contract with a university library, research institute, or corporate R&D lab, then expands usage across departments and functions.	A public case study or partnership announcement with a named academic or corporate institution.	The product's stated purpose is to filter high-volume academic preprints into usable intelligence, a pain point acutely felt by large research organizations [AgentCommunity.org]. The solo founder's background in building and managing complex software systems provides relevant operational experience [LinkedIn].

Compounding, if it occurs, would come from a data and workflow flywheel. Initial users uploading and analyzing papers generate proprietary annotations and quality ratings. This data can be used to continuously improve the underlying AI models' retrieval accuracy and summary quality, creating a performance moat. As the platform's understanding of research quality and relevance improves, its curated feeds become more valuable, attracting more users who, in turn, contribute more data and usage patterns. An early indication of this approach is the creation of the 'Paperzilla RAG Retrieval Benchmark,' a multi-annotator dataset designed to improve scientific paper retrieval [Hugging Face, 2024]. This suggests the company is already thinking in terms of systematic quality improvement driven by its own operations.

The size of the win can be contextualized by looking at comparable companies that have scaled by organizing fragmented information. For instance, Consensus, a competitor in the AI-powered research search space, raised a $3 million seed round in 2023 at a valuation reportedly over $20 million [TechCrunch, 2023]. If Paperzilla successfully executes on the "embedded research API" scenario and captures a meaningful share of the growing market for AI-assisted research tools, it could plausibly reach a valuation in the low hundreds of millions of dollars as a specialized, high-margin SaaS business (scenario, not a forecast). The total addressable market includes not only individual researchers but also the enterprise budgets of investment firms, pharmaceutical companies, and technology corporations that rely on staying ahead of scientific trends.

Data Accuracy: YELLOW -- The core product claims and founder background are well-documented, but the growth scenarios and market comps rely on inference from the product's positioning and a single competitor's funding event.

Sources

PUBLIC

[beta.paperzilla.ai, retrieved 2026] Relevant, continuously updated research data feeds | https://beta.paperzilla.ai/
[GitHub, retrieved 2026] Paperzilla · GitHub | https://github.com/paperzilla-ai
[Hugging Face, 2024] paperzilla/paperzilla-rag-retrieval-250 · Datasets at Hugging Face | https://huggingface.co/datasets/paperzilla/paperzilla-rag-retrieval-250
[Hugging Face, undated] paperzilla (Paperzilla) | https://huggingface.co/paperzilla
[AgentCommunity.org, undated] Paperzilla - Agent Community | https://agentcommunity.org/m/paperzilla
[Crunchbase, undated] Paperzilla - Company Profile & Funding | https://www.crunchbase.com/organization/paperzilla
[Bizapedia, retrieved 2026] PAPERZILLA INC. in Newark, DE | https://www.bizapedia.com/de/paperzilla-inc.html
[RocketReach, undated] Paperzilla Information | https://rocketreach.co/paperzilla-profile_b6a8bcd1c86737b4
[X, retrieved 2026] Mark Pors 🦖 on X: "New Paperzilla feature: a smart personalized research paper feed. This is the first step towards the research data layer for agents!" / X | https://x.com/pors/status/2021490714828591165
[TechCrunch, 2010] WatchMouse launches GeoBrand, PPC brand abuse monitoring tool • TechCrunch | https://techcrunch.com/2010/05/04/watchmouse-launches-geobrand-ppc-brand-abuse-monitoring-tool/
[LinkedIn, undated] Mark Pors 🦖 - Paperzilla | LinkedIn | https://www.linkedin.com/in/markpors/
[LobeHub, undated] paperzilla | Skills Marketplace | https://lobehub.com/en/skills/k-dense-ai-scientific-agent-skills-paperzilla

Articles about Paperzilla

Paperzilla's Command-Line Tool and Benchmark Anchor a Bet on the Research Data Layer — The solo-founded Dutch startup aims to filter the torrent of academic preprints into a structured feed for diligence and discovery.
