Paper Espresso: From Paper Overload to Research Insight
Abstract
The accelerating pace of scientific publishing makes it increasingly difficult for researchers to stay current. We present Paper Espresso, an open-source platform that automatically discovers, summarizes, and analyzes trending arXiv papers. The system uses large language models (LLMs) to generate structured summaries with topical labels and keywords, and provides multi-granularity trend analysis at daily, monthly, and lifecycle scales through LLM-driven topic consolidation. Over 35 months of continuous deployment, Paper Espresso has processed over 13,300 papers and publicly released all structured metadata, revealing rich dynamics in the AI research landscape: a mid-2025 surge in reinforcement learning for LLM reasoning, non-saturating topic emergence (6,673 unique topics), and a positive correlation between topic novelty and community engagement (2.0× median upvotes for the most novel papers).
- The AI research frontier is broadening, not converging. New topics emerge at an undiminished rate (up to 408/month) while Shannon entropy remains stable (~7.9 bits), indicating sustained diversification. Researchers should actively monitor peripheral topics to avoid tunnel vision.
- Topics peak slowly but fade fast. The median topic takes 8 months to reach peak prominence yet loses half of it within a single month, making timely awareness critical. Systems that report trends only retrospectively risk delivering insights after the window of opportunity has closed.
- Novelty attracts attention. Papers combining unexpected topic pairs receive 2.0× the upvotes of those with conventional combinations. The community rewards cross-pollination, and recommendation systems should surface surprising intersections.
- Popularity and engagement are distinct signals. The most frequent topic (LLMs, 13.6%) is far from the most engaging per paper; niche topics like Pre-training Strategies and GUI Agents draw 2-4× higher median upvotes. Effective curation must weigh both volume and per-paper impact.
1. Introduction
The pace of scientific publishing now outstrips any individual researcher's capacity to stay informed. As shown in Figure 1, arXiv alone receives nearly 30,000 submissions per month, with no sign of deceleration. This creates an acute information asymmetry: the collective frontier advances rapidly, yet each researcher's awareness lags behind, filtered through keyword alerts and social media curation.
Existing platforms such as Semantic Scholar [1], Papers with Code [16], and ArXiv Sanity [8], along with LLM-powered tools like PaSa, LitLLM, and ScholarCopilot, address fragments of this problem (indexing, retrieval, or writing assistance) but remain fundamentally reactive: they require researchers to already know what to look for. None provides proactive, continuous monitoring that combines structured paper comprehension with temporal trend analysis.
We present Paper Espresso, an open-source system that continuously ingests community-validated trending papers, distills each into a structured summary, and proactively surfaces emerging research directions. It makes three contributions:
- Open structured dataset. We publicly release a structured dataset of LLM-generated paper summaries, topical labels, and keywords on Hugging Face (13,388 papers, 6,673 topics, 51,036 authors), continuously updated via automated pipelines.
- Multi-granularity trend analysis. The system surfaces trending research directions at daily, monthly, and lifecycle scales through LLM-driven topic consolidation.
- Longitudinal empirical analysis. Over 35 months of deployment, we reveal dynamics in the AI research landscape: a mid-2025 surge in reinforcement learning for LLM reasoning, non-saturating topic emergence, a topic co-occurrence map exposing cross-cutting methodologies, and a divergence between topic frequency and engagement.
2. System Architecture
The system is organized as modular CLI-driven pipelines (daily, monthly, and lifecycle) backed by a Streamlit web frontend. All data is persisted to four public Hugging Face datasets in date-partitioned Parquet format, ensuring full reproducibility.
Data Ingestion Layer
We source papers from the Hugging Face Daily Papers API, a community-curated feed where users upvote notable arXiv preprints. This yields a focused stream covering roughly 2-3% of arXiv submissions, with upvote counts serving as a lightweight proxy for community attention.
Paper Processing Layer
The processing layer invokes LLMs via LiteLLM, decoupling the pipeline from any model provider. Each paper's title, abstract, and full PDF are sent as a single multimodal request. The returned JSON contains: (1) a concise summary, (2) a detailed pros/cons analysis, (3) open-vocabulary topic labels, and (4) technical keywords. Trend analysis consolidates hundreds of fine-grained topics into ~20 coherent clusters (~50:1 compression).
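Because LLM responses can drop or mistype fields, the structured output is best treated as untrusted input before it reaches the dataset. A minimal validation sketch, assuming the `hf_paper_summary` field names from Section 3 (the sample payload is invented, not an actual model output):

```python
import json

# Required fields of the per-paper LLM response. Field names follow the
# hf_paper_summary schema; the sample payload below is hypothetical.
REQUIRED_FIELDS = {
    "concise_summary": str,
    "detailed_analysis": str,
    "topics": list,
    "keywords": list,
}

def validate_summary(raw: str) -> dict:
    """Parse an LLM JSON response and check required fields and types."""
    record = json.loads(raw)
    for name, typ in REQUIRED_FIELDS.items():
        if not isinstance(record.get(name), typ):
            raise ValueError(f"missing or mistyped field: {name}")
    return record

sample = json.dumps({
    "concise_summary": "A new RL method for LLM reasoning.",
    "detailed_analysis": "Pros: ... Cons: ...",
    "topics": ["Reinforcement Learning", "Large Language Models"],
    "keywords": ["GRPO", "verifiable rewards"],
})
record = validate_summary(sample)
```

A check of this kind lets a malformed response be retried rather than written into the date-partitioned Parquet files.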
Presentation Layer
The web interface exposes three views: Daily (papers sorted by upvotes with expandable summaries), Monthly (deduplicated papers with LLM-generated trend narratives), and Lifecycle (Gartner Hype Cycle [6] chart with per-topic time-series).
3. Datasets
Paper Espresso publicly releases four complementary datasets on the Hugging Face Hub, continuously updated via automated pipelines. All datasets are stored as date-partitioned Parquet files.
| Dataset | Records | Splits |
|---|---|---|
| hf_paper_summary | 13,388 | 733 days |
| hf_paper_daily_trending | 733 | 733 days |
| hf_paper_monthly_trending | 34 | 34 months |
| hf_paper_lifecycle | 18 | 18 bi-months |
| Field | Type | Description |
|---|---|---|
| **Paper Summaries (hf_paper_summary)** | | |
| paper_id | str | arXiv identifier |
| title | str | Paper title |
| authors | list | List of author names |
| upvotes | int | Community vote count |
| concise_summary | str | TL;DR (avg. 551 chars) |
| detailed_analysis | str | Pros/cons analysis (avg. 1,827 chars) |
| topics | list | Fine-grained topic labels (avg. 3.03) |
| keywords | list | Extracted keywords |
| **Trending Reports (daily / monthly)** | | |
| trending_summary | str | Narrative overview of themes |
| top_topics | list | Ranked dominant topics |
| topic_mapping | dict | Maps consolidated labels to originals (monthly only) |
| **Lifecycle Snapshots (hf_paper_lifecycle)** | | |
| lifecycle_data | dict | Per-topic phase, peak, slope, counts |
| sorted_months | list | Ordered month labels in snapshot |
| n_papers | int | Cumulative paper count at snapshot |
4. Empirical Analysis
Our analysis spans 35 months of deployment (May 2023 to April 2026) and covers four dimensions: (1) paper volume growth and community engagement patterns, (2) topic distribution, temporal evolution, and co-occurrence structure, (3) topic lifecycle classification and velocity, and (4) the relationship between paper novelty and community engagement.
4.1 Paper Volume and Community Engagement
Monthly intake grew from 259 papers in May 2023 to a peak of 923 in October 2025, averaging 18.8 papers on weekdays versus 3.3 on weekends. Community upvotes are heavily right-skewed (skewness = 5.28): the median paper receives 13 upvotes, yet the 90th percentile reaches 52 and the maximum is 664.
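The skewness statistic can be reproduced from raw upvote counts with the standard moment-based estimator; the counts below are synthetic, chosen only to exhibit a heavy right tail like the one described above:

```python
import statistics

def skewness(xs):
    """Moment-based sample skewness: E[(x - mu)^3] / sigma^3."""
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)
    return sum((x - mu) ** 3 for x in xs) / (len(xs) * sigma ** 3)

# Synthetic distribution for illustration: most papers near the median,
# a few high-engagement outliers dragging the mean upward.
upvotes = [13] * 90 + [52] * 9 + [664]
print(round(skewness(upvotes), 2))
```

Even one extreme outlier dominates the third moment, which is why median and percentile summaries are the more informative engagement statistics here.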
4.2 Topic Landscape and Dynamics
Topic Distribution
With an average of 3.03 topic labels per paper, the system produces 6,673 unique fine-grained topics across 13,388 papers. The monthly consolidation step merges semantically equivalent labels, reducing hundreds of labels to 15-20 coherent clusters (~50:1 compression).
Topic Temporal Evolution
Figure 5 shows how topic dominance shifts over time. In early 2025, Large Language Models and Diffusion Models led the landscape. By mid-2025, Reinforcement Learning surged to the top, driven by rapid adoption of Group Relative Policy Optimization (GRPO) [15] and Reinforcement Learning with Verifiable Rewards (RLVR) [9] for LLM reasoning.
Topic Emergence and Diversity
New topics appear at a rate of 19-408 per month with no sign of saturation, while Shannon entropy over the monthly topic-frequency distribution remains stable around 7.9 bits (range 6.9-8.6). Together these indicate that the research frontier continues to diversify rather than collapsing toward a few dominant themes.
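The entropy measure is computed over the normalized monthly topic-frequency distribution. A minimal sketch with invented counts (not actual dataset values):

```python
import math
from collections import Counter

def topic_entropy(topic_counts):
    """Shannon entropy (bits) of a topic-frequency distribution."""
    total = sum(topic_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in topic_counts.values() if c > 0)

# Illustrative monthly counts: a flatter distribution yields higher
# entropy, so stable entropy under growing topic counts means new
# topics are absorbing probability mass rather than vanishing.
month = Counter({"LLMs": 120, "Diffusion Models": 60,
                 "RL": 60, "Agents": 30})
print(round(topic_entropy(month), 2))
```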
Topic Co-occurrence
Figure 7 shows raw co-occurrence counts (lower triangle) and Jaccard similarity (upper triangle) for the top-10 topics. Three patterns emerge: (1) RL as cross-cutting methodology: Reinforcement Learning has the highest co-occurrence with LLMs (215), VLMs (152), and Multimodal LLMs (132). (2) Generative-vision cluster: Diffusion Models pairs strongly with Video Generation (197). (3) Frequency is not affinity: the top-count pair (RL + LLMs, 215) has only moderate Jaccard (0.08) because both topics are individually common.
Keyword Evolution
Tracking keywords within a topic reveals which specific methods drive its rise or fall. In Reinforcement Learning, RLHF [11] (~25% of RL papers in mid-2024) was rapidly displaced by GRPO [15] (~65% by early 2025) and RLVR [9]. In Diffusion Models, the UNet-to-Transformer architectural migration is evident: Stable Diffusion [14] and ControlNet [19] faded while DiT [12] and Flow Matching [10] gained steady traction.
4.3 Topic Lifecycle
We adapt the Gartner Hype Cycle [6] to bibliometric data. For every topic with at least 15 papers, we compute monthly proportion, then classify each into one of five lifecycle phases based on peak timing, decline ratio, and recent trend slope.
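The classification logic can be sketched as below; the thresholds, and the reduction to three of the five phases, are illustrative assumptions rather than the system's exact rules:

```python
def classify_phase(series, recent_slope):
    """Toy lifecycle classifier over a topic's monthly proportions.

    Uses the three signals named above (peak timing, decline ratio,
    recent slope); thresholds are illustrative, not the paper's.
    """
    peak = max(series)
    peak_idx = series.index(peak)
    decline = 1 - series[-1] / peak  # drop from peak to latest month
    if peak_idx >= len(series) - 2 and recent_slope > 0:
        return "rising"       # peak is recent and still climbing
    if decline > 0.5:
        return "declining"    # lost over half of peak prominence
    return "plateau"

print(classify_phase([0.01, 0.03, 0.06, 0.07], recent_slope=0.01))
print(classify_phase([0.02, 0.08, 0.05, 0.03], recent_slope=-0.01))
```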
Topic Velocity
For each topic with ≥15 papers, we measure time to peak and half-life. The contrast is stark: the median time to peak is 8 months, but the median half-life is just 1 month. AI research topics rise gradually yet decline abruptly.
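Both velocity measures reduce to simple operations on a topic's monthly time series. A sketch with an invented trajectory mirroring the reported medians (slow rise to a peak, collapse within a month):

```python
def time_to_peak(series):
    """Months from first appearance to peak monthly count."""
    return series.index(max(series))

def half_life(series):
    """Months after the peak until the count first drops below half of peak."""
    peak_idx = series.index(max(series))
    half = max(series) / 2
    for i, v in enumerate(series[peak_idx + 1:], start=1):
        if v < half:
            return i
    return None  # has not yet decayed below half of peak

# Hypothetical monthly paper counts: an 8-month climb, then a fast fade.
series = [1, 2, 3, 4, 5, 6, 7, 8, 20, 6, 3]
print(time_to_peak(series), half_life(series))
```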
4.4 Paper Novelty and Community Engagement
We investigate whether papers with unusual topic combinations attract more community attention. For each paper with at least two topic labels, we define a novelty score as the negated mean Pointwise Mutual Information (PMI) across all co-assigned topic pairs. Papers combining commonly co-occurring topics score low; those with unexpected pairings score high.
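Concretely, for a paper with topic set T, novelty = -(1/|P|) Σ PMI(a, b) over all pairs (a, b) in T, with PMI(a, b) = log₂ p(a, b) / (p(a) p(b)). A sketch with hypothetical corpus counts; the smoothing constant for unseen pairs is our assumption, not a detail from the paper:

```python
import math
from itertools import combinations
from collections import Counter

def novelty_score(paper_topics, topic_freq, pair_freq, n_papers):
    """Negated mean PMI over all co-assigned topic pairs.

    Common pairings have high PMI (low novelty); unexpected pairings
    have low PMI (high novelty).
    """
    pmis = []
    for a, b in combinations(sorted(paper_topics), 2):
        p_ab = pair_freq.get((a, b), 0.5) / n_papers  # 0.5: light smoothing
        p_a = topic_freq[a] / n_papers
        p_b = topic_freq[b] / n_papers
        pmis.append(math.log2(p_ab / (p_a * p_b)))
    return -sum(pmis) / len(pmis)

# Hypothetical corpus statistics for illustration only.
n = 10_000
freq = Counter({"RL": 1500, "LLMs": 2000, "Protein Folding": 80})
pairs = {("LLMs", "RL"): 400, ("Protein Folding", "RL"): 1}
common = novelty_score(["RL", "LLMs"], freq, pairs, n)
novel = novelty_score(["RL", "Protein Folding"], freq, pairs, n)
```

Under these counts the frequent RL + LLMs pairing scores below zero while the rare RL + Protein Folding pairing scores well above it, which is the ranking the novelty-engagement analysis relies on.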
5. Conclusion
Paper Espresso is an open-source system that converts the daily stream of AI papers into structured summaries and multi-granularity trend reports. Analysis over 35 months reveals non-saturating topic emergence (6,673 unique labels), rapid topic decay (median half-life of one month), and a positive novelty-engagement effect (2.0× median upvotes for unconventional topic combinations). All code, data, and a live demo are publicly available.
References
- Ammar, W. et al. (2018). Construction of the Literature Graph in Semantic Scholar. NAACL.
- Blei, D.M. et al. (2003). Latent Dirichlet Allocation. JMLR, 3:993-1022.
- Boutaleb, Y. et al. (2024). BERTrend: Neural Topic Modeling for Emerging Trends Detection. arXiv:2407.04271.
- Cachola, I. et al. (2020). TLDR: Extreme Summarization of Scientific Documents. EMNLP Findings.
- Chen, C. (2006). CiteSpace II: Detecting and Visualizing Emerging Trends. JASIST, 57(3):359-377.
- Fenn, J. & Raskino, M. (2008). Mastering the Hype Cycle. Harvard Business Press.
- Grootendorst, M. (2022). BERTopic: Neural Topic Modeling with a Class-based TF-IDF Procedure. arXiv:2203.05794.
- Karpathy, A. (2016). ArXiv Sanity Preserver. arxiv-sanity.com.
- Lambert, N. et al. (2024). Reinforcement Learning with Verifiable Rewards. arXiv:2411.15124.
- Lipman, Y. et al. (2023). Flow Matching for Generative Modeling. ICLR.
- Ouyang, L. et al. (2022). Training Language Models to Follow Instructions with Human Feedback. NeurIPS.
- Peebles, W. & Xie, S. (2023). Scalable Diffusion Models with Transformers. ICCV.
- Rafailov, R. et al. (2023). Direct Preference Optimization. NeurIPS.
- Rombach, R. et al. (2022). High-Resolution Image Synthesis with Latent Diffusion Models. CVPR.
- Shao, Z. et al. (2024). DeepSeekMath: Pushing the Limits of Mathematical Reasoning. arXiv:2402.03300.
- Stojnic, R. et al. (2019). Papers with Code. paperswithcode.com.
- van Eck, N.J. & Waltman, L. (2010). Software Survey: VOSviewer. Scientometrics, 84(2):523-538.
- Wei, J. et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in LLMs. NeurIPS.
- Zhang, L. et al. (2023). Adding Conditional Control to Text-to-Image Diffusion Models. ICCV.