Paper Espresso: From Paper Overload to Research Insight

National University of Singapore  ·  Nanyang Technological University

Abstract

The accelerating pace of scientific publishing makes it increasingly difficult for researchers to stay current. We present Paper Espresso, an open-source platform that automatically discovers, summarizes, and analyzes trending arXiv papers. The system uses large language models (LLMs) to generate structured summaries with topical labels and keywords, and provides multi-granularity trend analysis at daily, monthly, and lifecycle scales through LLM-driven topic consolidation. Over 35 months of continuous deployment, Paper Espresso has processed over 13,300 papers and publicly released all structured metadata, revealing rich dynamics in the AI research landscape: a mid-2025 surge in reinforcement learning for LLM reasoning, non-saturating topic emergence (6,673 unique topics), and a positive correlation between topic novelty and community engagement (2.0× median upvotes for the most novel papers).

Keywords: Paper Summarization · Trend Analysis · Knowledge Discovery · Large Language Models · Research Tools
13,388 Papers Processed · 6,673 Unique Topics · 51,036 Unique Authors · 35 Months of Deployment
  1. The AI research frontier is broadening, not converging. New topics emerge at an undiminished rate (up to 408/month) while Shannon entropy remains stable (~7.9 bits), indicating sustained diversification. Researchers should actively monitor peripheral topics to avoid tunnel vision.
  2. Topics peak slowly but fade fast. The median topic takes 8 months to reach peak prominence yet loses half of it within a single month, making timely awareness critical. Systems that report trends only retrospectively risk delivering insights after the window of opportunity has closed.
  3. Novelty attracts attention. Papers combining unexpected topic pairs receive 2.0× the upvotes of those with conventional combinations. The community rewards cross-pollination, and recommendation systems should surface surprising intersections.
  4. Popularity and engagement are distinct signals. The most frequent topic (LLMs, 13.6%) is far from the most engaging per paper; niche topics like Pre-training Strategies and GUI Agents draw 2-4× higher median upvotes. Effective curation must weigh both volume and per-paper impact.

1. Introduction

The pace of scientific publishing now outstrips any individual researcher's capacity to stay informed. As shown in Figure 1, arXiv alone receives nearly 30,000 submissions per month, with no sign of deceleration. This creates an acute information asymmetry: the collective frontier advances rapidly, yet each researcher's awareness lags behind, filtered through keyword alerts and social media curation.

Existing platforms such as Semantic Scholar [1], Papers with Code [16], and ArXiv Sanity [8], along with LLM-powered tools like PaSa, LitLLM, and ScholarCopilot, address fragments of this problem (indexing, retrieval, or writing assistance) but remain fundamentally reactive: they require researchers to already know what to look for. None provides proactive, continuous monitoring that combines structured paper comprehension with temporal trend analysis.

Figure 1. Monthly paper volume: arXiv total (red, left axis) vs. Paper Espresso (blue, right axis). Although Paper Espresso selects only community-trending papers (~2-3% of arXiv), the two curves exhibit a consistent co-trend.

We present Paper Espresso, an open-source system that continuously ingests community-validated trending papers, distills each into a structured summary, and proactively surfaces emerging research directions. It makes three contributions:

  1. Open structured dataset. We publicly release a structured dataset of LLM-generated paper summaries, topical labels, and keywords on Hugging Face (13,388 papers, 6,673 topics, 51,036 authors), continuously updated via automated pipelines.
  2. Multi-granularity trend analysis. The system surfaces trending research directions at daily, monthly, and lifecycle scales through LLM-driven topic consolidation.
  3. Longitudinal empirical analysis. Over 35 months of deployment, we reveal dynamics in the AI research landscape: a mid-2025 surge in reinforcement learning for LLM reasoning, non-saturating topic emergence, a topic co-occurrence map exposing cross-cutting methodologies, and a divergence between topic frequency and engagement.

2. System Architecture

The system is organized as modular CLI-driven pipelines (daily, monthly, and lifecycle) backed by a Streamlit web frontend. All data is persisted to four public Hugging Face datasets in date-partitioned Parquet format, ensuring full reproducibility.

Figure 2. System architecture of Paper Espresso. The data ingestion layer fetches papers from the Hugging Face Daily Papers API and arXiv. The AI processing layer uses Google Gemini to generate structured summaries and trend analyses. The presentation layer provides an interactive Streamlit interface with multi-granularity browsing.

Data Ingestion Layer

We source papers from the Hugging Face Daily Papers API, a community-curated feed where users upvote notable arXiv preprints. This yields a focused stream of ~2-3% of arXiv, with upvote counts serving as a lightweight proxy for community attention.
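As a concrete sketch, the ingestion step reduces to a thin client over this feed. The endpoint URL and response field names below are assumptions about the public Hugging Face API, not details given in the paper:

```python
import json
import urllib.request

# Assumed endpoint for the community-curated Daily Papers feed.
DAILY_PAPERS_URL = "https://huggingface.co/api/daily_papers"

def parse_daily_feed(feed):
    """Reduce one day's feed to the fields the pipeline needs."""
    papers = []
    for entry in feed:
        paper = entry.get("paper", {})
        papers.append({
            "paper_id": paper.get("id"),         # arXiv identifier
            "title": paper.get("title"),
            "upvotes": paper.get("upvotes", 0),  # community-attention proxy
        })
    # Highest-voted papers first, mirroring the Daily view ordering.
    return sorted(papers, key=lambda p: p["upvotes"], reverse=True)

def fetch_daily_papers(date):
    """Fetch and parse the feed for one day (date formatted YYYY-MM-DD)."""
    with urllib.request.urlopen(f"{DAILY_PAPERS_URL}?date={date}") as resp:
        return parse_daily_feed(json.load(resp))
```

In deployment the fetched records would then be appended to the date-partitioned Parquet store described in Section 3.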

Paper Processing Layer

The processing layer invokes LLMs via LiteLLM, decoupling the pipeline from any model provider. Each paper's title, abstract, and full PDF are sent as a single multimodal request. The returned JSON contains: (1) a concise summary, (2) a detailed pros/cons analysis, (3) open-vocabulary topic labels, and (4) technical keywords. Trend analysis consolidates hundreds of fine-grained topics into ~20 coherent clusters (~50:1 compression).
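The structured output can be checked before persistence. A minimal validator over the four returned components is sketched below; the field names follow the released schema (Table 2), but the validation rules themselves are illustrative assumptions:

```python
# Minimal validator for the structured summary returned per paper. Field names
# follow the released schema; the checks themselves are illustrative
# assumptions, not the system's actual validation logic.
REQUIRED_FIELDS = {
    "concise_summary": str,    # (1) concise summary
    "detailed_analysis": str,  # (2) pros/cons analysis
    "topics": list,            # (3) open-vocabulary topic labels
    "keywords": list,          # (4) technical keywords
}

def validate_summary(record):
    """True iff the LLM's JSON has every required field with the right type
    and at least one topic label."""
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field), expected_type):
            return False
    return bool(record["topics"])
```

In practice, a record that fails validation would simply trigger a retry of the LLM call rather than enter the dataset.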

Presentation Layer

The web interface exposes three views: Daily (papers sorted by upvotes with expandable summaries), Monthly (deduplicated papers with LLM-generated trend narratives), and Lifecycle (Gartner Hype Cycle [6] chart with per-topic time-series).

3. Datasets

Paper Espresso publicly releases four complementary datasets on HF Hub, continuously updated via automated pipelines. All datasets are stored as date-partitioned Parquet files.

Table 1. Dataset statistics (May 2023 - April 2026).

Dataset | Records | Splits
hf_paper_summary | 13,388 | 733 days
hf_paper_daily_trending | 733 | 733 days
hf_paper_monthly_trending | 34 | 34 months
hf_paper_lifecycle | 18 | 18 bi-months
40,565 Fine-grained Topic Assignments · 3.03 Avg Topics/Paper · 18.5 Avg Topics/Month · 23.4 Avg Upvotes
Table 2. Field schema of the released datasets.

Field | Type | Description

Paper Summaries (hf_paper_summary)
paper_id | str | arXiv identifier
title | str | Paper title
authors | list | List of author names
upvotes | int | Community vote count
concise_summary | str | TL;DR (avg. 551 chars)
detailed_analysis | str | Pros/cons analysis (avg. 1,827 chars)
topics | list | Fine-grained topic labels (avg. 3.03)
keywords | list | Extracted keywords

Trending Reports (daily / monthly)
trending_summary | str | Narrative overview of themes
top_topics | list | Ranked dominant topics
topic_mapping | dict | Maps consolidated labels to originals (monthly only)

Lifecycle Snapshots (hf_paper_lifecycle)
lifecycle_data | dict | Per-topic phase, peak, slope, counts
sorted_months | list | Ordered month labels in snapshot
n_papers | int | Cumulative paper count at snapshot

4. Empirical Analysis

Our analysis spans 35 months of deployment (May 2023 to April 2026) and covers four dimensions: (1) paper volume growth and community engagement patterns, (2) topic distribution, temporal evolution, and co-occurrence structure, (3) topic lifecycle classification and velocity, and (4) the relationship between paper novelty and community engagement.

4.1 Paper Volume and Community Engagement

Monthly intake grew from 259 papers in May 2023 to a peak of 923 in October 2025, averaging 18.8 papers on weekdays versus 3.3 on weekends. Community upvotes are heavily right-skewed (skewness = 5.28): the median paper receives 13 upvotes, yet the 90th percentile reaches 52 and the maximum is 664.

Figure 3. Community engagement distribution. The histogram (red, left axis) shows a heavily right-skewed upvote distribution; the CDF (blue, right axis) confirms that 50% of papers receive ≤13 upvotes and 90% receive ≤52.

4.2 Topic Landscape and Dynamics

Topic Distribution

With an average of 3.03 topic labels per paper, the system produces 6,673 unique fine-grained topics across 13,388 papers. The monthly consolidation step merges semantically equivalent labels, reducing hundreds of labels to 15-20 coherent clusters (~50:1 compression).
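Given the topic_mapping field released with the monthly dataset (consolidated label → list of original labels), the relabeling and the resulting compression can be reproduced directly. A minimal sketch, assuming plain Python lists of per-paper topic labels:

```python
def consolidate_topics(paper_topics, topic_mapping):
    """Relabel fine-grained topics using an LLM-produced topic_mapping
    (consolidated label -> original labels, as in the monthly dataset)."""
    # Invert the mapping: original label -> consolidated cluster.
    to_cluster = {orig: cluster
                  for cluster, originals in topic_mapping.items()
                  for orig in originals}
    # Labels absent from the mapping are kept as-is.
    relabeled = [sorted({to_cluster.get(t, t) for t in topics})
                 for topics in paper_topics]
    n_fine = len({t for topics in paper_topics for t in topics})
    n_coarse = len({t for topics in relabeled for t in topics})
    return relabeled, n_fine / max(n_coarse, 1)  # compression ratio
```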

Table 3 / Figure 4. Top-5 consolidated research topics by paper count. These five topics collectively cover over 56% of all papers.

Topic Temporal Evolution

Figure 5 shows how topic dominance shifts over time. In early 2025, Large Language Models and Diffusion Models led the landscape. By mid-2025, Reinforcement Learning surged to the top, driven by rapid adoption of Group Relative Policy Optimization (GRPO) [15] and Reinforcement Learning with Verifiable Rewards (RLVR) [9] for LLM reasoning.

Figure 5. Bimonthly proportion (%) of the top-10 research topics from May 2023 to March 2026, smoothed for visual clarity.

Topic Emergence and Diversity

New topics appear at a rate of 19-408 per month with no sign of saturation, while Shannon entropy over the monthly topic-frequency distribution remains stable around 7.9 bits (range 6.9-8.6). Together these indicate that the research frontier continues to diversify rather than collapsing toward a few dominant themes.
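Both signals are cheap to compute from the monthly label counts. A sketch of the entropy calculation (in bits, matching the ~7.9-bit figure):

```python
import math
from collections import Counter

def topic_entropy_bits(monthly_topic_labels):
    """Shannon entropy (in bits) of one month's topic-frequency distribution."""
    counts = Counter(monthly_topic_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For intuition, a perfectly uniform distribution over ~240 topics would give roughly 7.9 bits, so the observed stability indicates no single theme is absorbing the distribution's mass.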

Figure 6. Topic emergence and diversity. Red bars show the number of new topics each month; the blue line tracks Shannon entropy of the monthly topic distribution, which remains flat around 7.9 bits, indicating sustained diversity.

Topic Co-occurrence

Figure 7 shows raw co-occurrence counts (lower triangle) and Jaccard similarity (upper triangle) for the top-10 topics. Three patterns emerge: (1) RL as cross-cutting methodology: Reinforcement Learning has the highest co-occurrence with LLMs (215), VLMs (152), and Multimodal LLMs (132). (2) Generative-vision cluster: Diffusion Models pairs strongly with Video Generation (197). (3) Frequency is not affinity: the top-count pair (RL + LLMs, 215) has only moderate Jaccard (0.08) because both topics are individually common.
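The Jaccard normalization behind pattern (3) can be reproduced from per-paper topic lists alone. A minimal sketch, assuming each paper is represented as a list of topic labels:

```python
from itertools import combinations
from collections import Counter

def cooccurrence(paper_topics):
    """Per-topic paper counts and pairwise co-occurrence counts."""
    topic_counts = Counter(t for topics in paper_topics for t in set(topics))
    pair_counts = Counter(frozenset(p)
                          for topics in paper_topics
                          for p in combinations(sorted(set(topics)), 2))
    return topic_counts, pair_counts

def jaccard(topic_counts, pair_counts, a, b):
    """Jaccard similarity: co-occurrences over the union of the two
    topics' paper sets, which discounts individually frequent topics."""
    n_ab = pair_counts[frozenset((a, b))]
    return n_ab / (topic_counts[a] + topic_counts[b] - n_ab)
```

This is why the RL + LLMs pair can top the raw counts while scoring only a moderate Jaccard: its denominator (the union of two very large paper sets) is correspondingly large.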

Figure 7. Co-occurrence heatmap for the top-20 topics. Lower triangle: raw co-occurrence counts (warm colors). Upper triangle: Jaccard similarity (cool colors).

Keyword Evolution

Tracking keywords within a topic reveals which specific methods drive its rise or fall. In Reinforcement Learning, RLHF [11] (~25% of RL papers in mid-2024) was rapidly displaced by GRPO [15] (~65% by early 2025) and RLVR [9]. In Diffusion Models, the UNet-to-Transformer architectural migration is evident: Stable Diffusion [14] and ControlNet [19] faded while DiT [12] and Flow Matching [10] gained steady traction.

Figure 8. Keyword evolution within three major topics. Each line shows the percentage of papers mentioning a given keyword per month. Top: RL shows RLHF→GRPO/RLVR transition. Middle: LLMs mirror this shift with CoT rising. Bottom: Diffusion shows UNet→Transformer migration.

4.3 Topic Lifecycle

We adapt the Gartner Hype Cycle [6] to bibliometric data. For every topic with at least 15 papers, we compute monthly proportion, then classify each into one of five lifecycle phases based on peak timing, decline ratio, and recent trend slope.
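A phase classifier in this spirit can be sketched as follows; the paper names the features used (peak timing, decline ratio, recent trend slope) but not the thresholds, so the cutoffs below are illustrative assumptions:

```python
def classify_phase(series):
    """Assign a Gartner-style lifecycle phase from a topic's monthly
    proportion series. Thresholds are illustrative assumptions."""
    peak_idx = max(range(len(series)), key=series.__getitem__)
    peak, current = series[peak_idx], series[-1]
    decline = 1 - current / peak if peak else 0.0  # drop from peak, 0..1
    recent = series[-3:]
    slope = (recent[-1] - recent[0]) / max(len(recent) - 1, 1)

    if peak_idx == len(series) - 1 and len(series) <= 6:
        return "Innovation Trigger"              # young topic, still rising
    if peak_idx >= len(series) - 2:
        return "Peak of Inflated Expectations"   # at or near its maximum
    if decline >= 0.5 and slope <= 0:
        return "Trough of Disillusionment"       # deep decline, still falling
    if slope > 0:
        return "Slope of Enlightenment"          # recovering after a dip
    return "Plateau of Productivity"             # stable below peak
```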

Figure 9. AI research hype cycle derived from 35 months of topic proportion time series. Topics are classified into five lifecycle phases. Dot size is proportional to total paper count.

Topic Velocity

For each topic with ≥15 papers, we measure time to peak and half-life. The contrast is stark: the median time to peak is 8 months, but the median half-life is just 1 month. AI research topics rise gradually yet decline abruptly.
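Both metrics follow directly from a topic's monthly proportion series. A minimal sketch:

```python
def topic_velocity(series):
    """Time to peak (months from first appearance to maximum proportion)
    and half-life (months from peak until the proportion first falls to
    <=50% of peak; None if it never does within the series)."""
    first = next(i for i, v in enumerate(series) if v > 0)
    peak_idx = max(range(len(series)), key=series.__getitem__)
    time_to_peak = peak_idx - first
    half = series[peak_idx] / 2
    half_life = next((i - peak_idx
                      for i in range(peak_idx + 1, len(series))
                      if series[i] <= half), None)
    return time_to_peak, half_life
```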

Figure 10. Topic velocity. Time to peak (left, red) measures months from first appearance to maximum proportion; half-life (right, blue) measures months from peak to 50% of peak. Topics take 8 months to peak yet lose half their prominence within a single month.

4.4 Paper Novelty and Community Engagement

We investigate whether papers with unusual topic combinations attract more community attention. For each paper with at least two topic labels, we define a novelty score as the negated mean Pointwise Mutual Information (PMI) across all co-assigned topic pairs. Papers combining commonly co-occurring topics score low; those with unexpected pairings score high.
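The novelty score can be reproduced from per-paper topic lists, estimating pair and marginal probabilities from the corpus itself. A sketch under those assumptions (the paper does not specify its probability estimator):

```python
import math
from itertools import combinations
from collections import Counter

def novelty_scores(paper_topics):
    """Per-paper novelty: negated mean PMI over each paper's co-assigned
    topic pairs. Probabilities are empirical corpus frequencies; this
    estimator is an illustrative assumption."""
    n = len(paper_topics)
    topic_counts = Counter(t for ts in paper_topics for t in set(ts))
    pair_counts = Counter(frozenset(p) for ts in paper_topics
                          for p in combinations(sorted(set(ts)), 2))

    def pmi(a, b):
        # Any pair scored here occurs in at least one paper, so p_ab > 0.
        p_ab = pair_counts[frozenset((a, b))] / n
        p_a, p_b = topic_counts[a] / n, topic_counts[b] / n
        return math.log2(p_ab / (p_a * p_b))

    scores = []
    for ts in paper_topics:
        pairs = list(combinations(sorted(set(ts)), 2))
        if not pairs:
            scores.append(None)  # novelty needs >= 2 topic labels
            continue
        scores.append(-sum(pmi(a, b) for a, b in pairs) / len(pairs))
    return scores
```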

Figure 11. Novelty vs. engagement. Each point is a paper; the red line is an OLS fit with 95% CI. Papers with more novel topic combinations receive more upvotes (Spearman ρ = 0.194, p < 10⁻⁹⁸).
Figure 12. Monthly novelty trends over time. Despite a 3× increase in paper volume, average novelty remains stable, indicating the research frontier maintains its creative diversity.

5. Conclusion

Paper Espresso is an open-source system that converts the daily stream of AI papers into structured summaries and multi-granularity trend reports. Analysis over 35 months reveals non-saturating topic emergence (6,673 unique labels), rapid topic decay (median half-life of one month), and a positive novelty-engagement effect (2.0× median upvotes for unconventional topic combinations). All code, data, and a live demo are publicly available.

References

  1. Ammar, W. et al. (2018). Construction of the Literature Graph in Semantic Scholar. NAACL.
  2. Blei, D.M. et al. (2003). Latent Dirichlet Allocation. JMLR, 3:993-1022.
  3. Boutaleb, Y. et al. (2024). BERTrend: Neural Topic Modeling for Emerging Trends Detection. arXiv:2407.04271.
  4. Cachola, I. et al. (2020). TLDR: Extreme Summarization of Scientific Documents. EMNLP Findings.
  5. Chen, C. (2006). CiteSpace II: Detecting and Visualizing Emerging Trends. JASIST, 57(3):359-377.
  6. Fenn, J. & Raskino, M. (2008). Mastering the Hype Cycle. Harvard Business Press.
  7. Grootendorst, M. (2022). BERTopic: Neural Topic Modeling with a Class-based TF-IDF Procedure. arXiv:2203.05794.
  8. Karpathy, A. (2016). ArXiv Sanity Preserver. arxiv-sanity.com.
  9. Lambert, N. et al. (2024). Reinforcement Learning with Verifiable Rewards. arXiv:2411.15124.
  10. Lipman, Y. et al. (2023). Flow Matching for Generative Modeling. ICLR.
  11. Ouyang, L. et al. (2022). Training Language Models to Follow Instructions with Human Feedback. NeurIPS.
  12. Peebles, W. & Xie, S. (2023). Scalable Diffusion Models with Transformers. ICCV.
  13. Rafailov, R. et al. (2023). Direct Preference Optimization. NeurIPS.
  14. Rombach, R. et al. (2022). High-Resolution Image Synthesis with Latent Diffusion Models. CVPR.
  15. Shao, Z. et al. (2024). DeepSeekMath: Pushing the Limits of Mathematical Reasoning. arXiv:2402.03300.
  16. Stojnic, R. et al. (2019). Papers with Code. paperswithcode.com.
  17. van Eck, N.J. & Waltman, L. (2010). Software Survey: VOSviewer. Scientometrics, 84(2):523-538.
  18. Wei, J. et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in LLMs. NeurIPS.
  19. Zhang, L. et al. (2023). Adding Conditional Control to Text-to-Image Diffusion Models. ICCV.