
What I Learned at ICML 2025 Will Surprise You!

2025-07-21



The ICML 2025 conference brought a whirlwind of innovation across every corner of machine learning, and we’re here to unpack the juiciest bits. Whether you’re into reinforcement learning, NLP, vision models, or diffusion-based generation, this year’s papers had something for everyone.

🔹 AdaSplash: Adaptive Curriculum Learning via Soft Sampling

Proposes AdaSplash, which dynamically adjusts training sample importance using gradient norm and loss.
Avoids hard filtering by soft-sampling informative and stable examples.
Shows theoretical convergence and outperforms baselines in various benchmarks.
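
To make the soft-sampling idea concrete, here is a minimal sketch based only on the summary above. The scoring rule (loss divided by one plus gradient norm), the temperature parameter, and the function names are all assumptions for illustration, not the paper's actual formulation.

```python
import math
import random


def soft_sampling_weights(losses, grad_norms, temperature=1.0):
    """Map per-example loss and gradient norm to sampling probabilities.

    Hypothetical score: loss / (1 + grad_norm), so examples with high loss
    but moderate gradients (informative yet stable) are favoured.
    """
    scores = [l / (1.0 + g) for l, g in zip(losses, grad_norms)]
    # Softmax with temperature: every example keeps a non-zero probability,
    # which is what makes this "soft" sampling rather than hard filtering.
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_batch(probs, batch_size, rng):
    """Draw example indices (with replacement) according to the soft weights."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


# Toy usage: three examples with made-up losses and gradient norms.
probs = soft_sampling_weights([0.1, 2.0, 1.0], [0.5, 0.5, 10.0])
batch = sample_batch(probs, batch_size=8, rng=random.Random(0))
```

Lowering the temperature concentrates sampling on the top-scoring examples, while raising it moves the sampler back toward uniform.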

🔹 TESS: Training with Exploration-Sensitive Sampling

Tackles exploration in long-horizon decision-making tasks.
Prioritizes rare transitions using an exploration-sensitive score.
Integrates seamlessly into model-based RL training pipelines.
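
One simple way to picture "prioritizing rare transitions" is a count-based score. The sketch below assumes an inverse square-root visit count as the exploration-sensitive score; that choice and the function names are illustrative stand-ins, since the summary does not say how the paper defines its score.

```python
import math
from collections import Counter


def exploration_scores(transitions):
    """Score each (state, action, next_state) transition by rarity.

    Assumed score: 1 / sqrt(visit count of the (state, action) pair).
    """
    counts = Counter((s, a) for s, a, _ in transitions)
    return [1.0 / math.sqrt(counts[(s, a)]) for s, a, _ in transitions]


def sampling_probs(scores):
    """Normalise scores into replay-sampling probabilities."""
    total = sum(scores)
    return [x / total for x in scores]


# Toy replay buffer: one common transition seen three times, one rare one.
buffer = [("s0", "a0", "s1")] * 3 + [("s1", "a1", "s2")]
probs = sampling_probs(exploration_scores(buffer))
```

In a model-based pipeline, probabilities like these could decide which stored transitions are replayed when fitting the dynamics model.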

🔹 InstructZero: Multi-Task Zero-Shot Agent with Latent Program Inference

Introduces a zero-shot agent trained via instruction and latent program alignment.
Capable of multi-task generalization without fine-tuning.
Uses inferred latent programs to align with natural language instructions.

🔹 B2C: Back-to-Context for Scalable In-Context RL

Proposes an in-context RL method that leverages demonstrations by resetting the agent to past states.
Improves sample efficiency and stability in offline and online RL settings.
Works in both discrete and continuous control tasks.

🔹 LM Data Valuation via Shapley Scores

Applies Shapley value theory to quantify the contribution of pretraining data.
Identifies harmful and redundant subsets of training data for language models.
Helps reduce dataset size while maintaining or improving performance.
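
For readers new to Shapley values, the sketch below shows the standard Monte Carlo permutation estimator on a toy problem. The `coverage` utility and the three "documents" are invented for illustration; for a language model, the utility would be something far more expensive, such as validation performance after training on a subset.

```python
import random


def monte_carlo_shapley(items, utility, num_permutations=200, seed=0):
    """Estimate Shapley values by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    values = {item: 0.0 for item in items}
    for _ in range(num_permutations):
        order = list(items)
        rng.shuffle(order)
        subset = []
        previous = utility(subset)
        for item in order:
            subset.append(item)
            current = utility(subset)
            values[item] += current - previous  # marginal contribution
            previous = current
    return {item: v / num_permutations for item, v in values.items()}


# Toy data (hypothetical): doc_b duplicates doc_a, doc_c adds something new.
docs = {"doc_a": {1, 2}, "doc_b": {1, 2}, "doc_c": {3}}


def coverage(subset):
    """Toy utility: number of distinct facts covered by the chosen documents."""
    covered = set()
    for name in subset:
        covered |= docs[name]
    return float(len(covered))


values = monte_carlo_shapley(list(docs), coverage, num_permutations=500)
```

In this toy setup the two redundant documents split the credit for the facts they share, while `doc_c` keeps the full credit for its unique fact. That is the kind of signal that can flag redundant data.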

🔹 PIGLeT: Pretraining with Grounded Language and Temporal Goals

Introduces a pretraining method grounded in both language and temporal goals.
Improves generalization in temporal reasoning and goal-directed tasks.
Evaluated on simulation environments with strong performance.

🔹 ELICIT: Uncertainty-Aware Curriculum for Offline RL

Builds a curriculum based on epistemic uncertainty estimation.
Improves exploration and policy learning in static offline datasets.
Outperforms several baselines on D4RL benchmarks.
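
A rough sketch of an uncertainty-driven curriculum is shown below. It assumes epistemic uncertainty is approximated by disagreement (variance) across an ensemble of value predictions, and that training proceeds from low to high uncertainty. Both choices are assumptions made for illustration; the summary does not specify the estimator or the ordering used in the paper.

```python
from statistics import pvariance


def epistemic_uncertainty(ensemble_predictions):
    """Per-sample uncertainty as the variance across ensemble members.

    ensemble_predictions[i] holds every member's prediction for sample i.
    """
    return [pvariance(preds) for preds in ensemble_predictions]


def curriculum_order(uncertainties):
    """Indices of samples sorted from most certain to least certain."""
    return sorted(range(len(uncertainties)), key=lambda i: uncertainties[i])


# Toy usage: three offline samples scored by a three-member ensemble.
predictions = [[1.0, 1.0, 1.0], [0.0, 2.0, 4.0], [1.0, 1.2, 0.8]]
order = curriculum_order(epistemic_uncertainty(predictions))
```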

🔹 MAGNify: Memory-Augmented Graph Networks for Long-Horizon Reasoning

Combines graph neural networks with memory modules to handle long-term dependencies.
Useful for physical reasoning and decision-making tasks.
Shows superior performance in multi-step inference problems.

🔹 DataComp: A Data Competition for Pretraining Robust Vision Models

Introduces a new benchmark for evaluating vision model pretraining data quality.
Encourages better dataset design over sheer size.
Shows that smaller, curated datasets can outperform massive noisy datasets.

🔹 D3PM-GEN: Diffusion Models for Structured Sequence Generation

Extends diffusion models to structured outputs like text or code.
Avoids exposure bias and improves generation quality.
Achieves strong results on sequence modeling benchmarks.
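
To see what diffusion over discrete tokens looks like, here is a minimal sketch of a forward (noising) process with a uniform transition kernel, a common choice in discrete diffusion. It only illustrates the corruption side under that assumption and is not taken from the paper; the learned reverse (denoising) model is omitted, and the function names and noise schedule are made up.

```python
import random


def corrupt_step(tokens, vocab_size, beta, rng):
    """One forward step: with probability beta, resample a token uniformly."""
    return [
        rng.randrange(vocab_size) if rng.random() < beta else t
        for t in tokens
    ]


def forward_diffuse(tokens, vocab_size, betas, rng):
    """Apply the corruption steps in sequence and keep every intermediate state."""
    trajectory = [list(tokens)]
    for beta in betas:
        trajectory.append(corrupt_step(trajectory[-1], vocab_size, beta, rng))
    return trajectory


# Toy usage: a 5-token sequence over a 10-token vocabulary.
rng = random.Random(0)
trajectory = forward_diffuse([3, 1, 4, 1, 5], vocab_size=10,
                             betas=[0.1, 0.3, 0.6, 1.0], rng=rng)
```

Because the generative model learns to undo this corruption over the whole sequence rather than predicting one token at a time from its own previous outputs, this style of model sidesteps the exposure-bias issue mentioned above.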

🔹 Fleet of Agents: Coordinated Problem Solving with Large Language Models

Focuses on coordinating multiple LLMs working as a “fleet” to collaboratively solve problems.
Likely explores agent communication, task decomposition, and orchestration strategies.

🔹 Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets

Investigates how to maintain safety guarantees when adapting LLMs with varied or noisy datasets.
Important for deploying LLMs in real-world, high-stakes environments.

🔹 OTTER: A Vision-Language-Action Model

Proposes a model that integrates visual understanding, language comprehension, and actionable decision-making.
Likely contributes to embodied AI or robotics tasks.

🔹 CMoS: Rethinking Time Series Prediction Through the Lens of Chunk-wise Spatial Correlations

Introduces a new perspective on time series forecasting using spatial (chunk-wise) correlations.
Enhances interpretability and performance in sequential data modeling.

🔹 Context is Key: A Benchmark for Forecasting with Essential Textual Information

Provides a benchmark that fuses time series with textual data for improved forecasting.
Encourages models to interpret external textual cues alongside numerical trends.

🔹 Adaptive Multi-prompt Contrastive Network for Few-shot Out-of-distribution Detection

Uses multiple adaptive prompts and contrastive learning to detect OOD inputs with few samples.
Addresses generalization and robustness in low-data settings.

🔹 Hierarchical Planning for Complex Tasks with Knowledge Graph-RAG and Symbolic Verification

Combines retrieval-augmented generation (RAG) with symbolic verification for complex hierarchical planning.
Likely useful for multi-step reasoning in AI agents and automation tasks.

🔹 WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving

Introduces a dataset centered on reasoning about agent interactions in autonomous driving.
Targets better modeling of human-like decision-making in self-driving systems.

🔹 SeedLoRA: A Fusion Approach to Efficient LM Fine-Tuning

Presents a method that fuses LoRA (Low-Rank Adaptation) with other efficient fine-tuning techniques.
Promises resource-efficient updates to large models.
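
As background for the LoRA part, the sketch below shows the standard low-rank update, where a frozen weight matrix W is adapted as W + (alpha / r) * B A. It covers only plain LoRA in pure Python with tiny matrices; the fusion step that SeedLoRA adds is not shown, and the helper names are illustrative.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(W, A, B, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    W is (out x in) and stays frozen; A is (r x in) and B is (out x r),
    and only these two small matrices would be trained.
    """
    r = len(A)
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Toy usage: a rank-1 adapter on a 2x2 identity weight.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.0]]
A = [[0.0, 2.0]]
adapted = lora_effective_weight(W, A, B, alpha=1.0)  # [[1.0, 2.0], [0.0, 1.0]]
```

The saving comes from training r * (in + out) adapter parameters instead of the full in * out matrix, which matters when the layers are large and r is small.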

🔹 Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation

Theoretically explores how reasoning chains evolve during training and inference.
Combines search methods, reinforcement learning, and distillation for better CoT performance.

🔹 Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration

Focuses on skill reuse from unlabeled data to bootstrap exploration in online RL.
Helps reduce data requirements and improve learning speed.

🔹 The Limits of Predicting Agents from Behaviour

Discusses challenges in predicting future actions of agents based on observed behavior.
Questions generalization in behavior modeling.

🔹 General Agents Need World Models

Argues for the necessity of explicit world modeling in developing general-purpose agents.
Reinforces the value of structured environments and state representations.

🔹 ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment

Enhances DPO (Direct Preference Optimization) with active sampling for alignment training.
Reduces required human feedback while improving model preference alignment.
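
The sketch below pairs the standard DPO loss with one possible active-sampling rule. The loss follows the usual DPO formulation on sequence log-probabilities. The selection rule (ask humans to label the pairs whose implicit reward margin is closest to zero) is a hypothetical stand-in, not the acquisition function from the paper.

```python
import math


def implicit_margin(policy_a, policy_b, ref_a, ref_b, beta=0.1):
    """Implicit reward margin between responses a and b, from log-probabilities."""
    return beta * ((policy_a - ref_a) - (policy_b - ref_b))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one pair: -log(sigmoid(margin)), computed stably."""
    x = -implicit_margin(policy_chosen, policy_rejected,
                         ref_chosen, ref_rejected, beta)
    # softplus(x) = max(x, 0) + log(1 + exp(-|x|))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def select_for_labeling(candidates, k, beta=0.1):
    """Hypothetical active rule: pick the k pairs the model is least sure about.

    Each candidate is (policy_a, policy_b, ref_a, ref_b) log-probabilities
    for an unlabeled response pair.
    """
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: abs(implicit_margin(*candidates[i], beta=beta)),
    )
    return ranked[:k]


# Toy usage with made-up log-probabilities.
pool = [(-1.0, -3.0, -2.0, -2.0), (-2.0, -2.0, -2.0, -2.0), (-1.0, -1.5, -1.0, -1.0)]
to_label = select_for_labeling(pool, k=2)
```

Spending annotation budget on near-tied pairs is one intuitive way active sampling could cut the amount of human feedback needed, since confidently ranked pairs carry less new information.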

🔹 Aligned Textual Scoring Rule

Applies scoring rules (used in review or grading contexts) to align LLM-generated responses.
Could support applications like automated feedback, peer review, or interviews.

🔹 Few-shot Steerable Alignment: Adapting Reward and LLM Policies with Neural Process

Uses neural processes to steer model outputs based on few-shot alignment signals.
Combines policy learning with meta-learning techniques.

🔹 Personalization and Pluralistic Alignment via RL Fine-tuning

Explores ways to align models not just to one truth, but to pluralistic and user-specific values.
Uses reinforcement learning for more adaptive and personalized models.
