What I Learned at ICML 2025 Will Surprise You!
The ICML 2025 conference brought a whirlwind of innovation across every corner of machine learning, and we're here to unpack the juiciest bits. Whether you're into reinforcement learning, NLP, vision models, or diffusion-based generation, this year's papers had something wow-worthy for everyone.
🔹 AdaSplash: Adaptive Curriculum Learning via Soft Sampling
- Proposes AdaSplash, which dynamically adjusts training sample importance using gradient norm and loss.
- Avoids hard filtering by soft-sampling informative and stable examples.
- Proves convergence theoretically and outperforms baselines on various benchmarks.
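To make the soft-sampling idea concrete, here is a minimal sketch, not the paper's algorithm. It assumes per-example loss and gradient-norm values are already available. The function names, the additive score, and the `temperature` parameter are all illustrative assumptions.

```python
import math
import random

def soft_sampling_weights(losses, grad_norms, temperature=1.0):
    """Turn per-example scores into sampling probabilities with a softmax.

    The score (loss + gradient norm) is a hypothetical choice for illustration.
    """
    scores = [(loss + g) / temperature for loss, g in zip(losses, grad_norms)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_batch(weights, batch_size, rng):
    """Draw example indices with replacement, proportional to their weights."""
    return rng.choices(range(len(weights)), weights=weights, k=batch_size)
```

Because every example keeps a non-zero probability, nothing is hard-filtered out. Raising `temperature` flattens the distribution toward uniform sampling.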
🔹 TESS: Training with Exploration-Sensitive Sampling
- Tackles exploration in long-horizon decision-making tasks.
- Prioritizes rare transitions using an exploration-sensitive score.
- Integrates seamlessly into model-based RL training pipelines.
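One simple way to picture "prioritizing rare transitions" is count-based weighting. The sketch below is an assumption for illustration, not the score defined in the paper: it treats transitions as hashable `(state, action, next_state)` tuples and weights each one by the inverse square root of how often it appears in the buffer.

```python
import math
import random
from collections import Counter

def rarity_scores(transitions):
    """Score each transition by 1 / sqrt(count); rarer transitions score higher.

    This count-based score is a hypothetical stand-in for an
    exploration-sensitive score.
    """
    counts = Counter(transitions)
    return [1.0 / math.sqrt(counts[t]) for t in transitions]

def sample_transitions(transitions, k, rng):
    """Sample k transitions with replacement, proportional to rarity."""
    scores = rarity_scores(transitions)
    return rng.choices(transitions, weights=scores, k=k)
```

With four copies of one transition and a single copy of another, the rare one is drawn about one time in three instead of one time in five under uniform sampling.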
🔹 InstructZero: Multi-Task Zero-Shot Agent with Latent Program Inference
- Introduces a zero-shot agent trained via instruction and latent program alignment.
- Capable of multi-task generalization without fine-tuning.
- Uses inferred latent programs to align with natural language instructions.
🔹 B2C: Back-to-Context for Scalable In-Context RL
- Proposes an in-context RL method that makes use of demonstrations by resetting the agent to past states.
- Improves sample efficiency and stability in offline and online RL settings.
- Works in both discrete and continuous control tasks.
🔹 LM Data Valuation via Shapley Scores
- Applies Shapley value theory to quantify the contribution of pretraining data.
- Identifies harmful and redundant subsets of training data for language models.
- Helps reduce dataset size while maintaining or improving performance.
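For readers new to Shapley values, the standard permutation-sampling estimator can be sketched as follows. This is a generic Monte Carlo approximation under stated assumptions, not the paper's method: `utility` is a hypothetical callback that scores a subset of data (in practice, something like validation performance after training on that subset).

```python
import random

def monte_carlo_shapley(points, utility, num_permutations, rng):
    """Estimate each point's Shapley value by averaging its marginal
    contribution over random orderings of the dataset."""
    n = len(points)
    values = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        subset = []
        prev = utility(subset)  # utility of the empty set
        for idx in order:
            subset.append(points[idx])
            cur = utility(subset)
            values[idx] += cur - prev  # marginal gain from adding this point
            prev = cur
    return [v / num_permutations for v in values]
```

A point with a negative estimate lowers utility on average, which is one way a "harmful" subset could show up. Values near zero suggest redundancy.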
🔹 PIGLeT: Pretraining with Grounded Language and Temporal Goals
- Introduces a pretraining method grounded in both language and temporal goals.
- Improves generalization in temporal reasoning and goal-directed tasks.
- Shows strong performance when evaluated in simulation environments.
🔹 ELICIT: Uncertainty-Aware Curriculum for Offline RL
- Builds a curriculum based on epistemic uncertainty estimation.
- Improves exploration and policy learning in static offline datasets.
- Outperforms several baselines on D4RL benchmarks.
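A common proxy for epistemic uncertainty is disagreement across an ensemble. The sketch below assumes that proxy and an easy-to-hard ordering. Both are illustrative assumptions and not details taken from the paper.

```python
from statistics import pvariance

def epistemic_uncertainty(ensemble_predictions):
    """ensemble_predictions[m][i] is model m's prediction for sample i.

    Returns the variance across models for each sample (ensemble disagreement).
    """
    n_samples = len(ensemble_predictions[0])
    return [
        pvariance([preds[i] for preds in ensemble_predictions])
        for i in range(n_samples)
    ]

def curriculum_order(ensemble_predictions):
    """Order sample indices from lowest to highest uncertainty (assumed schedule)."""
    unc = epistemic_uncertainty(ensemble_predictions)
    return sorted(range(len(unc)), key=lambda i: unc[i])
```

Reversing the sort would give a hard-to-easy schedule instead. Which direction works better is an empirical question.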
🔹 MAGNify: Memory-Augmented Graph Networks for Long-Horizon Reasoning
- Combines graph neural networks with memory modules to handle long-term dependencies.
- Useful for physical reasoning and decision-making tasks.
- Shows superior performance in multi-step inference problems.
🔹 DataComp: A Data Competition for Pretraining Robust Vision Models
- Introduces a new benchmark for evaluating vision model pretraining data quality.
- Encourages better dataset design over sheer size.
- Shows that smaller, curated datasets can outperform massive noisy datasets.
🔹 D3PM-GEN: Diffusion Models for Structured Sequence Generation
- Extends diffusion models to structured outputs like text or code.
- Avoids exposure bias and improves generation quality.
- Achieves strong results on sequence modeling benchmarks.
🔹 Fleet of Agents: Coordinated Problem Solving with Large Language Models
- Focuses on coordinating multiple LLMs working as a “fleet” to collaboratively solve problems.
- Likely explores agent communication, task decomposition, and orchestration strategies.
🔹 Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets
- Investigates how to maintain safety guarantees when adapting LLMs with varied or noisy datasets.
- Important for deploying LLMs in real-world, high-stakes environments.
🔹 OTTER: A Vision-Language-Action Model
- Proposes a model that integrates visual understanding, language comprehension, and actionable decision-making.
- Likely contributes to embodied AI or robotics tasks.
🔹 CMoS: Rethinking Time Series Prediction Through the Lens of Chunk-wise Spatial Correlations
- Introduces a new perspective on time series forecasting using spatial (chunk-wise) correlations.
- Enhances interpretability and performance in sequential data modeling.
🔹 Context is Key: A Benchmark for Forecasting with Essential Textual Information
- Provides a benchmark that fuses time series with textual data for improved forecasting.
- Encourages models to interpret external textual cues alongside numerical trends.
🔹 Adaptive Multi-prompt Contrastive Network for Few-shot Out-of-distribution Detection
- Uses multiple adaptive prompts and contrastive learning to detect OOD inputs with few samples.
- Addresses generalization and robustness in low-data settings.
🔹 Hierarchical Planning for Complex Tasks with Knowledge Graph-RAG and Symbolic Verification
- Combines retrieval-augmented generation (RAG) with symbolic verification for complex hierarchical planning.
- Likely useful for multi-step reasoning in AI agents and automation tasks.
🔹 WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving
- Introduces a dataset centered on reasoning about agent interactions in autonomous driving.
- Targets better modeling of human-like decision-making in self-driving systems.
🔹 SeedLoRA: A Fusion Approach to Efficient LM Fine-Tuning
- Fuses LoRA (Low-Rank Adaptation) with other efficient fine-tuning techniques.
- Promises resource-efficient updates to large models.
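As background, the basic LoRA update adds a scaled low-rank product to a frozen weight matrix. The sketch below shows only that standard building block in plain Python. It does not attempt the fusion step, and the helper names are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def lora_effective_weight(w, lora_b, lora_a, alpha):
    """Return W + (alpha / r) * B @ A, where A is (r, d_in) and B is (d_out, r)."""
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update with the same shape as W
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]

def trainable_params(d_out, d_in, rank):
    """Number of parameters in the adapter matrices A and B."""
    return rank * (d_out + d_in)
```

For a 4096 x 4096 layer at rank 8, the adapter holds 65,536 trainable parameters versus 16,777,216 in the full matrix, which is where the resource savings come from.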
🔹 Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation
- Theoretically explores how reasoning chains evolve during training and inference.
- Combines search methods, reinforcement learning, and distillation for better CoT performance.
🔹 Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
- Focuses on skill reuse from unlabeled data to bootstrap exploration in online RL.
- Helps reduce data requirements and improve learning speed.
🔹 The Limits of Predicting Agents from Behaviour
- Discusses challenges in predicting future actions of agents based on observed behavior.
- Questions generalization in behavior modeling.
🔹 General Agents Need World Models
- Argues for the necessity of explicit world modeling in developing general-purpose agents.
- Reinforces the value of structured environments and state representations.
🔹 ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment
- Enhances DPO (Direct Preference Optimization) with active sampling for alignment training.
- Reduces required human feedback while improving model preference alignment.
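For context, the standard DPO loss on one preference pair can be written in a few lines. The selection rule after it is a hypothetical uncertainty heuristic (query the pairs whose implicit reward margin is closest to zero) and is not claimed to be the paper's acquisition criterion. All inputs are assumed to be summed token log-probabilities.

```python
import math

def implicit_margin(logp_a, logp_b, ref_logp_a, ref_logp_b, beta=0.1):
    """Implicit reward gap between responses a and b under the DPO parameterization."""
    return beta * ((logp_a - ref_logp_a) - (logp_b - ref_logp_b))

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss: -log(sigmoid(margin)), computed in a stable form."""
    m = implicit_margin(logp_chosen, logp_rejected,
                        ref_logp_chosen, ref_logp_rejected, beta)
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))

def select_for_labeling(candidates, k, beta=0.1):
    """Pick the k unlabeled pairs with the smallest |margin| (assumed heuristic).

    Each candidate is a tuple (logp_a, logp_b, ref_logp_a, ref_logp_b).
    """
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: abs(implicit_margin(*candidates[i], beta=beta)),
    )
    return ranked[:k]
```

The intuition behind the heuristic is that pairs the model already ranks confidently add little information, so human labels are spent where the model is least sure.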
🔹 Aligned Textual Scoring Rule
- Applies scoring rules (used in review or grading contexts) to align LLM-generated responses.
- Could support applications like automated feedback, peer review, or interviews.
🔹 Few-shot Steerable Alignment: Adapting Reward and LLM Policies with Neural Process
- Uses neural processes to steer model outputs based on few-shot alignment signals.
- Combines policy learning with meta-learning techniques.
🔹 Personalization and Pluralistic Alignment via RL Fine-tuning
- Explores ways to align models not just to one truth, but to pluralistic and user-specific values.
- Uses reinforcement learning for more adaptive and personalized models.