Pedram Agand

What I Learned at ICML 2025 Will Surprise You!

The ICML 2025 conference brought a whirlwind of innovation across every corner of machine learning — and we’re here to unpack the juiciest bits. Whether you're into reinforcement learning, NLP, vision models, or diffusion-based generation, this year’s papers had something wow-worthy for everyone.

🔹 AdaSplash: Adaptive Curriculum Learning via Soft Sampling

  • Proposes AdaSplash, which dynamically adjusts training sample importance using gradient norm and loss.
  • Avoids hard filtering by soft-sampling informative and stable examples.
  • Provides a convergence guarantee and outperforms baselines on several benchmarks.
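
The soft-sampling idea can be sketched in a few lines. This is an illustration under stated assumptions, not AdaSplash's published rule: the score (loss multiplied by gradient norm), the temperature, and the function names are hypothetical. A softmax keeps every probability positive, so examples are down-weighted rather than filtered out.

```python
import math
import random


def soft_sampling_probs(losses, grad_norms, temperature=1.0):
    """Turn per-example loss and gradient norm into sampling probabilities.

    Hypothetical score: loss * grad_norm. A softmax over the scores keeps
    every probability strictly positive (soft sampling, no hard filter).
    """
    scores = [loss * gnorm / temperature for loss, gnorm in zip(losses, grad_norms)]
    top = max(scores)  # subtract the max so exp() cannot overflow
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_batch(probs, batch_size, seed=0):
    """Draw example indices with replacement according to probs."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


# Toy usage with four training examples.
losses = [0.1, 2.0, 0.5, 1.2]
grad_norms = [0.2, 1.5, 0.4, 1.0]
probs = soft_sampling_probs(losses, grad_norms, temperature=0.5)
batch = sample_batch(probs, batch_size=8)
```

Raising `temperature` flattens the distribution toward uniform sampling; lowering it concentrates each batch on the highest-scoring examples.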

🔹 TESS: Training with Exploration-Sensitive Sampling

  • Tackles exploration in long-horizon decision-making tasks.
  • Prioritizes rare transitions using an exploration-sensitive score.
  • Integrates seamlessly into model-based RL training pipelines.
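
One plausible reading of an "exploration-sensitive score" is count-based novelty. The sketch below is a hypothetical stand-in, not TESS's exact score: transitions are keyed by a discretized (state, action) pair, and each stored transition gets priority 1/sqrt(visit count), so an individual rare transition is replayed more often than an individual common one.

```python
import math
import random
from collections import Counter


class RarityReplayBuffer:
    """Replay buffer that favors transitions whose key is rarely seen.

    The 1/sqrt(count) priority is an illustrative count-based novelty
    score, not the score defined in the TESS paper.
    """

    def __init__(self, seed=0):
        self.transitions = []     # list of (key, transition) pairs
        self.counts = Counter()   # how many stored transitions share each key
        self.rng = random.Random(seed)

    def add(self, key, transition):
        self.transitions.append((key, transition))
        self.counts[key] += 1

    def priorities(self):
        # Each stored transition gets 1 / sqrt(number of copies of its key).
        return [1.0 / math.sqrt(self.counts[key]) for key, _ in self.transitions]

    def sample(self, k):
        picked = self.rng.choices(self.transitions, weights=self.priorities(), k=k)
        return [transition for _, transition in picked]


# Nine visits to a common (state, action) pair, one visit to a rare one.
buffer = RarityReplayBuffer()
for step in range(9):
    buffer.add(("s0", "left"), {"step": step})
buffer.add(("s7", "jump"), {"step": 9})
```

A key seen N times still carries a total weight of sqrt(N), so common transitions are down-weighted but never dropped.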

🔹 InstructZero: Multi-Task Zero-Shot Agent with Latent Program Inference

  • Introduces a zero-shot agent trained via instruction and latent program alignment.
  • Capable of multi-task generalization without fine-tuning.
  • Uses inferred latent programs to align with natural language instructions.

🔹 B2C: Back-to-Context for Scalable In-Context RL

  • Proposes an in-context RL method that uses demonstrations by resetting to past states.
  • Improves sample efficiency and stability in offline and online RL settings.
  • Works in both discrete and continuous control tasks.

🔹 LM Data Valuation via Shapley Scores

  • Applies Shapley value theory to quantify the contribution of pretraining data.
  • Identifies harmful and redundant subsets of training data for language models.
  • Helps reduce dataset size while maintaining or improving performance.
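
Data Shapley values are usually estimated by Monte Carlo: shuffle the data, add points one at a time, and credit each point with the change in a utility such as validation score. The sketch below implements that standard estimator; `toy_utility` is a hypothetical stand-in for "train on this subset and evaluate", which is far more expensive, and the paper may use a different approximation to scale to pretraining corpora.

```python
import random


def monte_carlo_shapley(n_points, utility, n_permutations=200, seed=0):
    """Estimate Shapley values for items 0..n_points-1.

    utility(frozenset_of_indices) -> float. For each random permutation,
    an item is credited with the utility change when it joins the items
    that precede it; credits are averaged over permutations.
    """
    rng = random.Random(seed)
    values = [0.0] * n_points
    for _ in range(n_permutations):
        order = list(range(n_points))
        rng.shuffle(order)
        subset = set()
        prev_u = utility(frozenset(subset))
        for i in order:
            subset.add(i)
            u = utility(frozenset(subset))
            values[i] += u - prev_u
            prev_u = u
    return [v / n_permutations for v in values]


def toy_utility(subset):
    """Hypothetical utility: items 0 and 1 are redundant, item 2 is harmful."""
    score = 1.0 if (0 in subset or 1 in subset) else 0.0
    if 2 in subset:
        score -= 0.5
    return score


values = monte_carlo_shapley(3, toy_utility, n_permutations=400, seed=1)
```

A negative value (item 2 here) marks a candidate for removal as harmful, while near-duplicates (items 0 and 1) split their shared credit, which is how redundancy shows up.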

🔹 PIGLeT: Pretraining with Grounded Language and Temporal Goals

  • Introduces a pretraining method grounded in both language and temporal goals.
  • Improves generalization in temporal reasoning and goal-directed tasks.
  • Performs strongly in simulated environments.

🔹 ELICIT: Uncertainty-Aware Curriculum for Offline RL

  • Builds a curriculum based on epistemic uncertainty estimation.
  • Improves exploration and policy learning in static offline datasets.
  • Outperforms several baselines on D4RL benchmarks.
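
Epistemic uncertainty is often estimated from disagreement within an ensemble. The sketch below assumes that approach (ELICIT's actual estimator may differ): each offline sample is scored by the variance of predictions from several hypothetical ensemble members, and the curriculum visits low-uncertainty samples first.

```python
from statistics import pvariance


def epistemic_uncertainty(predictions):
    """Variance across ensemble members' predictions for one sample."""
    return pvariance(predictions)


def uncertainty_curriculum(dataset, ensemble, n_stages=3):
    """Split dataset into stages ordered from least to most uncertain.

    ensemble is a list of callables, each mapping a sample to a scalar
    prediction (for example, a Q-value estimate).
    """
    scored = sorted(
        dataset,
        key=lambda x: epistemic_uncertainty([model(x) for model in ensemble]),
    )
    stage_size = max(1, -(-len(scored) // n_stages))  # ceiling division
    return [scored[i:i + stage_size] for i in range(0, len(scored), stage_size)]


# Hypothetical ensemble: members agree near 0 and diverge as |x| grows.
ensemble = [lambda x: 1.0 * x, lambda x: 1.1 * x, lambda x: 0.9 * x]
stages = uncertainty_curriculum([3.0, -0.5, 10.0, 0.1, -6.0, 1.0], ensemble)
```

Training on the first stage and then adding later stages gives an easy-to-hard schedule; reversing the sort would instead put the most uncertain samples first.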

🔹 MAGNify: Memory-Augmented Graph Networks for Long-Horizon Reasoning

  • Combines graph neural networks with memory modules to handle long-term dependencies.
  • Useful for physical reasoning and decision-making tasks.
  • Shows superior performance in multi-step inference problems.

🔹 DataComp: A Data Competition for Pretraining Robust Vision Models

  • Introduces a new benchmark for evaluating vision model pretraining data quality.
  • Encourages better dataset design over sheer size.
  • Shows that smaller, curated datasets can outperform massive noisy datasets.

🔹 D3PM-GEN: Diffusion Models for Structured Sequence Generation

  • Extends diffusion models to structured outputs like text or code.
  • Avoids exposure bias and improves generation quality.
  • Achieves strong results on sequence modeling benchmarks.
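
Discrete diffusion models in the D3PM family corrupt token sequences instead of adding Gaussian noise. Below is a sketch of the absorbing-state ("mask") forward process: with per-step rate 1/(T - s + 1), a token survives t steps with probability (T - t)/T, so it ends up masked with probability t/T. The `[MASK]` token and this schedule are assumptions for illustration; D3PM-GEN's exact corruption process may differ.

```python
import random

MASK = "[MASK]"  # hypothetical absorbing token


def q_sample(tokens, t, T, rng):
    """Sample x_t ~ q(x_t | x_0) for absorbing-state discrete diffusion.

    Each token is independently replaced by MASK with probability t / T;
    once masked, a token stays masked at every later step.
    """
    if not 0 <= t <= T:
        raise ValueError("t must lie in [0, T]")
    mask_prob = t / T
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def training_pair(tokens, T, rng):
    """Draw a timestep and return (t, corrupted input, denoising targets).

    Targets hold the original token at masked positions and None elsewhere;
    the denoiser is trained to recover the masked tokens.
    """
    t = rng.randint(1, T)
    noisy = q_sample(tokens, t, T, rng)
    targets = [tok if n == MASK else None for tok, n in zip(tokens, noisy)]
    return t, noisy, targets


rng = random.Random(0)
code = ["def", "f", "(", ")", ":"]
t, noisy, targets = training_pair(code, T=10, rng=rng)
```

Generation runs the process in reverse: start from an all-mask sequence at t = T and let the trained denoiser unmask tokens step by step.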

🔹 Fleet of Agents: Coordinated Problem Solving with Large Language Models

  • Focuses on coordinating multiple LLMs working as a “fleet” to collaboratively solve problems.
  • Likely explores agent communication, task decomposition, and orchestration strategies.

🔹 Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets

  • Investigates how to maintain safety guarantees when adapting LLMs with varied or noisy datasets.
  • Important for deploying LLMs in real-world, high-stakes environments.

🔹 OTTER: A Vision-Language-Action Model

  • Proposes a model that integrates visual understanding, language comprehension, and actionable decision-making.
  • Targets embodied AI and robotics, where vision-language-action models map camera observations and language instructions to robot actions.

🔹 CMoS: Rethinking Time Series Prediction Through the Lens of Chunk-wise Spatial Correlations

  • Introduces a new perspective on time series forecasting using spatial (chunk-wise) correlations.
  • Enhances interpretability and performance in sequential data modeling.

🔹 Context is Key: A Benchmark for Forecasting with Essential Textual Information

  • Provides a benchmark that fuses time series with textual data for improved forecasting.
  • Encourages models to interpret external textual cues alongside numerical trends.

🔹 Adaptive Multi-prompt Contrastive Network for Few-shot Out-of-distribution Detection

  • Uses multiple adaptive prompts and contrastive learning to detect OOD inputs with few samples.
  • Addresses generalization and robustness in low-data settings.
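
Prompt-based OOD detection with a vision-language model typically scores an input by its highest cosine similarity to the in-distribution class prompts and flags low scores as OOD. The sketch below assumes precomputed embeddings and averages several prompts per class into a prototype; the paper's adaptive prompt learning and contrastive training are not shown, and the threshold and toy vectors are hypothetical.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def class_prototypes(prompt_embeddings):
    """Average several prompt embeddings per class into one unit vector.

    prompt_embeddings maps class name -> list of embedding vectors.
    """
    protos = {}
    for cls, embs in prompt_embeddings.items():
        dim = len(embs[0])
        mean = [sum(e[d] for e in embs) / len(embs) for d in range(dim)]
        protos[cls] = normalize(mean)
    return protos


def id_score(image_embedding, prototypes):
    """Higher means more in-distribution: best cosine match to any class."""
    return max(cosine(image_embedding, p) for p in prototypes.values())


def is_ood(image_embedding, prototypes, threshold=0.5):
    return id_score(image_embedding, prototypes) < threshold


# Toy 3-d embeddings: two prompts each for two in-distribution classes.
prompts = {
    "cat": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "dog": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
}
protos = class_prototypes(prompts)
```

Averaging several prompts per class makes the prototype less sensitive to any single prompt's wording, which matters most when only a few labeled samples are available.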

🔹 Hierarchical Planning for Complex Tasks with Knowledge Graph-RAG and Symbolic Verification

  • Combines knowledge-graph-based retrieval-augmented generation (KG-RAG) with symbolic verification for complex hierarchical planning.
  • Likely useful for multi-step reasoning in AI agents and automation tasks.
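
Symbolic verification of a generated plan can be as simple as simulating it against action preconditions and effects, STRIPS-style. The sketch below assumes a retrieval-augmented LLM has already proposed a flat list of step names; the action schema, facts, and step names are hypothetical, and the paper's verifier may check different properties.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """STRIPS-style action: required facts, facts added, facts deleted."""
    name: str
    preconditions: frozenset
    add_effects: frozenset = field(default_factory=frozenset)
    del_effects: frozenset = field(default_factory=frozenset)


def verify_plan(initial_state, goal, plan, actions):
    """Simulate the plan step by step and return (ok, message).

    A step fails if its action is unknown or a precondition is missing
    from the current state; the plan fails if the goal does not hold at the end.
    """
    state = set(initial_state)
    for i, step in enumerate(plan):
        action = actions.get(step)
        if action is None:
            return False, f"step {i}: unknown action {step!r}"
        missing = action.preconditions - state
        if missing:
            return False, f"step {i}: {step} missing {sorted(missing)}"
        state -= action.del_effects
        state |= action.add_effects
    unmet = set(goal) - state
    if unmet:
        return False, f"goal not reached, missing {sorted(unmet)}"
    return True, "plan verified"


ACTIONS = {
    "pick_up_key": Action("pick_up_key", frozenset({"at_desk"}), frozenset({"has_key"})),
    "go_to_door": Action("go_to_door", frozenset(), frozenset({"at_door"}), frozenset({"at_desk"})),
    "unlock_door": Action("unlock_door", frozenset({"at_door", "has_key"}), frozenset({"door_unlocked"})),
}
```

The failure message pinpoints the first broken step, which can be fed back to the planner so it repairs only that part of the hierarchy.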

🔹 WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving

  • Introduces a dataset, built on the Waymo Open Motion Dataset (WOMD), centered on reasoning about agent interactions in autonomous driving.
  • Targets better modeling of human-like decision-making in self-driving systems.

🔹 SeedLoRA: A Fusion Approach to Efficient LM Fine-Tuning

  • A method that fuses LoRA (Low-Rank Adaptation) with other efficient fine-tuning techniques.
  • Promises resource-efficient updates to large models.
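
For reference, a standard LoRA layer keeps the pretrained weight W frozen and learns a low-rank update, computing y = Wx + (alpha / r) · B A x with A of shape r × d_in and B of shape d_out × r. The pure-Python sketch below shows only plain LoRA; SeedLoRA's fusion step is not reproduced, and the class name and sizes are illustrative.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W plus a trainable rank-r update scaled by alpha / r."""

    def __init__(self, W, r, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen pretrained weight, shape d_out x d_in
        self.scale = alpha / r
        # A (r x d_in) starts small and random; B (d_out x r) starts at zero,
        # so the adapted layer initially matches the base layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        # Only A and B are trained; W stays frozen.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

The savings come from the rank: for a 1024 × 1024 layer with r = 8, LoRA trains 2 · 8 · 1024 = 16,384 parameters instead of 1,048,576.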

🔹 Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation

  • Theoretically explores how reasoning chains evolve during training and inference.
  • Shows provable benefits of search, reinforcement learning, and distillation for CoT reasoning.

🔹 Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration

  • Focuses on skill reuse from unlabeled data to bootstrap exploration in online RL.
  • Helps reduce data requirements and improve learning speed.

🔹 The Limits of Predicting Agents from Behaviour

  • Discusses challenges in predicting future actions of agents based on observed behavior.
  • Questions generalization in behavior modeling.

🔹 General Agents Need World Models

  • Argues for the necessity of explicit world modeling in developing general-purpose agents.
  • Reinforces the value of structured environments and state representations.

🔹 ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment

  • Enhances DPO (Direct Preference Optimization) with active sampling for alignment training.
  • Reduces required human feedback while improving model preference alignment.
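
The standard DPO loss for a preference pair (chosen y_w, rejected y_l) is -log σ(β[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]). The sketch below computes it from precomputed sequence log-probabilities and adds one simple, hypothetical active-sampling rule: request labels for the candidate pairs whose implicit reward margin is closest to zero. ActiveDPO's actual selection criterion may differ.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def implicit_margin(logp_a, ref_logp_a, logp_b, ref_logp_b, beta=0.1):
    """beta times the gap between the policy/reference log-ratios of a and b."""
    return beta * ((logp_a - ref_logp_a) - (logp_b - ref_logp_b))


def dpo_loss(logp_w, ref_logp_w, logp_l, ref_logp_l, beta=0.1):
    """Standard DPO loss -log sigmoid(margin) for chosen w and rejected l."""
    margin = implicit_margin(logp_w, ref_logp_w, logp_l, ref_logp_l, beta)
    return softplus(-margin)  # -log sigmoid(m) == softplus(-m)


def select_pairs_to_label(candidates, k, beta=0.1):
    """Hypothetical active rule: label the k pairs with margin closest to 0.

    candidates: (pair_id, logp_a, ref_logp_a, logp_b, ref_logp_b) tuples
    with sequence log-probabilities under the policy and reference models.
    """
    ranked = sorted(candidates, key=lambda c: abs(implicit_margin(*c[1:], beta=beta)))
    return [c[0] for c in ranked[:k]]
```

When the policy still equals the reference model, every margin is zero and the loss is log 2; labeling effort then goes to the pairs the policy cannot yet separate.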

🔹 Aligned Textual Scoring Rule

  • Applies scoring rules (used in review or grading contexts) to align LLM-generated responses.
  • Could support applications like automated feedback, peer review, or interviews.

🔹 Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes

  • Uses neural processes to steer model outputs based on few-shot alignment signals.
  • Combines policy learning with meta-learning techniques.

🔹 Personalization and Pluralistic Alignment via RL Fine-tuning

  • Explores ways to align models not just to one truth, but to pluralistic and user-specific values.
  • Uses reinforcement learning for more adaptive and personalized models.
