Publication

DRL Traffic Signal Controls with Optimized CO2 Emissions

IROS 2023

P. Agand, et al. (2023). “DRL Traffic Signal Controls with Optimized CO2 Emissions.” IROS.

Reinforcement Learning, Traffic Control, CO2 Emissions, SUMO, Python

Deep reinforcement learning framework for multi-intersection traffic signal control that jointly optimizes vehicle throughput and CO2 emissions using reward shaping.

This paper presents a DRL framework for traffic signal control that explicitly optimizes for CO2 emissions alongside vehicle throughput - extending the EcoLight reward shaping approach (NeurIPS Workshop 2021) to larger multi-intersection networks.

Problem

Standard DRL-based traffic signal control optimizes for throughput metrics: queue length, waiting time, average speed. These correlate with emissions but don't capture them directly - a signal timing that minimizes queue length may still produce high emissions if it creates frequent stop-start cycles.

We address this by incorporating emissions estimates directly into the reward function and evaluating on CO2 metrics in addition to standard traffic KPIs.

Approach

Simulator: SUMO (Simulation of Urban Mobility) with integrated HBEFA emissions model, providing per-vehicle CO2 estimates at each timestep.
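As a rough illustration of how per-timestep CO2 could be read out of SUMO, here is a minimal sketch using the TraCI Python interface. It assumes a simulation has already been started with `traci.start(...)`. `traci.vehicle.getIDList` and `traci.vehicle.getCO2Emission` are existing TraCI calls (the latter reports mg/s for the last step). The function name `step_co2_mg` and its parameters are hypothetical and not taken from the paper.

```python
def step_co2_mg(conn, step_length_s=1.0):
    """Sum the CO2 emitted by all vehicles during the last step, in mg.

    `conn` is assumed to be the `traci` module (or a TraCI connection
    object) on which a simulation is already running.
    """
    total_rate_mg_per_s = sum(
        conn.vehicle.getCO2Emission(veh_id)  # mg/s during the last step
        for veh_id in conn.vehicle.getIDList()
    )
    # Convert the emission rate into an amount for this step.
    return total_rate_mg_per_s * step_length_s
```

Calling `step_co2_mg(traci)` after each `traci.simulationStep()` would give a network-wide CO2 signal. Restricting the vehicle list to the lanes approaching one intersection would give a local one.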

Agent design: Each intersection is controlled by an independent DRL agent (PPO) observing local queue lengths, phase timing, and vehicle counts per lane. Agents share a policy network but maintain independent state.
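The "shared policy, independent state" arrangement can be sketched as follows. This is an illustrative toy, not the paper's implementation: a linear softmax policy stands in for the PPO policy network, training is omitted, and the names `IntersectionState` and `SharedPolicy`, the observation layout, and the two-action space are all assumptions.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class IntersectionState:
    """Local state kept separately for each intersection (hypothetical layout)."""
    queue_lengths: list      # halted vehicles per incoming lane
    vehicle_counts: list     # vehicles per incoming lane
    phase_index: int         # currently active signal phase
    phase_elapsed_s: float   # time spent in the current phase

    def observation(self):
        # Flatten the local state into one feature vector.
        return [*self.queue_lengths, *self.vehicle_counts,
                float(self.phase_index), self.phase_elapsed_s]


class SharedPolicy:
    """One set of parameters reused by every intersection agent."""

    def __init__(self, obs_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                        for _ in range(n_actions)]

    def action_probs(self, obs):
        logits = [sum(w * x for w, x in zip(row, obs)) for row in self.weights]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Two intersections, each with its own state, queried through the same policy.
policy = SharedPolicy(obs_dim=6, n_actions=2)
states = {
    "J1": IntersectionState([3, 1], [5, 2], 0, 12.0),
    "J2": IntersectionState([0, 4], [1, 6], 1, 3.0),
}
probs = {name: policy.action_probs(s.observation()) for name, s in states.items()}
```

Sharing parameters means experience from every intersection updates the same weights, while each agent still acts only on its own local observation.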

Reward shaping: The reward combines a throughput term (negative queue length) with a CO2 penalty term. The weighting between these terms is a hyperparameter that allows operators to trade off throughput against emissions.
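One plausible form of such a reward is sketched below. The exact functional form and scaling used in the paper are not given here, so the single trade-off weight `co2_weight`, the unit-scaling factor `co2_scale`, and the function name `shaped_reward` are assumptions made for illustration.

```python
def shaped_reward(queue_lengths, co2_mg, co2_weight=0.5, co2_scale=1e-3):
    """Combine a throughput term with a CO2 penalty (illustrative form only).

    queue_lengths: halted vehicles per incoming lane at this step
    co2_mg:        CO2 emitted near the intersection during this step, in mg
    co2_weight:    trade-off hyperparameter; 0 recovers a throughput-only reward
    co2_scale:     assumed factor bringing mg of CO2 onto a scale comparable
                   to queue lengths
    """
    throughput_term = -float(sum(queue_lengths))  # negative queue length
    co2_penalty = co2_scale * co2_mg
    return throughput_term - co2_weight * co2_penalty
```

With `co2_weight=0` the agent is rewarded purely for short queues. Raising it makes the same queue state less rewarding when it coincides with high emissions, which is the lever an operator would tune.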

Results

On a 4-intersection network in SUMO, the CO2-aware DRL policy reduces estimated emissions by a meaningful margin compared to fixed-timing baselines, with a modest reduction in throughput - within acceptable bounds for real-world deployment. The reward shaping approach generalizes across different traffic demand scenarios without retraining.

This work builds directly on EcoLight (NeurIPS Workshop 2021), which introduced the reward shaping concept on a single-intersection scenario.
