EcoLight: Reward Shaping in DRL for Environment Friendly Traffic Signal Control
P. Agand, et al. (2021). “EcoLight: Reward Shaping in DRL for Environment Friendly Traffic Signal Control.” NeurIPS Workshop on Tackling Climate Change with Machine Learning.
EcoLight introduces a reward shaping approach for DRL-based traffic signal control that adds CO2 emissions to the reward alongside standard traffic efficiency metrics such as vehicle throughput. It was presented at the NeurIPS 2021 Workshop on Tackling Climate Change with Machine Learning.
Motivation
Traffic signals are small levers with outsized impact. A well-timed intersection reduces idling, smooths traffic flow, and cuts stop-and-go cycles, each of which lowers fuel consumption and emissions. DRL has shown promise for adaptive signal control, but standard formulations optimize for throughput proxies such as queue length and delay without directly targeting environmental outcomes.
EcoLight asks: what if we add CO2 emissions directly to the reward function?
Reward Shaping Design
The EcoLight reward at each timestep combines:
- Throughput term: Negative of cumulative queue length across all lanes (standard objective)
- Emissions term: Negative of estimated CO2 emissions from the HBEFA model, computed per vehicle based on speed profile and acceleration
- Comfort term: Penalty for frequent phase switches (reduces vehicle disruption)
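One way to write the per-step reward as a single expression is sketched below. The weights $w_q$, $w_e$, $w_s$ and the indicator form of the switching penalty are illustrative assumptions, not notation taken from the paper. Here $q_{l,t}$ is the queue length on lane $l$ at step $t$, $e_{v,t}$ is the HBEFA-estimated CO2 emitted by vehicle $v$ during step $t$, and $\phi_t$ is the active signal phase; the comfort term charges a fixed penalty whenever the phase changes.

```latex
r_t = -\,w_q \sum_{l \in \mathcal{L}} q_{l,t}
      \;-\; w_e \sum_{v \in \mathcal{V}_t} e_{v,t}
      \;-\; w_s \,\mathbb{1}\!\left[\phi_t \neq \phi_{t-1}\right]
```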
The weighting of these terms is a configurable parameter, allowing operators to specify their throughput-vs-emissions tradeoff explicitly.
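The configurable weighting can be sketched as a plain Python function. This is a minimal illustration under stated assumptions, not the authors' code: `RewardWeights`, `shaped_reward`, their parameters and default values are hypothetical. The per-vehicle CO2 values are assumed to come from an HBEFA-based emission model such as the one available in SUMO, and the comfort term is modeled as a fixed cost per phase change.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights for the three reward terms (not values from the paper)."""
    queue: float = 1.0      # weight on total queue length across lanes
    emissions: float = 1.0  # weight on total estimated CO2 this step
    switch: float = 1.0     # weight on the phase-switch (comfort) penalty


def shaped_reward(
    queue_lengths: Sequence[float],
    co2_per_vehicle: Sequence[float],
    phase_changed: bool,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Combine throughput, emissions and comfort terms into one scalar reward.

    queue_lengths: queued vehicles per lane at this step.
    co2_per_vehicle: estimated CO2 emitted by each vehicle during this step
        (assumed to come from an HBEFA-based model; any consistent unit works).
    phase_changed: True if the controller switched phase at this step.
    """
    throughput_term = -sum(queue_lengths)      # fewer queued vehicles -> higher reward
    emissions_term = -sum(co2_per_vehicle)     # less CO2 -> higher reward
    comfort_term = -1.0 if phase_changed else 0.0  # assumed fixed cost per switch
    return (
        weights.queue * throughput_term
        + weights.emissions * emissions_term
        + weights.switch * comfort_term
    )


# Usage: the same traffic state scored under two different trade-offs.
lanes = [3, 0, 5, 2]
co2 = [2.5, 1.0, 0.5]
throughput_only = RewardWeights(queue=1.0, emissions=0.0, switch=0.0)
eco_leaning = RewardWeights(queue=0.5, emissions=2.0, switch=1.0)
print(shaped_reward(lanes, co2, True, throughput_only))  # -10.0
print(shaped_reward(lanes, co2, True, eco_leaning))      # -14.0
```

Setting `emissions=0.0` recovers a throughput-only reward in the spirit of the standard DRL baselines, while raising it pushes the agent toward lower-emission timings. Because queue counts and emission estimates live on different numeric scales, in a sketch like this the weights also act as normalizers between the terms.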
Results
In single-intersection SUMO simulations, EcoLight reduces estimated CO2 emissions compared with both fixed-timing baselines and standard DRL agents that optimize for throughput alone. The throughput cost is modest: EcoLight gives up little in queue length to achieve the emissions improvement.
This work was the starting point for the IROS 2023 multi-intersection extension, which scaled the approach to coordinated multi-agent control.