Reinforcement Learning in the Design of Smart Roundabouts

Every day we see the industrial and corporate sectors accelerating their adoption of Artificial Intelligence techniques to drive complex projects and to solve prediction and classification problems, among others. Such problems frequently involve capacity constraints or exploration/exploitation trade-offs that these techniques do not take into account. Reinforcement Learning (RL) is the appropriate framework for this type of problem. A clear example of the benefits of Reinforcement Learning is a booking system for airline tickets or hotel rooms. An online booking system relies on a pricing policy that must respond to fast-changing conditions. In other words, the pricing policy the system must learn depends on the available resources and on the time remaining until those resources expire. This makes it a computationally complex problem.

Smart Roundabout

This article describes the application of Reinforcement Learning to the design of Smart Roundabouts. The objective is a system that manages vehicle access according to the existing queues in order to minimize the overall waiting time. This technology has been successfully applied to four-branch roundabouts that, because they are located close to highways, can suffer significant congestion. The data provided comes from an existing roundabout that is successfully run by a Reinforcement Learning model; for more information, see our Case of Success – Smart Roundabout based on Reinforcement Learning.


Vehicle arrivals at the roundabout are parameterized by origin, vehicle type and time frame. The following graph shows, for each branch, the number of vehicles arriving in each 15-minute interval.
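As an illustration of how per-branch 15-minute counts could drive a simulation, the sketch below samples individual arrival times. It assumes that arrivals within each interval follow a Poisson process at the observed rate (a piecewise-constant, non-homogeneous Poisson process) and that vehicle types are drawn from fixed shares. The function name, the `type_shares` argument and the Poisson assumption are ours, not taken from the case study.

```python
import random

INTERVAL_S = 15 * 60  # length of one counting interval, in seconds

def sample_arrivals(counts_per_interval, type_shares, seed=None):
    """Sample (arrival_time_s, vehicle_type) pairs for one branch.

    Assumption: within interval k, arrivals form a Poisson process whose
    rate is counts_per_interval[k] / INTERVAL_S vehicles per second.
    """
    rng = random.Random(seed)
    types, weights = list(type_shares), list(type_shares.values())
    arrivals = []
    for k, count in enumerate(counts_per_interval):
        if count <= 0:
            continue  # no traffic observed in this interval
        rate = count / INTERVAL_S
        end = (k + 1) * INTERVAL_S
        # Exponential gaps between arrivals; restarting at the interval start
        # is valid because the Poisson process is memoryless.
        t = k * INTERVAL_S + rng.expovariate(rate)
        while t < end:
            vtype = rng.choices(types, weights=weights)[0]
            arrivals.append((t, vtype))
            t += rng.expovariate(rate)
    return arrivals

# Hypothetical counts for one branch over three consecutive 15-minute intervals.
arrivals = sample_arrivals([0, 40, 10], {"car": 0.9, "truck": 0.1}, seed=1)
```

With this approach the simulated count per interval varies randomly around the observed one. If the counts must be reproduced exactly, an alternative is to draw exactly `count` uniform times within each interval, which is the distribution of Poisson arrivals conditioned on their number.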

Likewise, the transition probabilities of the vehicles are modelled from their entry and exit points. The following matrix shows the flow of cars by origin and destination for all vehicles that passed through the roundabout in one day.
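An origin-destination count matrix can be turned into transition probabilities by dividing each row by the total number of vehicles entering at that branch. The sketch below does this and then samples an exit for a new vehicle; the function names are hypothetical and the example matrix is made up for illustration, not the data from the case study.

```python
import random

def transition_probabilities(od_counts):
    """Row-normalize a daily origin-destination count matrix.

    od_counts[i][j] = vehicles that entered at branch i and exited at branch j.
    Returns P with P[i][j] = estimated probability of exiting at j given entry at i.
    """
    probs = []
    for row in od_counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs

def sample_exit(probs, origin, rng=random):
    """Draw the exit branch for a vehicle entering at `origin`."""
    return rng.choices(range(len(probs[origin])), weights=probs[origin])[0]

# Hypothetical daily counts for a four-branch roundabout (rows: entries, columns: exits).
od = [[0, 30, 50, 20],
      [10, 0, 60, 30],
      [40, 40, 0, 20],
      [25, 25, 50, 0]]
P = transition_probabilities(od)
exit_branch = sample_exit(P, origin=0, rng=random.Random(0))
```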


Once the arrival times and transition probabilities of the vehicles have been modelled, a Digital Twin is built with the SUMO (Simulation of Urban MObility) software so that the control system can learn. The graph below shows the simulation generated by SUMO for a given roundabout configuration.

Simulation Smart Roundabout
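To feed a SUMO Digital Twin, per-interval counts and origin-destination probabilities could be written as a SUMO route file. The sketch below builds `<flow>` elements that give only `from`/`to` edges, which SUMO routes itself when the flows are loaded. The edge IDs, the `car` vehicle type and the rounding of fractional vehicle numbers are illustrative assumptions, not the configuration used in the case study.

```python
import xml.etree.ElementTree as ET

INTERVAL_S = 900  # 15-minute counting interval, in seconds

def build_routes(counts, probs, edges_in, edges_out):
    """Return a SUMO routes XML string with one <flow> per interval, origin and exit.

    counts[i][k]: vehicles arriving at branch i during interval k
    probs[i][j]:  probability of exiting at branch j after entering at branch i
    edges_in[i], edges_out[j]: SUMO edge IDs (hypothetical names)
    """
    root = ET.Element("routes")
    ET.SubElement(root, "vType", {"id": "car"})  # uses SUMO's default vehicle parameters
    n_intervals = len(counts[0])
    # Outer loop over time so that flows appear sorted by begin time.
    for k in range(n_intervals):
        for i, branch_counts in enumerate(counts):
            for j, p in enumerate(probs[i]):
                number = round(branch_counts[k] * p)  # fractional vehicles are rounded
                if number == 0:
                    continue
                ET.SubElement(root, "flow", {
                    "id": f"b{i}_to{j}_t{k}",
                    "type": "car",
                    "begin": str(k * INTERVAL_S),
                    "end": str((k + 1) * INTERVAL_S),
                    "number": str(number),
                    "from": edges_in[i],
                    "to": edges_out[j],
                })
    return ET.tostring(root, encoding="unicode")

xml_text = build_routes(counts=[[4, 0]], probs=[[0.0, 0.5, 0.5, 0.0]],
                        edges_in=["in0"], edges_out=["out0", "out1", "out2", "out3"])
```

The resulting string could be saved as a `.rou.xml` file and referenced from the simulation configuration. Because each flow is rounded independently, the total per interval can differ slightly from the observed count.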

Training Phase

Managing a roundabout with traffic lights consists of deciding whether each light should be amber or red. In a roundabout, vehicles already inside the circle almost always have priority, so in this setting an entering vehicle never does. As a consequence, a green light is not appropriate for this traffic configuration and is excluded from the model. The configuration with all lights red is also excluded.
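With four branches, two possible states per light and the all-red case excluded, the controller chooses among 2^4 - 1 = 15 configurations. The sketch below enumerates them; reading amber as "entry allowed, yielding to vehicles in the circle" is our interpretation of the text.

```python
from itertools import product

N_BRANCHES = 4  # four-branch roundabout, as in the case described above

# Each light is either amber (entry allowed, yielding to circulating traffic)
# or red (entry stopped). Green is never used and all-red is excluded.
ACTIONS = [cfg for cfg in product(("amber", "red"), repeat=N_BRANCHES)
           if cfg != ("red",) * N_BRANCHES]

print(len(ACTIONS))  # 15
```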

In the present study, three types of control models have been simulated and analysed.

Predetermined traffic light system: The system learns a fixed-time sequence of traffic light configurations.

Rule-based traffic light system: The system decides whether to set a traffic light to red based on the maximum waiting time of vehicles on the branch located to its left.

Traffic light system based on Reinforcement Learning: The system searches for the best management policy based on a set of states and rewards.
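The two non-learning baselines can be sketched as simple functions that return one light configuration, a tuple with one entry per branch. The plan format, the 60-second threshold, the branch-numbering convention that defines "to the left" and the tie-break used to avoid an all-red configuration are all hypothetical choices, not details from the study.

```python
def fixed_time_action(t, plan):
    """Predetermined controller: cycle through a fixed-time plan.

    plan: list of (duration_s, configuration) pairs, e.g. calibrated offline.
    """
    t = t % sum(duration for duration, _ in plan)
    for duration, cfg in plan:
        if t < duration:
            return cfg
        t -= duration
    return plan[-1][1]  # guard against floating-point rounding at the cycle end

def rule_based_action(max_wait, threshold_s=60.0):
    """Rule-based controller.

    Branch i turns red when the maximum waiting time on the branch to its left
    exceeds threshold_s. Hypothetical convention: the branch to the left of i
    is (i - 1) % n.
    """
    n = len(max_wait)
    cfg = ["red" if max_wait[(i - 1) % n] > threshold_s else "amber" for i in range(n)]
    if all(c == "red" for c in cfg):
        # All-red is not allowed: reopen the branch with the longest own wait.
        cfg[max(range(n), key=lambda i: max_wait[i])] = "amber"
    return tuple(cfg)

plan = [(30, ("amber", "red", "amber", "red")),
        (30, ("red", "amber", "red", "amber"))]
```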

Traffic light system based on Reinforcement Learning

To calibrate the system based on Reinforcement Learning, it is necessary to:

  • Define the possible actions: In this case, the traffic light configurations, excluding the “all red” case.
  • Describe the states: The state definition takes into account the different sensing alternatives. In general, all of them make it possible to measure the total number of vehicles on each branch and the current maximum waiting time of the vehicles on that branch.
  • Define a reward function: This step is about as complex as calibrating the system itself. Given the current state and a management policy, the reward for a specific action is the improvement in average waiting time that the action achieves over a longer time horizon. Therefore, the management policy and the reward co-evolve during learning. To tackle this computationally complex step, Q-learning, among other techniques, is used to train a neural network.
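These three ingredients can be combined in a minimal Q-learning agent. In the sketch below, the state is the vehicle count and current maximum wait per branch, the action is one of the 15 allowed light configurations, and the reward is the reduction in average waiting time between control steps; the discount factor `gamma` is what makes the agent value improvements over a longer horizon. The study trains a neural network; to keep the sketch short we swap in a linear Q-function, and the feature scales, learning rate, exploration rate and all names are illustrative assumptions.

```python
import random

N_BRANCHES = 4
N_ACTIONS = 2 ** N_BRANCHES - 1  # amber/red per light, minus the all-red case

def encode_state(counts, max_waits, count_scale=20.0, wait_scale=120.0):
    """Features: bias, then queued vehicles and max waiting time per branch,
    divided by hypothetical scales to keep values roughly in [0, 1]."""
    return [1.0] + [c / count_scale for c in counts] + [w / wait_scale for w in max_waits]

def reward(prev_avg_wait_s, avg_wait_s):
    """Reward = improvement (reduction) in average waiting time, in seconds."""
    return prev_avg_wait_s - avg_wait_s

class LinearQAgent:
    """Q-learning with a linear approximator, Q(s, a) = w_a . phi(s),
    standing in for the neural network described in the text."""

    def __init__(self, n_features, n_actions, alpha=0.01, gamma=0.95, epsilon=0.1, seed=0):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def q(self, phi, a):
        return sum(wi * xi for wi, xi in zip(self.w[a], phi))

    def act(self, phi):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.w))
        return max(range(len(self.w)), key=lambda a: self.q(phi, a))

    def update(self, phi, a, r, phi_next):
        # Q-learning target: reward plus discounted value of the best next action.
        target = r + self.gamma * max(self.q(phi_next, b) for b in range(len(self.w)))
        td_error = target - self.q(phi, a)
        # Semi-gradient step on the weights of the action actually taken.
        self.w[a] = [wi + self.alpha * td_error * xi for wi, xi in zip(self.w[a], phi)]
        return td_error

phi = encode_state(counts=[5, 0, 10, 2], max_waits=[30, 0, 60, 10])
agent = LinearQAgent(n_features=len(phi), n_actions=N_ACTIONS, epsilon=0.0)
a = agent.act(phi)
agent.update(phi, a, r=reward(50.0, 40.0), phi_next=phi)
```

In a training loop, each control step would read counts and waits from the simulator, apply the chosen configuration for a fixed interval, and call `update` with the observed reward. Deep Q-learning variants such as DQN replace the linear model with a neural network and typically add experience replay and a target network to stabilize training.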


After calibrating each of the management models, the best results were obtained with the Traffic Light System based on Reinforcement Learning, which significantly reduces both the average and the maximum waiting times. It was followed by the Rule-Based Traffic Light System, while the Predetermined Traffic Light System showed the lowest efficiency.


Finally, the system must operate in the real world, so IoT-enabled sensing plays a fundamental role. The developed system is compatible with different sensing alternatives, such as pneumatic tube counters, piezoelectric sensors, Automatic Number Plate Recognition (ANPR) or video vehicle detection.
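Whatever the sensor technology, the controller needs the two state variables listed in the calibration steps: the number of vehicles queued on each branch and the current maximum waiting time. The sketch below derives both from a stream of detector events. It assumes one detector where vehicles join the queue and one at the stop line, and that vehicles on a branch enter the roundabout in arrival order (FIFO), so anonymous counts are enough. The event format and function name are hypothetical.

```python
from collections import deque

def branch_state(events, now, n_branches=4):
    """Return (counts, max_waits) per branch at time `now`.

    events: iterable of (timestamp_s, branch, kind), where kind is "arrive"
    (vehicle joins the queue) or "enter" (vehicle crosses the stop line).
    Assumes FIFO order within each branch, so no vehicle IDs are needed.
    """
    queues = [deque() for _ in range(n_branches)]
    for t, branch, kind in sorted(events, key=lambda e: e[0]):
        if t > now:
            break  # ignore events after the query time
        if kind == "arrive":
            queues[branch].append(t)
        elif kind == "enter" and queues[branch]:
            queues[branch].popleft()  # the oldest waiting vehicle enters first
    counts = [len(q) for q in queues]
    max_waits = [now - q[0] if q else 0.0 for q in queues]
    return counts, max_waits

events = [(0, 0, "arrive"), (5, 0, "arrive"), (8, 0, "enter"),
          (10, 1, "arrive"), (30, 2, "arrive")]
counts, max_waits = branch_state(events, now=20)
```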


Certain Artificial Intelligence techniques have reached an advanced degree of maturity in the corporate and industrial sectors. They can solve forecasting, classification or optimization problems in cases that can be represented by a single state. However, when a problem involves multiple interconnected states that an agent moves between through its actions, Reinforcement Learning is considered the best framework. This technology has been applied to the design of a smart roundabout, handling the associated computational challenges, but its scope is much broader. Any problem involving a finite set of resources, such as the airline ticket booking and hotel room rental examples above, can be addressed with Reinforcement Learning.
