Imitation Learning Approach for AI Driving Olympics Trained on Real-world and Simulation Data Simultaneously

Imitation Learning Approach for AI Driving Olympics Trained on Real-world and Simulation Data Simultaneously

The AIDO challenge is divided into two global stages: simulation and real-world. A single algorithm needs to perform well in both. It was quickly identified that one of the major problems is the simulation to real-world transfer. 

Many algorithms trained in the simulated environment performed very poorly in the real world, and many classic control algorithms that are known to perform well in a real-world environment, once tuned to that environment, do not perform well in the simulation. Some approaches suggest randomizing the domain for the simulation to real-world transfer.

We propose a novel method of training a neural network model that can perform well in diverse environments, such as simulations and real-world environment.

Dataset Generation

To that end, we have trained our model through imitation learning on a dataset compiled from four different sources:

  1. Real-world Duckietown dataset from (REAL-DT).
  2. Simulation dataset on a simple loop map (SIM-LP).
  3. Simulation dataset on an intersection map (SIM-IS).
  4. Real-world dataset collected by us in our environment with car driven by PD controller (REAL-IH).

We aimed to collect data with as many possible situations such as twists in the road, driving in circles clockwise/counterclockwise, and so on. We have also tried to diversify external factors such as scene lighting, items in the room that can get into the camera’s field of view, roadside objects, etc. If we keep these conditions constant, our model may overfit to them and perform poorly in a different environment. For this reason, we changed the lighting and environment after each duckiebot run. The lane detection was calibrated for every lighting condition since different lighting changes the color scheme of the image input.

We made the following change to the standard PD algorithm: since most Duckietown turns and intersections are standard-shaped, we hard-coded the robot’s motion in these situations, but we did not exclude imperfect trajectories. For example, the ones that would go slightly out of bounds of the lane. Imperfections in the robot’s actions increase the robustness of the model. 

Neural network architecture and training

Original images are 640×480 RGB. As a preprocessing step, we remove the top third of the image, since it mostly contains the sky, resize the image to 64×32 pixels and convert it into the YUV colorspace.

We have used 5 convolutional layers with a small number of filters, followed by 2 fully-connected layers. The small size of the network is not only due to it being less prone to overfitting, but we also need a model that can run on a single CPU on RaspberryPi.

We have also incorporated Independent-Component (IC) layers. These layers aim to make the activations of each layer more independent by combining two popular techniques, BatchNorm and Dropout. For convolutional layers, we substitute Dropout with Spatial Dropout which has been shown to work better with them. The model outputs two values for voltages of the left and the right wheel drives. We use the mean square error (MSE) as our training loss.


For the training evaluation, we compute the mean square error (MSE) of the left and the right wheels outputs on the validation set of each data source. 

The first table shows the results for the models trained on all data sources (HYBRID), on real-world data sources only (REAL) and on simulation data sources only (SIM). As we can see, while training on a single dataset sometimes achieves lower error on the same dataset than our hybrid approach. We can also see that our method performs on par with the best single methods. In terms of the average error it outperforms the closest one tenfold. This demonstrates definitively the high dependence of MSE on the training method, and highlights the differences between the data sources.

The next table shows simulation closed-loop performance for all our approaches using the Duckietown simulator. All methods drove for 15 seconds without major infractions, and the SIM model that was trained specifically on the simulation data only drove just 1.8 tiles more than our hybrid approach.

The third table shows the closed-loop performance in the real-world environment. Comparing the number of tiles, we see that our hybrid approach drove about 3.5 tiles more than the following in the rankings model trained on real-world data only.


Our method follows the imitation learning approach and consists of a convolutional neural network which is trained on a dataset compiled from data from different sources, such as simulation model and real-world Duckietown vehicle driven by a PD controller, tuned to various conditions, such as different map configuration and lighting. 

We believe that our approach of emphasizing neurons independence and monitoring generalization performance can offer more robustness to control models that have to perform in diverse environments. We also believe that the described approach of imitation learning on data obtained from several algorithms that are fitted to specific environments may yield a single algorithm that will perform well in general.

 JBRRussia1 team

Integrated Benchmarking and Design for Reproducible and Accessible Evaluation of Robotic Agents

Integrated Benchmarking and Design for Reproducible and Accessible Evaluation of Robotic Agents

Why is this important?

As robotics matures and increases in complexity, it is more necessary than ever that robot autonomy research be reproducible.

Compared to other sciences, there are specific challenges to benchmarking autonomy, such as the complexity of the software stacks, the variability of the hardware and the reliance on data-driven techniques, amongst others.

We describe a new concept for reproducible robotics research that integrates development and benchmarking, so that reproducibility is obtained by design from the beginning of the research/development processes.

We first provide the overall conceptual objectives to achieve this goal and then a concrete instance that we have built: the DUCKIENet.

The Duckietown Automated Laboratories (Autolabs)

One of the central components of this setup is the Duckietown Autolab (DTA), a remotely accessible standardized setup that is itself also relatively low-cost and reproducible.

DTAs include an off-the-shelf camera-based localization system. The accessibility of the hardware testing environment through enables experimental benchmarking that can be performed on a network of DTAs in different geographical locations.


When evaluating agents, careful definition of interfaces allows users to choose among local versus remote evaluation using simulation, logs, or remote automated hardware setups. The Decentralized Urban Collaborative Benchmarking Environment Network (DUCKIENet) is an instantiation of this design based on the Duckietown platform that provides an accessible and reproducible framework focused on autonomous vehicle fleets operating in model urban environments. 

The DUCKIENet enables users to develop and test a wide variety of different algorithms using available resources (simulator, logs, cloud evaluations, etc.), and then deploy their algorithms locally in simulation, locally on a robot, in a cloud-based simulation, or on a real robot in a remote lab. In each case, the submitter receives feedback and scores based on well-defined metrics.


We validate the system by analyzing the repeatability of experiments conducted using the infrastructure and show that there is low variance across different robot hardware and across different remote labs. We built DTAs at the Swiss Federal Institute of Technology in Zurich (ETHZ) and at the Toyota Technological Institute at Chicago (TTIC).


Our contention is that there is a need for stronger efforts towards reproducible research for robotics, and that to achieve this we need to consider the evaluation in equal terms as the algorithms themselves. In this fashion, we can obtain reproducibility by design through the research and development processes. Achieving this on a large-scale will contribute to a more systemic evaluation of robotics research and, in turn, increase the progress of development.

If you found this interesting, you might want to:

Robust Reinforcement Learning-based Autonomous Driving Agent for Simulation and Real World

Robust Reinforcement Learning-based Autonomous Driving Agent for Simulation and Real World

We asked Róbert Moni to tell us more about his recent work. Enjoy the read!

The author's perspective

Most of us, proud nerd community members, experience driving first time by the discrete actions taken on our keyboards. We believe that the harder we push the forward arrow (or the W-key), the car from the game will accelerate faster (sooo true 😊 ). Few of us believes that we can resolve this task with machine learning. Even fever of us believes that this can be done accurately and in a robust mode with a basic Deep Reinforcement Learning (DRL) method known as Deep Q-Learning Networks (DQN).

It turned to be true in the case of a Duckiebot, and even more, with some added computer vision techniques it was able to perform well both in simulation (where the training process was carried out) and real world.

The pipeline

The complete training pipeline carried out in the Duckietown-gym environment is visualized in the figure above and works as follows. First, the camera images go through several preprocessing steps:

  • resizing to a smaller resolution (60×80) for faster processing;
  • cropping the upper part of the image, which doesn’t contain useful information for the navigation;
  • segmenting important parts of the image based on their color (lane markings);
  • and normalizing the image;
  • finally a sequence is formed from the last 5 camera images, which will be the input of the Convolutional Neural Network (CNN) policy network (the agent itself).

The agent is trained in the simulator with the DQN algorithm based on a reward function that describes how accurately the robot follows the optimal curve. The output of the network is mapped to wheel speed commands.

The workings

The CNN was trained with the preprocessed images. The network was designed such that the inference can be performed real-time on a computer with limited resources (i.e. it has no dedicated GPU). The input of the network is a tensor with the shape of (40, 80, 15), which is the result of stacking five RGB images. The network consists of three convolutional layers, each followed by ReLU (nonlinearity function) and MaxPool (dimension reduction) operations.

The convolutional layers use 32, 32, 64 filters with size 3 × 3. The MaxPool layers use 2 × 2 filters. The convolutional layers are followed by fully connected layers with 128 and 3 outputs. The output of the last layer corresponds to the selected action. The output of the neural network (one of the three actions) is mapped to wheel speed commands; these actions correspond to turning left, turning right, or going straight, respectively.

Learn more

Our work was acknowledged and presented at the IEEE World Congress on Computational Intelligence 2020 conference. We plan to publish the source code after AI-DO5 competition. Our paper is available on, and

Check out our sim and real demo on Youtube performed at our Duckietown Robotarium put together at Budapest University of Technology and Economics. .

Interactive Learning with Corrective Feedback for Policies based on Deep Neural Networks

Authors: Rodrigo Pérez-DattariCarlos CeleminJavier Ruiz-del-SolarJens Kober

Deep Reinforcement Learning (DRL) has become a powerful strategy to solve complex decision making problems based on Deep Neural Networks (DNNs). However, it is highly data demanding, so unfeasible in physical systems for most applications. In this work, we approach an alternative Interactive Machine Learning (IML) strategy for training DNN policies based on human corrective feedback, with a method called Deep COACH (D-COACH). This approach not only takes advantage of the knowledge and insights of human teachers as well as the power of DNNs, but also has no need of a reward function (which sometimes implies the need of external perception for computing rewards). We combine Deep Learning with the COrrective Advice Communicated by Humans (COACH) framework, in which non-expert humans shape policies by correcting the agent’s actions during execution. The D-COACH framework has the potential to solve complex problems without much data or time required. Experimental results validated the efficiency of the framework in three different problems (two simulated, one with a real robot), with state spaces of low and high dimensions, showing the capacity to successfully learn policies for continuous action spaces like in the Car Racing and Cart-Pole problems faster than with DRL

International Symposium on Experimental Robotics (ISER 2018)

Duckietown: An open, inexpensive and flexible platform for autonomy education and research


Liam PaullJacopo TaniHeejin AhnJavier Alonso-MoraLuca CarloneMichal CapYu Fan ChenChanghyun ChoiJeff DusekYajun FangDaniel HoehenerShih-Yuan LiuMichael NovitzkyIgor Franzoni OkuyamaJason PazisGuy RosmanValerio VarricchioHsueh-Cheng WangDmitry YershovHang ZhaoMichael BenjaminChristopher CarrMaria ZuberSertac KaramanEmilio FrazzoliDomitilla Del VecchioDaniela RusJonathan HowJohn LeonardAndrea Censi

Published in ICRA 2017