Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers

Monocular Robot Navigation
with Self-Supervised Pretrained Vision Transformers

Duckietown’s infrastructure is used by researchers worldwide to push the boundaries of knowledge. Of the many outstanding works published, today we’d like to highlight “Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers” by Saavedra-Ruiz et al. at the University of Montreal.

Using visual transformers (ViT) for understanding their surroundings, Duckiebots are made capable of detecting and avoiding obstacles, while safely driving inside lanes. ViT is an emerging machine vision technique that has its root in Natural Language Processing (NLP) applications. The use of this architecture is recent and promising in Computer Vision. Enjoy the read and don’t forget to reproduce these results on your Duckiebots!

“In this work, we consider the problem of learning a perception model for monocular robot navigation using few annotated images. Using a Vision Transformer (ViT) pretrained with a label-free self-supervised method, we successfully train a coarse image segmentation model for the Duckietown environment using 70 training images. Our model performs coarse image segmentation at the 8×8 patch level, and the inference resolution can be adjusted to balance prediction granularity and real-time perception constraints. We study how best to adapt a ViT to our task and environment, and find that some lightweight architectures can yield good single-image segmentations at a usable frame rate, even on CPU. The resulting perception model is used as the backbone for a simple yet robust visual servoing agent, which we deploy on a differential drive mobile robot to perform two tasks: lane following and obstacle avoidance.”

Pipeline

“We propose to train a classifier to predict labels for every 8×8 patch in an image. Our classifier is a fully-connected network which we apply over ViT patch encodings to predict a coarse segmentation mask:”

Conclusions

“In this work, we study how embodied agents with visionbased motion can benefit from ViTs pretrained via SSL methods. Specifically, we train a perception model with only 70 images to navigate a real robot in two monocular visual-servoing tasks. Additionally, in contrast to previous SSL literature for general computer vision tasks, our agent appears to benefit more from small high-throughput models rather than large high-capacity ones. We demonstrate how ViT architectures can flexibly adapt their inference resolution based on available resources, and how they can be used in robotic application depending on the precision needed by the embodied agent. Our approach is based on predicting labels for 8×8 image patches, and is not well-suited for predicting high-resolution segmentation masks, in which case an encoder-decoder architecture should be preferred. The low resolution of our predictions does not seem to hinder navigation performance however, and we foresee as an interesting research direction how those high-throughput low-resolution predictions affect safety-critical applications. Moreover, training perception models in an SSL fashion on sensory data from the robot itself rather than generic image datasets (e.g., ImageNet) appears to be a promising research avenue, and is likely to yield visual representations that are better adapted to downstream visual servoing applications.”

Learn more

The Duckietown platform offers robotics and AI learning experiences.

Duckietown is modular, customizable and state-of-the-art. It is designed to teach, learn, and do research: from exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge.

Learning Autonomy in practice with Vincenzo Polizzi

ETHZ, Zurich, March 11, 2022: How Vincenzo discovered his true professional passion as a student using Duckietown.  

Learning Autonomy in practice with Vincenzo Polizzi

Vincenzo Polizzi studied robotics, systems and control at the Swiss Federal institute of Technology (ETH Zurich). Vincenzo shares below his experience with Duckietown. Starting off as a student, becoming a Teaching Assistant and onto how he uses Duckietown to power his own research as he moves from academia to industry.

IMG_20191216_211129

Could you tell us something about yourself?

I’m Vincenzo Polizzi , I studied automation engineering at the Politecnico di Milano, and I currently study robotics, systems and control at the Swiss Federal Institute of Technology (ETH Zurich).

You use Duckietown. Could you tell us when you first came into contact with the project, and what attracted you to Duckietown?

Sure! I learned about Duckietown my first year during a master’s program at ETH, where there was a course called: “Autonomous mobility on demand, from car to fleet” where I saw these cars, these robots. And I asked myself “what is this thing?”. It seemed very interesting. The first thing that struck me was that it did not look theoretical, but clearly practical.

"It captures you with simplicity and then you stay for the complexity."

Vincenzo Polizzi

So the idea of a practical aspect interested you?

Yes, during the presentation, it was clear that the course was based on projects the student had to carry out, where one could practice what they had learned theoretically in other classes.
I come from a scientific high school, and I studied automation engineering in Milan. In both my study experiences, I was used to learning concepts theoretically. For example, in the control system for a plant you design on paper in university, you don’t really face the complexity of implementing it on a real object.

I have to say that I have always been very passionate about robotics and informatics. In fact, even in high school, I was building these little robots,I participated in the robotics competition Rome Cup held by Fondazione Mondo Digitale, and there were these robots that were similar in shape to those of Duckietown, but where the scientific content was completely different. So in Duckietown, I saw something similar to what I was doing in my free time. I wanted to see exactly how it was inside, and there I discovered a whole other world that is obviously much more scientific than what a normal high school student could imagine by themself. However, initially I was curious to see a course where one can practice all the knowledge they have gradually acquired. It is not just about writing an equation and finding a solution but making things work.

What is your relationship with Duckietown, how long have you been using it? How do you interact with the Duckietown ecosystem? How do you use it, what do you do with it?

These are interesting questions because I started as a student and then managed to see what’s behind Duckietown. I was attending the course Duckietown held at ETH in 2019. The class was limited to 30 students, I was really excited to be part of it. I met many excellent students there, some of whom I am still in touch with today.

When I started the course, I immediately told myself, “Duckietown is a great thing. If all universities used Duckietown, this would be a better world.” I liked the class a lot, then I also had the opportunity of being a TA. The TAship was an important step because I learned more than during the course. One thing is to live the experience as a student who has to take exams, complete various projects, etc. You need a deeper understanding to organize an activity. You have to take care of all the details and foresee the parts of the exercise that can be harder or simpler for the students. This experience helped me a lot. For example, I did an internship in Zurich where we had to develop a software infrastructure for a drone, and I found myself thinking, “wow this can be done with Duckietown, we can use the same technologies.” I noticed that even in the industry, often we see the use of the same technologies and tools that you can learn about thanks to Duckietown. Of course, maybe a company has its own customized tools, probably well optimized for its products. Perhaps it uses some other specific tool but let’s say you already know more or less what these tools are about. You know because in Duckietown, you have already seen how a robotics system should work and the pieces it is composed of. Duckietown has given me a huge boost with the internship and my Master’s thesis at NASA JPL. Consider that my thesis was on a system of multi drones, so I used, for example, Docker as a tool to simulate the different agents. With Duckietown, I acquired technical knowledge that I used in many other projects, including work.

Do you still use it today?

The last project I did with Duckietown is DuckVision. I know we could have thought of a better name. With one of my Duckietowner friends, Trevor Phillips, we enhanced the Duckiebot perception pipeline with another camera, a stereocamera made by Luxonis and Open CV called OAKD (OpenCV AI Kit with depth). This sensor is not just a simple camera, but it also mounts a VPU, Visual Processing Unit. Namely, it can analyze and make inferences on the images that the camera acquires onboard. It can perform object detection and tracking, gesture recognition, semantic segmentation, etc. There are plenty of models freely available online that can run on the OAK-D. We have integrated this sensor in the Duckietown ecosystem, using a similar approach used in the MOOC “Self-Driving Cars with Duckietown”, we created a small series of tutorials where you can just plug the camera on the robot, run our Docker container and have fun! With this project, we passed the first phase of the OpenCV AI Competition 2021. The idea behind the project was to increase the Duckiebot understanding of the environment, by using the depth information, the robot can have a better representation of its surroundings and so, for example, a better knowledge of its position. Also, in our opinion, the OAK-D in Duckietown can boost the research in autonomous vehicles and perception.
I would like to add something about the use of Duckietown, I have seen this project both as a student and from behind the scenes and I really understood that by using this platform you really learn a lot of things that are useful not only in the academic field but can also be very useful in the working environment with the practical knowledge that is often difficult to acquire during school. And in this regard I thought then given my history, I am Sicilian but I studied in Milan and then I went to Zurich, I asked myself what can I bring as a contribution of my travels, so I thought about using Duckietown in some universities here in Sicily in the universities of Palermo and Messina. And also, at the Polytechnic of Milan, for example, they have already begun to use it and have participated in the AIDO and have also placed well, they ended up among the finalists, so there is a lot of interest in this project.

Did you receive a positive response every time you proposed Duckietown?

Yes, and then there is a huge enthusiasm on the part of the students. I spoke with student associations first, then with the professors etc. but when the students see Duckietown for the first time, they are always really enthusiastic about using it.

"There is something that captures you in some way, and then just opens up a world when you start to actually see how all the systems are implemented. This is the nice thing in my opinion, you can decide the level of complexity you want to achieve."

Vincenzo Polizzi

The duck was a great idea!

Absolutely right! The duck was a great idea, yes. I like contrasts, you see a super simple friendly thing that hides a state-of-the-art robotics platform. Even in the students I saw this reaction, because the duck is the first thing you see, it looks like a game, something to play with, this is the first impact, then when you start you get curious. It captures you with simplicity and then you stay for the complexity.

Would you suggest Duckietown to friends and colleagues?

Sure! There is something that captures you and opens up a world when you start to see how all the systems are implemented. This is the nice thing in my opinion, you can decide the level of complexity you want to achieve. It’s a platform that looks like something to play with, a game or something, but in reality there is a huge potential, in terms of knowledge that everyone can acquire, it’s something that you can not easily find elsewhere. I also think it offers great support, such as educational material, exercises that are of high quality. You can learn a lot of different aspects of robotics, in my opinion. You can do control, you can do the machine learning part, perception . There’s really a world to explore. You can see everything there is about robotics. But you can also just focus on one aspect that maybe you’re more passionate about. So yes, I would recommend it because you can learn a lot, and as a student myself I would recommend it to my fellow colleagues.

Learn more about Duckietown

The Duckietown platform offers robotics and AI learning experiences.

Duckietown is modular, customizable and state-of-the-art. It is designed to teach, learn, and do research: from exploring the fundamentals of computer science and automation to pushing the boundaries of knowledge.

Tell us your story

Are you an instructor, learner, researcher or professional with a Duckietown story to tell? Reach out to us!

AI Driving Olympics 2021: Urban League Finalists

AI Driving Olympics 2021 - Urban League Finalists

This year’s embodied urban league challenges were lane following (LF), lane following with vehicles (LFV) and lane following with intersections, (LFI). To account for differences between the real world and simulation, this edition finalists can make one additional submission to the real challenges to improve their scores. Finalists are the authors of AI-DO 2021 submissions in the top 5 ranks for each challenge. This year’s finalists are:

LF

  • András Kalapos
  • Bence Haromi
  • Sampsa Ranta
  • ETU-JBR Team
  • Giulio Vaccari

LFV

  • Sampsa Ranta
  • Adrian Brucker
  • Andras Beres
  • David Bardos

LFI

  • András Kalapos
  • Sampsa Ranta
  • Adrian Brucker
  • Andras Beres

The deadline for submitting the “final” submissions is Dec. 9th, 2 pm CET. All submissions received after this time will count towards the next edition of AI-DO.

Don’t forget to join the #aido channel on the Duckietown Slack for updates!

Congratulations to all the participants, and best of luck to the finalists!

Amazon Web Services (AWS)

Join the AI Driving Olympics, 6th edition, starting now!

The 2021 AI Driving Olympics

Compete in the 2021 edition of the Artificial Intelligence Driving Olympics (AI-DO 6)!

The AI-DO serves to benchmark the state of the art of artificial intelligence in autonomous driving by providing standardized simulation and hardware environments for tasks related to multi-sensory perception and embodied AI.

Duckietown traditionally hosts AI-DO competitions biannually, with finals events held at machine learning and robotics conferences such as the International Conference on Robotics and Automation (ICRA) and the Neural Information Processing Systems (NeurIPS). 

AI-DO 6 will be in conjunction with NeurIPS 2021 and have three leagues: urban driving, advanced perception, and racing. The winter champions will be announced during NeurIPS 2021, on December 10, 2021!

Urban driving league

The urban driving league uses the Duckietown platform and presents several challenges, each of increasing complexity.

The goal in each challenge is to develop a robotic agent for driving Duckiebots “well”. Baseline implementations are provided to test different approaches. There are no constraints on how your agents are designed.

Each challenge adds a layer of complexity: intersections, other vehicles, pedestrians, etc. You can check out the existing challenges on the Duckietown challenges server.

AI-DO 2021 features four challenges: lane following (LF), lane following with intersections (LFI), lane following with vehicles (LFV) and lane following with vehicles and intersections, multi-body, with full information (LFVI-multi-full).

All challenges have a simulation and hardware component (🚙,💻), except for LFVI-multi-full, which is simulation (💻) only.

The first phase (until Nov. 7) is a practice one. Results do not count towards leaderboards.

The second phase (Nov. 8-30) is the live competition and results count towards official leaderboards. 

Selected submissions (that perform well enough in simulation) will be evaluated on hardware in Autolabs. The submissions scoring best in Autolabs will access the finals.

During the finals (Dec. 1-8) one additional submission is possible for each finalist, per challenge.

Winners (top 3) of the resulting leaderboard will be declared AI-DO 2021 winter champions and celebrated live during NeurIPS 2021. We require champions to submit a short video (2 mins) introducing themselves and describing their submission.

Winners are invited to join (not mandatory) the NeurIPS event, on December 10th, 2021, starting at 11.25 GMT (Zoom link will follow).   

Overview
🎯Goal: develop robotic agents for challenges of increasing complexity
🚙Robot: Duckiebot (DB21M/J)
👀Sensors: camera, wheel encoders
Schedule
🏖️Practice: Nov. 1-7
🚙Competition: Nov. 8-30
🏘️Finals: Dec. 1 – 8
🏆Winners: Dec. 10
Rules
🏖️Practice: unlimited non-competing submissions
🚙Competition: best in sim are evaluated on hardware in Autolabs
🏘️Finals: one additional submission for Autolabs
🏆Winners: 2 mins video submission description for NeurIPS 2021 event.

The challenges

Lane following 🚙 💻

LF – The most traditional of AI-DO challenges: have a Duckiebot navigate a road loop without intersection, pedestrians (duckies) nor other vehicles. The objective is to travel the longest path in a given time while staying in the lane, i.e., not committing driving infractions.

Current AI-DO leaderboards: LF-sim-validation, LF-sim-testing.

Previous AI-DO leaderboards: sim-validation, sim-testing, real-validation.

A DB21 Duckietown in a Duckietown equipped with Autolab infrastructure.

Lane following with intersections 🚙 💻

LFI – This challenge builds upon LF by increasing the complexity of the road network, now featuring 3 and/or 4-way intersections, defined according to the Duckietown appearance specifications. Traffic lights will not be present on the map. The objective is to drive the longest distance while not breaking the rules of the road, now more complex due to the presence of traffic signs.

Current AI-DO leaderboards: LFI-sim-validation, LFI-sim-testing.

Previous AI-DO leaderboards: sim-validation, sim-testing.

Duckiebot facing a lane following with intersections (LFI) challenge

Lane following with vehicles 🚙 💻

LFV – In this traditional AI-DO challenge, contestants seek to travel the longest path in a city without intersections nor pedestrians, but with other vehicles on the road. Non-playing vehicles (i.e., not running the user’s submitted agent) can be in the same and/or opposite lanes and have variable speed.

Current AI-DO leaderboards: LFV-sim-validation, LFV-sim-testing.

Previous AI-DO leaderboards: (LFV-multi variant): sim-validation, sim-testing, real-validation.

Lane following with vehicles and intersections (stateful) 💻

LFVI-multi-full – this debuting challenge brings together roads with intersections and other vehicles. The submitted agent is deployed on all Duckiebots on the map (-multi), and is provided with full information, i.e., the state of the other vehicles on the map (-full). This challenge is in simulation only.

Getting started

All you need to get started and participate in the AI-DO is a computer, a good internet connection, and the ambition to challenge your skills against the international community!  

We provide webinars, operation manuals, and baselines to get started.

May the duck be with you! 

Thank you to our generous sponsors!

Automatic Wheels and Camera Calibration for Monocular and Differential Mobile Robots

  • Automatic Wheels and Camera Calibration for Monocular and Differential Mobile Robots
  • Konstantin Chaika, Anton Filatov, Artyom Filatov, Kirill Krinkin
  • Applied Sciences 11, no. 13: 5806.

Automatic Wheels and Camera Calibration for Monocular and Differential Mobile Robots

After assembling the robot, components such as the camera and wheels need to be calibrated. This requires human participation and depends on human factors. We describe the approach to fully automatic calibration of a robot’s camera and wheels.

The camera calibration collects the necessary set of images by automatically moving the robot in front of the chess boards, and then moving it on the marked floor, assessing its trajectory curvature. As a result of the calibration, coefficient k is calculated for the wheels, and camera matrix K (which includes the focal length, the optical center, and the skew coefficient) and distortion coefficients D are calculated for the camera. 

Proposed approach has been tested on duckiebots in Alexander Popov’s International Innovation Institute for Artificial Intelligence, Cybersecurity and Communication, SPbETU “LETI”. This solution is comparable to manual calibrations and is capable of replacing a human for this task. 

Camera calibration process

The initial position of the robot is a part of the floor with chessboards in front, where the robot is located from the very beginning, on which its camera is directed and the floorsurface is marked with aruco markers on the other side of it.

There can be any number of chessboards, determined by the amount of free space around the robot. To a greater extent, the accuracy of calibration is affected by the frames with different positions of the boards, e.g., boards located at different distances from the robot and at different angles. The physical size and type of all the boards around the robot must be the same.

In fact, the camera calibration implies that the robot is rotating around its axis and taking pictures of all the viewable chessboards in turn. In this case, the ability to make several “passes” during the shooting process should be provided for, to control which of the boards the robot is currently observing and in which direction it should turn. As a result, the algorithm can be represented as a sequence of actions: “get a frame from the camera” and “turn” a littleThe final algorithm comprises the following sequence of actions:

  1. Obtain frame from the camera;
  2. Find a chessboard on the camera frame;
  3. Save information about board corners found in the image;
  4. Determine the direction of rotation according to the schedule;
  5. Make a step;
  6. Either repeat the steps described above, or complete the data
    collection and proceed with the camera calibration using OpenCV.

 

Wheels calibration process

Floor markers should be oriented towards the chessboards and begin as close to the robot as possible. The distance between the markers depends on camera’s resolution, as well as its height and angle of inclination, but it must be such that at least three recognizable markers can simultaneously be in the frame. For ours experiments, the distance between the markers was set as 15 cm with a marker size of 6.5 cm. The algorithm does not take into account the relative position of the markers against each other; however, the orientation of all markers must be strictly the same.

Let us consider the first iteration of the automatic wheel calibration algorithm:

  1. The robot receives the orientation of the marker closest to it and remembers it.
  2. Next, the robot moves forward with thespeeds of the left and right wheels equal to
    ω1ω2 for some fixed time t. The speeds are calculated taking into account the calibration
    coefficient k, which for the first iteration is chosen to equal 1 – that is, it is assumed that
    the real wheel speeds are equal.
  3. The robot obtains the orientation of the marker closest to it again and calculates the
    difference in angles between them.
  4. The coefficient ki for this step is calculated.
  5. The robot moves back for the same time t.

In order to reduce the influence of the error in calculating ki, coefficient k is refined only by the value of (ki−1)/2 after each iteration. It is important to complete this step after the robot moves back, because it reduces the chance of the robot moving outside the area width. If, after the next step, the modulus of the difference between (ki−1)/2 and 1.0 becomes less than the pre-selected E, then at this iteration (ki−1)/2 is not taken into account. If after three successive iterations ki is not taken into account, the wheel calibration is considered to be completed.

Accuracy Evaluation

To compare camera calibration errors, the knowledge of how to calculate these errors is needed. Since the calibration mechanism is used by the OpenCV library, the error is also calculated by the method offered by this library.

As noted earlier, with respect to calibration factors, the approach used to calibrate the camera is not applicable. Therefore, the influence of the coefficient on the robot’s trajectory curvature is estimated. To do this, the robot was located at a certain fixed distance from a straight line, along which it was oriented and then moved in manual mode strictly directly to a distance of two meters from the start point along the axis, relative to which it was oriented. Then, the robot stopped and the distance between the initial distance to the line and the final one was calculated.

Two metrices were estimated – reprojection error and straight line deviation. First one shows the quality of camera calibration, and the second one represents the quality of wheels calibration. Two pictures below present result of 10 independent tests in comparison with manual calibration.

 

 

 

The tests found that the suggested solution, on average, shows that the results are not much worse, than the classical manual solution when calibrating the camera, as well as when calibrating the wheels with a well known calibrated camera. However, when calibrating both the wheels and the camera, the wheel calibration can be significantly affected by the camera calibration effect. As a result of testing, a clear relationship was found between the reprojection error and the straight line deviation.

Method Modifications

After the integration of this approach, it became necessary to automate the last step-moving the robot to the field. Due to the fact that after the calibration step completion the robot becomes fully prepared for launching autonomous driving algorithms on it, the automation of this step further reduces the time spent by the operator when calibrating the robot, since instead of moving the robot to the field manually, he can place the next robot at the starting position. In our case, the calibration field was located at the side of the road lane so that the floor markers used to calibrate the wheels are oriented perpendicular to the road lane.

Thus, the first stage of the robot automatic removal from the calibration zone is to return its orientation back to the same state, as it was at the moment when the wheel calibration started. This was carried out using exactly the same approach that was described earlier—depending on the orientation of the floor marker closest to the robot, the robot rotates step by step about its axis clockwise or counterclockwise until the value of the robot’s orientation angle is modulo less than some preselected value.

At this point, the robot is still on the wheel calibration field, but in this case, it is oriented towards the lane. Thus, the last step is to move the robot outside the border of the field with markers. To do this, it is enough to give the robot a command to move directly until it stops observing the markers, when the last marker is hidden from the camera view. This means that the robot has left the calibration zone, and the robot can be put into the lane following mode.

 

 

 

 

Future Work

During the robot’s operation, the wheels calibration may become irrelevant. It can be influenced by various factors: a change in the wheel diameter due to wear of the wheel coating, a slight change in the characteristics of motors due to the wear of the gearbox plastic, and a change in the robot’s weight distribution, e.g., laying the cables on the other side of the case after charging the robot, and so a slight calibration mismatch can occur. However, all these factors have a rather small impact, and the robot will still have a satisfactory calibration. There is no need to re-perform the calibration process, just a little refinement of the current one seems to be enough. To do this, a section of the road along which the robots will be guaranteed to pass regularly, was selected. 

Further, markers were placed in this lane according to the rules described earlier: the distance between the markers is 15 cm; the size of the marker is 6.5 cm. The markers are located in the center of the lane. The distance between the markers may be not completely accurate, but they should be oriented in the same direction and co-directed with the movement in the lane on which they are placed. 

The first marker in the direction of travel must have a predefined ID. It can be anything, the only limitation is that it must be unique for a current robot environment. Further, the following changes were made to the algorithm for the standard control of the robot: when the robot recognizes the first marker with a predetermined ID while driving right in the lane, it corrects its orientation relative to this marker and continues to move strictly straight ahead. Further, the algorithm is similar to the one described earlier—the robot recognizing the next marker can refine its wheel calibration coefficient, apply it, and change the orientation coaxially with the next marker.

 

Conclusions

As a result, a solution was developed that allows a fully automatic calibration of the camera and the Duckiebot’s wheels. The main feature is the autonomy of the process, which allows one person to run the calibration of an arbitrary number of robots in parallel and not be blocked during their calibration. In addition, the robot is able to improve its calibration as it operates in default mode.

Comparing the developed solution with the initial one resulted in finding a slight deterioration in accuracy, which is primarily associated with the accuracy of the camera calibration; however, the result obtained is sufficient for the robot’s initial calibration and is comparable to manual calibration. 

Did you find this interesting?

Read more Duckietown based papers here.

Embedded out-of-distribution detection on an autonomous robot platform

Embedded out-of-distribution detection on an autonomous robot platform

Introduction

Machine learning is becoming more and more common in cyber-physical systems; many of these systems are safety critical, e.g. autonomous vehicles, UAVs, and surgical robots.  However, machine learning systems can only provide accurate outputs when their input data is similar to their training data.  For example, if an object detector in an autonomous vehicle is trained on images containing various classes of objects, but no ducks, what will it do when it encounters a duck during runtime?  One method for dealing with this challenge is to detect inputs that lie outside the training distribution of data: out-of-distribution (OOD) detection.  Many OOD detector architectures have been explored, however the cyber-physical domain adds additional challenges: hard runtime requirements and resource constrained systems.  In this paper, we implement a real-time OOD detector on the Duckietown framework and use it to demonstrate the challenges as well as the importance of OOD detection in cyber-physical systems.

Out-of-Distribution Detection

Machine learning systems perform best when their test data is similar to their training data.  In some applications unreliable results from a machine learning algorithm may be a mere nuisance, but in other scenarios they can be safety critical.  OOD detection is one method to ensure that machine learning systems remain safe during test time.  The goal of the OOD detector is to determine if the input sample is from a different distribution than that of the training data.  If an OOD sample is detected, the detector can raise a flag indicating that the output of the machine learning system should not be considered safe, and that the system should enter a new control regime.  In an autonomous vehicle, this may mean handing control back to the driver, or bringing the vehicle to a stop as soon as practically possible.

In this paper we consider the existing β-VAE based OOD detection architecture.  This architecture takes advantage of the information bottleneck in a variational auto-encoder (VAE) to learn the distribution of training data.  In this detector the VAE undergoes unsupervised training with the goal of minimizing the error between a true prior probability in input space p(z), and an approximated posterior probability from the encoder output p(z|x).  During test time, the Kullback-Leibler divergence between these distributions p(z) and q(z|x) will be used to assign an OOD score to each input sample.  Because the training goal was to minimize the distance between these two distributions on in-distribution data, in-distribution data found at runtime should have a low OOD score while OOD data should have a higher OOD score.

Duckietown

We used Duckietown to implement our OOD detector.  Duckietown provides a natural test bed because:

  • It is modular and easy to learn: the focus of our research is about implementing an OOD detector, not building a robot from scratch
  • It is a resource constrained system: the RPi on the DB18 is powerful enough to be capable of navigation tasks, but resource constrained enough that real-time performance is not guaranteed.  It servers as a good analog for a  system in which an OOD detector shares a CPU with perception, planning, and control software.
  • It is open source: this eliminates the need to purchase and manage licenses, allows us to directly check the source code when we encounter implementation issues, and allows us to contribute back to the community once our project is finished.
  • It is low-cost: we’re not made of money 🙂
 In our experiment, we used the stock DB18 robot.  Because we took advantage of the existing Duckietown framework, we only had to write three ROS nodes ourselves:
  • Lane following node: a simple OpenCV-based lane follower that navigates based on camera images.  This represents the perception and planning system for the mobile robot that we are trying to protect.  In our system the lane following node takes 640×480 RGB images and updates the planned trajectory at a rate of 5Hz.
  • OOD detection node: this node also takes images directly from the camera, but its job is to raise a flag when an OOD input appears (image with an OOD score greater than some threshold).  On the RPi with no GPU or TPU, it takes a considerable amount of time to make an inference on the VAE, so our detection node does not have a target rate, but rather uses the last available camera frame, dropping any frames that arrive while the OOD score is being computed.
  • Motor control node: during normal operation it takes the trajectory planned by the lane following node and sends it to the wheels.  However, if it receives a signal from the OOD detection node, it begins emergency breaking.

The Experiment

Our experiment considers the emergency stopping distance required for the Duckiebot when an OOD input is detected.  In our setup the Duckiebot drives forward along a straight track.  The area in front of the robot is divided into two zones: the risk zone and the safe zone.  The risk zone is an area where if an obstacle appears, it poses a risk to the Duckiebot.  The safe zone is further away and to the sides; this is a region where unknown obstacles may be present, but they do not pose an immediate threat to the robot.  An obstacle that has not appeared in the training set is placed in the safe zone in front of the robot.  As the robot drives forward along the track, the obstacle will eventually enter the risk zone.   Upon entry into the risk zone we measure how far the Duckiebot travels before the OOD detector triggers an emergency stop.

We defined the risk zone as the area 60cm directly in front of our Duckiebot.  We repeated the experiment 40 times and found that with our system architecture, the Duckiebot stopped on average 14.5cm before the obstacle.  However, in 5 iterations of the experiment, the Duckiebot collided with the stationary obstacle.

We wanted to analyze what lead to the collision in those five cases.  We started by looking at the times it took for our various nodes to run.  We plotted the distribution of end-to-end stopping times, image capture to detection start times, OOD detector execution times, and detection result to motor stop times.  We observed that there was a long tail on the OOD execution times, which lead us to suspect that the collisions occurred when the OOD detector took too long to produce a result.  This hypothesis was bolstered by the fact that even when a collision had occurred, the last logged OOD score was above the detection threshold, it had just been produced too late.  We also looked at the final two OOD detection times for each collision and found that in every case the final two times were above the median detector execution time.  This highlights the importance of real-time scheduling  when performing OOD detection in a cyber-physical system.

We also wanted to analyze what would happen if we adjusted the OOD detection threshold.  Because we had logged the the detection threshold every time the detector had run, we were able to interpolate the position of the robot at every detection time and discover when the robot would have stopped for different OOD detection thresholds.  We observe there is a tradeoff associated with moving the detection threshold.  If the detection threshold is lowered, the frequency of collisions can be reduced and even eliminated.  However, the mean stopping distance is also moved further from the obstacle and the robot is more likely to stop spuriously when the obstacle is outside of the risk zone.

 

Next Steps

In this paper we successfully implemented an OOD detector on a mobile robot, but our experiment leaves many more questions:

  • How does the performance of other OOD detector architectures compare with the β-VAE detector we used in this paper?
  • How can we guarantee the real-time performance of an OOD detector on a resource-constrained system, especially when sharing a CPU with other computationally intensive tasks like perception, planning, and control?
  • Does the performance vary when detecting more complex OOD scenarios: dynamic obstacles, turning corners, etc.?

Did you find this interesting?

Read more Duckietown based papers here.

EdTech awards 2021: Duckietown finalist in 3 categories!

Duckietown reaches the finals in the EdTech Awards 2021

The EdTech awards are the largest and most competitive recognition program in all of education technology.

The competition, led by the EdTech digest, recognizes the biggest names in edtech – and those who soon will be, by identifying all over the world the products, services and people that bet promote education through the use of technology, for the benefit of learners.

The 2021 edition has brought a big surprise to Duckietown, as it was nominated as a finalist in 3 different categories:

  • Cool Tool Award: as robotics (for learning, education) solution;

  • Cool Tool Award: as higher education solution;

  • Trendsetter Award: as a product or service setting a trend in education technologies.

Although a final is just a starting point, we are proud of the hard work done by the team in this particularly difficult year of pandemic and lockdowns, and grateful to you all for the incredible support, constructive feedback and contributions!

To the future, and beyond!

(hidden) Want to learn more about us?

Ubuntu laptop terminal interface with hands operating keyboard, Duckiebot and duckies out of focus in foreground

“Self-Driving Cars with Duckietown” MOOC starting soon

Join the first hardware based MOOC about autonomy on edX!

Are you curious about robotics, self-driving cars, and want an opportunity to build and program your own? Set to start on March 22nd, “Self-Driving Cars with Duckietown” is a hands-on introduction to vehicle autonomy, and the first ever self-driving cars MOOC with a hardware track!

Designed for university-level students and professionals, this course is brought to you by the Swiss Federal Institute of Technology in Zurich (ETHZ), in collaboration with the University of Montreal, the Duckietown Foundation, and the Toyota Technological Institute at Chicago.

Learning autonomy requires a fundamentally different approach when compared to other computer science and engineering disciplines. Autonomy is inherently multi-disciplinary, and mastering it requires expertise in domains ranging from fundamental mathematics to practical machine-learning skills.

This course will explore the theory and implementation of model- and data-driven approaches for making a model self-driving car drive autonomously in an urban environment, while detecting and avoiding pedestrians (rubber duckies)!

In this course you will learn, hands-on, introductory elements of:

  • computer vision
  • robot operations 
  • ROS, Docker, Python, Ubuntu
  • autonomous behaviors
  • modelling and control
  • localization
  • planning
  • object detection and avoidance
  • reinforcement learning.

The Duckietown robotic ecosystem was created at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) in 2016 and is now used in over 90 universities worldwide.

“The Duckietown educational platform provides a hands-on, scaled down, accessible version of real world autonomous systems.” said Emilio Frazzoli, Professor of Dynamic Systems and Control, ETH Zurich, “Integrating NVIDIA’s Jetson Nano power in Duckietown enables unprecedented access to state-of-the-art compute solutions for learning autonomy.”

This massive online open course will be have a hands-on learning approach using, for the hardware track, real robots. You will learn how autonomous vehicles make their own decisions, going from theory to implementation, deployment in simulation as well as on the new NVIDIA Jetson Nano powered Duckiebots.

“The new NVIDIA Jetson Nano 2GB is the ultimate starter AI computer for educators and students to teach and learn AI at an incredibly affordable price.” said Deepu Talla, Vice President and General Manager of Edge Computing at NVIDIA. “Duckietown and its edX MOOC are leveraging Jetson to take hands-on experimentation and understanding of AI and autonomous machines to the next level.”

The Duckiebot MOOC Founder’s edition kits are available worldwide, and thanks to OKdo, are now available with free shipping in the United States and in Asia!

“I’m thrilled that ETH, with UMontreal, the Duckietown Foundation, and the Toyota Technological Institute in Chicago, are collaborating to bring this course in self-driving cars and robotics to the 35 million learners on edX. This emerging technology has the potential to completely change the way we live and travel, and the course provides a unique opportunity to get in on the ground floor of understanding and using the technology powering autonomous vehicles,” said Anant Agarwal, edX CEO and Founder, and MIT Professor.

Enroll now and don’t miss the chance to join in the first vehicle autonomy MOOC with hands-on learning!

AI Driving Olympics 5th edition: results

AI-DO 5: Urban league winners

This year’s challenges were lane following (LF), lane following with pedestrians (LFP) and lane following with other vehicles, multibody (LFV_multi). 

Let’s find out the results in each category:

LF

  1. Andras Beres 🇭🇺  
  2. Zoltan Lorincz 🇭🇺
  3. András Kalapos 🇭🇺

LFP

  1. Bea Baselines 🐤
  2. Melisande Teng 🇨🇦 
  3. Raphael Jean 🇨🇦

LFV_multi

  1. Robert Moni 🇭🇺
  2. Márton Tim 🇭🇺
  3. Anastasiya Nikolskay 🇷🇺

Congratulations to the Hungarian Team from the Budapest University of Technology and Economics for collecting the highest rankings in the urban league!

Here’s how the winners in each category performed both in the qualification (simulation) and in the finals running on real hardware:

Andras Beres - Lane following (LF) winner

Melisande Teng - Lane following with pedestrians (LFP) winner

Robert Moni - Lane following with other vehicles, multibody (LFV_multi) winner

AI-DO 5: Advanced Perception league winners

Great participation and results in the Advanced Perception league! Check out this year’s winners in the video below:

AI-DO 5 sponsors

Many thanks to our amazing sponsors, without which none of this would have been possible!

Stay tuned for next year AI Driving Olympics. Visit the AI-DO page for more information on the competition and to browse this year’s introductory webinars, or check out the Duckietown massive open online course (MOOC) and prepare for next year’s competition!

AI-DO 5 leaderboard update

AI-DO 5 pre-finals update

With the fifth edition of the AI Driving Olympics finals day approaching, 1326 solutions submitted from 94 competitors in three challenges, it is time to glance over at the leaderboards

Leaderboards updates

This year’s challenges are lane following (LF), lane following with pedestrians (LFP) and lane following with other vehicles, multibody (LFV_multi). Learn more about the challenges here. Each submission can be sent to multiple challenges. Let’s look at some of the most promising or interesting submissions.

The Montréal menace

Raphael Jean at Mila / University of Montréal is a new entrant for this year. 

An interesting submission: submission #12962 

All of raph’s submissions.

The submissions from the cold

Team JetBrains from Saint Petersburg was a winner of previous editions of AI-DO. They have been dominating the leaderboards also this year.

Interesting submissions: submission #12905

All of JetBrains submissions: JBRRussia1. 

 

BME Conti

PhD student Robert Moni (BME-Conti) from Hungary. 

Interesting submissions: submission #12999 

All submissions: timur-BMEconti

 

Deadline for submissions

The deadline for submitting to the AI-DO 5 is 12am EST on Thursday, December 10th, 2020. The top three entries (more if time allows) in each simulation challenge will be evaluated on real robots and presented at the finals event at NeurIPS 2020, which happens at 5pm EST on Saturday, December 12.