Analyzing the LB, we found 3 issues:
1) Why is the median traveled distance used instead of the mean? With the median, a car can crash in 2 episodes and drive the maximum number of tiles in the other 3, and its traveled distance will be the same as that of a car that drives all 5 episodes for the maximum tiles without a crash. If we are trying to model a real autonomous car, it is better to drive slower without crashing than to drive at max speed and crash. So it seems the mean is the better metric?
For example, one may compare 2 top submissions:
1579 and 1633. In the latter, the car crashes 2 times, yet the median traveled distance is almost the same.
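To make the point about median vs. mean concrete, here is a minimal sketch. The per-episode distances are made up for illustration (they are not taken from the logs of any submission), and the 5-episode setup simply mirrors the example above.

```python
from statistics import mean, median

# Hypothetical per-episode distances, NOT taken from real evaluation logs.
MAX_DIST = 18.0  # assumed distance for an episode driven to the maximum tiles

with_crashes = [2.0, 3.0, MAX_DIST, MAX_DIST, MAX_DIST]  # 2 early crashes, 3 full runs
no_crashes = [MAX_DIST] * 5                              # 5 full runs, no crash


def summarize(distances):
    """Return both aggregates so they can be compared side by side."""
    return {"median": median(distances), "mean": mean(distances)}


crash_summary = summarize(with_crashes)
clean_summary = summarize(no_crashes)

print("with crashes:", crash_summary)
print("no crashes:  ", clean_summary)
```

With these numbers the median is identical for both cars, while the mean is clearly lower for the car that crashes twice.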
2) Why is the rounded traveled distance on the LB not the same as in the logs?
Again, one may compare the same 2 top submissions:
1579 and 1633. Their driven_lanedir_median values are 18.374895240961237 and 18.284656855293623 respectively. But on the LB the “Traveled distance” is 18.6 for both submissions.
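A quick check of what plain rounding would give, as a sketch. We do not know how the LB actually computes the displayed figure, so rounding to one decimal is only our assumption here; the logged values and the 18.6 are the ones quoted above.

```python
# driven_lanedir_median values as quoted from the logs above.
logged = {"1579": 18.374895240961237, "1633": 18.284656855293623}

# "Traveled distance" shown on the LB for both submissions.
LB_VALUE = 18.6

# Assumption: the LB rounds to one decimal place. This is a guess, not a known fact.
rounded = {sub: round(value, 1) for sub, value in logged.items()}

for sub, value in rounded.items():
    print(sub, "rounded:", value, "gap to LB:", round(LB_VALUE - value, 1))
```

Under that assumption the two submissions would show different values, and neither would be 18.6, so the LB figure does not look like a simple rounding of driven_lanedir_median.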
3) Quality assessment
1063 and 1496 – the first seems to be driving much better than the latter, but the latter is higher on the LB.
So, given these issues, although on paper these metrics seem fair enough for judging the quality of models, on the LB one can find cases where the “better” model gets the lower score. The first 2 points are technical. Can we do something about them?