Human visual acuity generally exceeds the resolution of even the sharpest photographs, so not all of the features humans rely on are accessible to DNNs. Photos, especially compressed ones, tend to smear fine textures.
Results
In their experiments, the researchers found that MonoDepth relies primarily on the vertical position of objects in the image to estimate their depth, rather than on their apparent size. Because that cue depends on camera orientation, changes in roll or pitch shift objects' vertical positions and cause the model to mis-estimate distance. Furthermore, MonoDepth is unreliable when faced with objects that weren't in its training set.
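To see why a vertical-position cue is fragile, consider the simple flat-road geometry it implicitly exploits. The sketch below is a toy pinhole-camera model, not MonoDepth itself; the focal length, principal point, and camera height are illustrative KITTI-like values I've assumed for the example.

```python
import math

def depth_from_row(y, f=721.5, cy=187.0, cam_height=1.65, pitch_rad=0.0):
    """Toy ground-plane depth cue (an illustration, not MonoDepth's network).

    Assumes a pinhole camera mounted cam_height meters above a flat road.
    A ground-contact point at image row y (pixels from the top) maps to
    distance Z = f * h / (y - horizon_row). Pitching the camera moves the
    horizon row by roughly f * tan(pitch), which skews every depth estimate
    derived from vertical position alone.
    """
    horizon_row = cy + f * math.tan(pitch_rad)
    if y <= horizon_row:
        return float("inf")  # at or above the horizon: no ground intersection
    return f * cam_height / (y - horizon_row)

# The same image row implies very different depths once the camera pitches:
row = 300
level = depth_from_row(row)                               # level camera
pitched = depth_from_row(row, pitch_rad=math.radians(2))  # 2-degree pitch
print(level, pitched)
```

With these numbers, a two-degree pitch change moves the estimate for the same pixel row by roughly 30%, which is the kind of systematic error the researchers observed when the camera orientation differed from the training setup.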
The Storage Bits take
While this study is limited to a single DNN – MonoDepth – trained on a single dataset – KITTI – it underscores the need to profile these machine learning models. Given that we'll have tens of millions of machine-vision-enabled vehicles cruising around in the next decade, we don't want them mowing down costumed trick-or-treaters just because they don't look like the people the models were trained to see.
What this human sees is that if we don't understand how DNNs achieve their results, we are bound to discover their limitations in practice rather than in tests. Enough tragedies (think Boeing 737 Max) could cripple public acceptance of machine learning. And that, given the shrinking workforces of the developed world, would be an even greater tragedy if we are to keep our economies growing.
Comments welcome!