Computer vision models known as convolutional neural networks can be trained to recognize objects nearly as accurately as humans do. However, these models have one significant flaw: Very small changes to an image, which would be nearly imperceptible to a human viewer, can trick them into making egregious errors such as classifying a cat as a tree.
A team of neuroscientists from MIT, Harvard University, and IBM has developed a way to alleviate this vulnerability by adding to these models a new layer designed to mimic the earliest stage of the brain’s visual processing system. In a new study, they showed that this layer greatly improved the models’ robustness against this type of mistake.
“Just by making the models more similar to the brain’s primary visual cortex, in this single stage of processing, we see quite significant improvements in robustness across many different types of perturbations and corruptions,” says Tiago Marques, an MIT postdoc and one of the lead authors of the study.
Convolutional neural networks are often used in artificial intelligence applications such as self-driving cars, automated assembly lines, and medical diagnostics. Harvard graduate student Joel Dapello, who is also a lead author of the study, adds that “implementing our new approach could potentially make these systems less prone to error and more aligned with human vision.”
“Good scientific hypotheses of how the brain’s visual system works should, by definition, match the brain in both its internal neural patterns and its remarkable robustness. This study shows that achieving those scientific gains directly leads to engineering and application gains,” says James DiCarlo, the head of MIT’s Department of Brain and Cognitive Sciences, an investigator in the Center for Brains, Minds, and Machines and the McGovern Institute for Brain Research, and the senior author of the study.
The study, which is being presented at the NeurIPS conference this month, is also co-authored by MIT graduate student Martin Schrimpf, MIT visiting student Franziska Geiger, and MIT-IBM Watson AI Lab Co-director David Cox.
Mimicking the brain
Recognizing objects is one of the visual system’s primary functions. In just a small fraction of a second, visual information flows through the ventral visual stream to the brain’s inferior temporal cortex, where neurons contain the information needed to classify objects. At each stage in the ventral stream, the brain performs different types of processing. The very first stage in the ventral stream, V1, is one of the best-characterized parts of the brain and contains neurons that respond to simple visual features such as edges.
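To make the idea of an edge-responsive neuron concrete, here is a minimal sketch in Python. It uses a Gabor filter, a standard textbook approximation of a V1 simple cell, and is not taken from the study; the article does not say how the researchers' layer is built. The function names (gabor_kernel, filter_response) and all parameter values are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_kernel(size=11, wavelength=6.0, theta=0.0, sigma=3.0):
    """Odd-phase Gabor filter: a Gaussian envelope times a sinusoid.

    theta is the direction (in radians) along which the sinusoid varies,
    so theta=0 prefers vertical edges and theta=pi/2 horizontal ones.
    All defaults are illustrative, not values from the study.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate frame so the sinusoid runs along theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + y_r ** 2) / (2.0 * sigma ** 2))
    carrier = np.sin(2.0 * np.pi * x_r / wavelength)
    kernel = envelope * carrier
    # Remove any residual mean so uniform regions give zero response.
    return kernel - kernel.mean()


def filter_response(image, kernel):
    """Slide the kernel over the image (no padding) and return the response map."""
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


# Toy image: dark on the left, bright on the right, i.e. one vertical edge.
image = np.zeros((32, 32))
image[:, 16:] = 1.0

# Strongest absolute response of a vertically tuned and a horizontally tuned unit.
vertical_pref = np.abs(filter_response(image, gabor_kernel(theta=0.0))).max()
horizontal_pref = np.abs(filter_response(image, gabor_kernel(theta=np.pi / 2))).max()

print("vertical-edge unit:  ", vertical_pref)
print("horizontal-edge unit:", horizontal_pref)
```

On this toy image, the unit tuned to vertical edges responds strongly where its window straddles the edge, while the unit tuned to horizontal edges stays essentially silent. That orientation selectivity is the kind of simple-feature response the paragraph above attributes to V1 neurons.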
“It’s thought that V1 detects local edges or contours
Source - Continue Reading: https://news.mit.edu/2020/object-recognition-v1-1203