Just FYI, from "Turns out humans are leading AI systems astray because we can't agree on labeling" (The Register):
“Top datasets used to train AI models and benchmark how the technology has progressed over time are riddled with labeling errors, a study shows.”
" The error rates vary across the datasets. In ImageNet, the most popular dataset used to train models for object recognition, the rate creeps up to six per cent.
"Considering it contains about 15 million photos, that means hundreds of thousands of labels are wrong. Some classes of images are more affected than others, for example, ‘chameleon’ is often mistaken for ‘green lizard’ and vice versa.
“If, say, many images of the sea seem to contain boats and they keep getting tagged as ‘sea’, a machine might get confused and be more likely to incorrectly recognize boats as seas.”