Article: Human Labelling leading AI astray

Just FYI from: Turns out humans are leading AI systems astray because we can't agree on labeling • The Register

“Top datasets used to train AI models and benchmark how the technology has progressed over time are riddled with labeling errors, a study shows.”

" The error rates vary across the datasets. In ImageNet, the most popular dataset used to train models for object recognition, the rate creeps up to six per cent.

"Considering it contains about 15 million photos, that means hundreds of thousands of labels are wrong. Some classes of images are more affected than others, for example, ‘chameleon’ is often mistaken for ‘green lizard’ and vice versa.

“If, say, many images of the sea seem to contain boats and they keep getting tagged as ‘sea’, a machine might get confused and be more likely to incorrectly recognize boats as seas.”


:+1: for sharing this.

This ties directly back to Course 2, where we place heavy emphasis on dataset validation. It is extremely hard, painstaking work.

This is why the field is increasingly moving toward self-supervised learning: because we need lots of labels, and both collecting and validating them is getting increasingly hard, we need other, automated methods!
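One automated approach to the validation problem is to let a model audit its own training labels: get out-of-sample predictions via cross-validation, then flag examples where the given label receives very low predicted probability. This is the rough idea behind the "confident learning" method used in the study the article covers. Below is a minimal sketch of that idea; the synthetic dataset, the logistic-regression model, and the 0.2 confidence threshold are all illustrative assumptions, not the study's actual setup.

```python
# Sketch: flag likely label errors by finding examples where a model's
# out-of-sample prediction confidently disagrees with the given label.
# Dataset, model, and threshold are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_informative=5, random_state=0)

# Simulate noisy labelling: flip 5% of the binary labels at random.
flipped = rng.choice(len(y), size=25, replace=False)
y_noisy = y.copy()
y_noisy[flipped] = 1 - y_noisy[flipped]

# Out-of-sample predicted probabilities via 5-fold cross-validation,
# so each example is scored by a model that never trained on it.
proba = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y_noisy,
    cv=5, method="predict_proba",
)

# Self-confidence: the predicted probability of each example's own label.
self_conf = proba[np.arange(len(y_noisy)), y_noisy]

# Flag examples whose given label looks implausible to the model.
suspects = np.where(self_conf < 0.2)[0]  # threshold is an assumption
```

In practice the flagged `suspects` would then go to a human reviewer, which is far cheaper than re-checking every label in a 15-million-image dataset.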
