Article: Human Labelling leading AI astray

Just FYI, from: Turns out humans are leading AI systems astray because we can't agree on labeling • The Register

“Top datasets used to train AI models and benchmark how the technology has progressed over time are riddled with labeling errors, a study shows.”

" The error rates vary across the datasets. In ImageNet, the most popular dataset used to train models for object recognition, the rate creeps up to six per cent.

"Considering it contains about 15 million photos, that means hundreds of thousands of labels are wrong. Some classes of images are more affected than others, for example, ‘chameleon’ is often mistaken for ‘green lizard’ and vice versa.

“If, say, many images of the sea seem to contain boats and they keep getting tagged as ‘sea’, a machine might get confused and be more likely to incorrectly recognize boats as seas.”


:+1: for sharing this.

This directly ties back to Course 2, where we harp a lot on dataset validation. It is extremely hard and painstaking work.
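For anyone curious what part of that validation can look like in code, here is a minimal sketch, on toy data rather than the study's actual pipeline, of the confident-learning idea this kind of work builds on: score every example with out-of-sample predicted probabilities, then flag the examples whose given label the model finds implausible. The model, data, and the number of examples flagged are all illustrative placeholders.

```python
# Minimal sketch: flag likely label errors using out-of-sample predictions.
# Toy data and a simple classifier stand in for a real dataset and model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))            # toy features
clean_labels = (X[:, 0] > 0).astype(int)  # underlying "true" rule
noisy = clean_labels.copy()
flip = rng.choice(len(noisy), size=30, replace=False)
noisy[flip] = 1 - noisy[flip]             # simulate ~6% labeling noise

# Out-of-sample class probabilities: each example is scored by a model
# that never saw its (possibly wrong) label during training.
pred_probs = cross_val_predict(
    LogisticRegression(max_iter=1000), X, noisy, cv=5, method="predict_proba"
)

# Flag the examples whose given label receives the lowest probability.
given_label_prob = pred_probs[np.arange(len(noisy)), noisy]
suspect = np.argsort(given_label_prob)[:30]
print("Most suspicious indices:", suspect)
print("Of those, actually mislabeled:", len(set(suspect.tolist()) & set(flip.tolist())))
```

On a real dataset you would still hand-review whatever gets flagged, which is exactly the painstaking part.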

That difficulty is a big reason the field is increasingly moving toward self-supervised and automated labeling: we need lots of labels, and both collecting and validating them by hand is getting harder, so we need other, automated methods.
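As one concrete example of an automated method (pseudo-labeling, also called self-training, rather than any specific course recipe), here is a minimal sketch: train on the small human-labeled set, then promote only high-confidence predictions on unlabeled data to training labels. The 0.95 threshold, the logistic-regression model, and the synthetic data are assumptions for illustration only.

```python
# Minimal pseudo-labeling (self-training) sketch on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_labeled = rng.normal(size=(100, 10))
y_labeled = (X_labeled[:, 0] > 0).astype(int)   # small human-labeled set
X_unlabeled = rng.normal(size=(2000, 10))       # larger unlabeled pool

# Train an initial model on the human labels only.
model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)

# Keep only predictions the model is very sure about (threshold is arbitrary).
probs = model.predict_proba(X_unlabeled)
confident = probs.max(axis=1) > 0.95
pseudo_labels = probs.argmax(axis=1)[confident]

# Retrain on the union of human labels and pseudo-labels.
X_all = np.vstack([X_labeled, X_unlabeled[confident]])
y_all = np.concatenate([y_labeled, pseudo_labels])
model = LogisticRegression(max_iter=1000).fit(X_all, y_all)
print(f"Added {int(confident.sum())} pseudo-labeled examples out of {len(X_unlabeled)}")
```

Of course, if the initial labels are noisy, the pseudo-labels can inherit and amplify that noise, which loops right back to the article's point about validation.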
