ImageNet contains about 14 million annotated images across 21,841 categories drawn from WordNet. Fei-Fei Li launched the project in 2006; the images were labeled by Mechanical Turk workers from 167 countries.

The ILSVRC challenge started in 2010 and covered classification and detection, with 1,000 classes in the classification task. In 2012 the AlexNet CNN reached a 15.3% top-5 error, 10.8 percentage points better than the runner-up, sparking the AI boom. GPUs were what made training CNNs at this scale succeed. By 2015 ResNet exceeded reported human performance on ImageNet-1K. The top-5 error dropped to 2.251% with SENet's win in 2017, and the challenge ended that year because the benchmark had saturated. Human error is estimated at 2.4% at best.

The labels are imperfect. Over 6% of the validation labels are estimated to be wrong, and roughly 10% of labels are ambiguous or erroneous. Faces appear in 17% of ImageNet-1K images; they were later blurred with minimal loss in model performance. In 2021, 2,702 person categories were removed to curb problematic model behaviors, after 1,593 of the 2,832 person synsets were deemed offensive.

The images were scraped from search engines using queries in multiple languages. Each image is labeled with one WordNet synset ID, and about 1 million images also have bounding boxes. The full ImageNet-21K has uneven class sizes, with some classes holding only 1 to 10 images. The ImageNet-1K subset has 1.28M training, 50K validation, and 100K test images. Crowdsourced annotation averaged 50 images per minute per worker. The original plan aimed for 400M images but was scaled back. The dataset drove the shift from SVMs to CNNs in competitions.
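
The top-5 error figures quoted above count a prediction as correct when the true class is among a model's five highest-scoring classes. The following is a minimal sketch of that metric, not the official ILSVRC evaluation code: the function name `top_k_error` and the toy scores (6 classes, 4 images) are assumptions made purely for illustration.

```python
from typing import Sequence


def top_k_error(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 5,
) -> float:
    """Fraction of examples whose true label is NOT among the k best-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    if k < 1:
        raise ValueError("k must be at least 1")

    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ordered by descending score; sorted() is stable,
        # so tied scores keep the lower class index first.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Hypothetical toy data: one row of class scores per image.
scores = [
    [0.10, 0.20, 0.30, 0.15, 0.05, 0.20],  # true class 4 has the lowest score
    [0.50, 0.10, 0.10, 0.10, 0.10, 0.10],  # true class 0 is ranked first
    [0.05, 0.30, 0.25, 0.20, 0.10, 0.10],  # true class 3 is ranked third
    [0.30, 0.25, 0.20, 0.15, 0.06, 0.04],  # true class 5 has the lowest score
]
labels = [4, 0, 3, 5]

print("top-5 error:", top_k_error(scores, labels, k=5))
print("top-1 error:", top_k_error(scores, labels, k=1))
```

On this toy data the top-5 error is 0.5 and the top-1 error is 0.75. The third image shows why the two differ: its true class is ranked third, so it is a top-1 miss but a top-5 hit.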