Lagenda: The Layer Team Age and Gender Dataset

About dataset

The LAGENDA dataset was created by LayerTeam for age and gender recognition tasks.

The key features of this dataset are as follows:

  • Includes a total of 67 159 images from Open Images Dataset featuring 84 192 individuals with ages from 0 to 95.
  • Near-perfect balance for all ages up to ~65 years, achieved by balancing ages and genders within each age group with a step of 5 years. See our paper for details.
  • Each image contains at least one face with an associated person. We also provide all the faces and persons that our detector can find in every image; please refer to the paper for more details. The source images carry no ground truth labels, so all answers were created manually by human annotators. To the best of our knowledge, this is the first dataset of its kind.
  • The aggregation method used in this dataset achieved a mean absolute error (MAE) of 3.47 for human annotators, as per statistics from control tasks.
  • The dataset contains minimal celebrity data, thus reflecting real-world, in-the-wild scenarios.
  • The license for this dataset is CC 2.0. Do whatever you want, just remember to cite us.
  • We have additionally provided modifications of the IMDB-clean, UTK, FairFace and Adience datasets, which have been annotated in the same manner for all persons (except Adience) and faces.
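
As a rough illustration of two ideas from the list above (age groups with a step of 5, and the mean absolute error used to score annotators), here is a minimal Python sketch. The function names, the bin alignment (groups starting at 0, 5, 10, ...) and the sample values are assumptions made for this example; the exact procedure used for LAGENDA is described in the paper and may differ.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def age_group(age: float, step: int = 5) -> int:
    """Return the lower bound of the age group [low, low + step).

    Assumption: groups are aligned to multiples of `step` starting at 0.
    """
    return int(age // step) * step


def group_counts(samples: Iterable[Tuple[float, str]], step: int = 5) -> Counter:
    """Count samples per (age group, gender) pair to inspect balance."""
    return Counter((age_group(age, step), gender) for age, gender in samples)


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """MAE: average of |predicted - actual| over all pairs."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Made-up (age, gender) pairs; gender labels here are placeholders.
people = [(27, "F"), (25, "M"), (29, "F"), (4, "M")]
counts = group_counts(people)

# Made-up annotator answers versus reference ages.
mae = mean_absolute_error([30, 42, 8], [33, 40, 8])
```

A perfectly balanced group would show equal counts for each gender within the same age group in `counts`.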

Related Work

MiVOLO: Multi-input Transformer for Age and Gender Estimation

Beyond Specialization: Assessing the Capabilities of MLLMs in Age and Gender Estimation


Please cite us:

  @article{mivolo2023,
    Author = {Maksim Kuprashevich and Irina Tolstykh},
    Title = {MiVOLO: Multi-input Transformer for Age and Gender Estimation},
    Year = {2023},
    Eprint = {arXiv:2307.04616},
  }
      

  @article{mivolo2024,
    Author = {Maksim Kuprashevich and Grigorii Alekseenko and Irina Tolstykh},
    Title = {Beyond Specialization: Assessing the Capabilities of MLLMs in Age and Gender Estimation},
    Year = {2024},
    Eprint = {arXiv:2403.02302},
  }
            

Downloads

Use the buttons below to download the datasets manually, or visit our repository (linked in the header of this page) to use the scripts.
The repository also offers ready-to-use models and inference code!
Dataset | Images | Annotations
  • Lagenda
  • IMDB-clean modified
  • UTK Modified [Random Split]
  • UTK Modified [Paper Split]
  • FairFace
  • Adience
  • AgeDB

Annotation format

The annotation format is a simple .csv file, based on the original IMDB-clean dataset. The header is structured as follows:

 img_name,age,gender,face_x0,face_y0,face_x1,face_y1,person_x0,person_y0,person_x1,person_y1        
        
Each image can contain multiple answers, corresponding to the individuals present in the image.
If the age and gender fields have a value of -1, it indicates that there is no ground truth answer for the person/face. These bounding boxes are suitable only for detection tasks or for preparing age/gender estimation, as described in our article.
Additionally, some samples may not have a paired person for the face, in which case the person bounding box coordinates will also be set to -1.
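
The format above can be read with Python's standard `csv` module. The following is a minimal sketch, not the official loader: the `Annotation` class, the helper names and the demo rows are hypothetical, and it assumes that ages and coordinates are numeric and that gender is stored as a short string. It maps every -1 value to `None` so that unlabeled entries and missing person boxes are easy to filter out.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Annotation:
    img_name: str
    age: Optional[float]    # None when the CSV holds -1 (no ground truth)
    gender: Optional[str]   # None when the CSV holds -1 (no ground truth)
    face: Optional[Box]     # None when all four coordinates are -1
    person: Optional[Box]   # None when there is no paired person


def _parse_box(row: dict, prefix: str) -> Optional[Box]:
    """Read <prefix>_x0..y1 and return None if the box is marked as missing."""
    coords = tuple(float(row[f"{prefix}_{k}"]) for k in ("x0", "y0", "x1", "y1"))
    return None if all(c == -1 for c in coords) else coords


def read_annotations(handle) -> List[Annotation]:
    """Parse an annotation CSV from an open text file or file-like object."""
    result = []
    for row in csv.DictReader(handle):
        age = float(row["age"])  # assumption: age is a plain number
        gender = row["gender"].strip()
        result.append(Annotation(
            img_name=row["img_name"],
            age=None if age == -1 else age,
            gender=None if gender == "-1" else gender,
            face=_parse_box(row, "face"),
            person=_parse_box(row, "person"),
        ))
    return result


# Made-up rows for illustration only; file names and values are not from the dataset.
demo_csv = io.StringIO(
    "img_name,age,gender,face_x0,face_y0,face_x1,face_y1,"
    "person_x0,person_y0,person_x1,person_y1\n"
    "a.jpg,31,F,10,20,50,80,0,0,100,200\n"
    "a.jpg,45,M,120,30,160,90,-1,-1,-1,-1\n"
    "a.jpg,-1,-1,200,25,230,70,180,0,260,210\n"
)
annotations = read_annotations(demo_csv)

# Keep only entries usable for age/gender training or evaluation.
labeled = [a for a in annotations if a.age is not None and a.gender is not None]
```

For a real file, pass `open(path, newline="")` to `read_annotations` instead of the in-memory demo. Entries left out of `labeled` still carry bounding boxes and can be used for detection.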
Please feel free to contact us using any of the methods listed in the footer.