
Synthetic image datasets for benchmarking face recognition


The availability of large-scale face datasets has been key to progress in face recognition. However, because of licensing issues or copyright infringement, some datasets are no longer available (e.g. MS-Celeb-1M). Recent advances in Generative Adversarial Networks (GANs) that synthesize realistic face images provide a pathway to replacing real datasets with synthetic ones, both to train and to benchmark face recognition (FR) systems.

We introduced a method to generate a synthetic dataset, without the need for human intervention, by exploiting the latent structure of a StyleGAN2 model with multiple controlled factors of variation.
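As a rough illustration of what editing a latent code along a controlled factor of variation can look like, the sketch below shifts a reference latent by a chosen strength along a fixed direction. It assumes a simple linear edit, which is only one possible form of latent editing and is not confirmed here as the exact procedure of the method. The names `edit_latent`, `pose_direction` and `LATENT_DIM`, as well as the latent size of 512, are illustrative assumptions. Random vectors stand in for real StyleGAN2 latents, and no generator is invoked.

```python
import math
import random


def edit_latent(w_ref, direction, strength):
    """Shift w_ref by `strength` along the unit-normalised `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [w + strength * d / norm for w, d in zip(w_ref, direction)]


random.seed(0)
LATENT_DIM = 512  # assumed latent size, for illustration only

# Stand-in for the latent code of one synthetic identity (the "reference").
w_ref = [random.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]

# Hypothetical direction associated with one factor of variation (e.g. pose).
pose_direction = [random.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]

# One edited latent per strength; strength 0.0 leaves the reference unchanged.
variations = [edit_latent(w_ref, pose_direction, s) for s in (-2.0, 0.0, 2.0)]
```

In a real pipeline, each edited latent would be passed to the generator to render one image of the same identity under a different condition.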


Colbois, L., de Freitas Pereira, T., and Marcel, S. (2021). On the use of automatically generated synthetic image datasets for benchmarking face recognition. International Joint Conference on Biometrics (IJCB 2021).


We confirmed that (i) the generated synthetic identities are not data subjects from the GAN's training dataset, as verified on a synthetic dataset with 10K+ identities; (ii) benchmarking results on the synthetic dataset are a good substitute, often yielding error rates and system rankings similar to those obtained on the real dataset.
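To make the notion of comparing error rates concrete, here is a minimal sketch of how two common biometric error rates, the false match rate (FMR) and the false non-match rate (FNMR), can be computed at a fixed decision threshold. The choice of these two metrics, the function name `error_rates`, the threshold, and the toy scores are assumptions made for illustration; they are not results or settings from the study.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FMR, FNMR) for similarity scores; score >= threshold means 'match'."""
    # FMR: fraction of impostor comparisons wrongly accepted as matches.
    fmr = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    # FNMR: fraction of genuine comparisons wrongly rejected.
    fnmr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return fmr, fnmr


# Toy similarity scores, made up purely for illustration.
genuine = [0.91, 0.85, 0.78, 0.40]   # same-identity comparisons
impostor = [0.10, 0.35, 0.55, 0.20]  # different-identity comparisons

fmr, fnmr = error_rates(genuine, impostor, threshold=0.5)
```

Running the same computation for several FR systems on both a real and a synthetic dataset would allow the resulting error rates and system rankings to be compared side by side.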


  • Face Recognition

Technology Readiness Level



Example of synthetically generated images for a single synthetic identity, with variations. The first image (highlighted in red) is the main reference (neutral expression, frontal view, frontal illumination). Using a latent editing approach, expression variations (1st row), pose variations (2nd row) and illumination variations (3rd row) were generated.

Contact us for more information

  • Interested in using our technologies?
  • Interested in knowing more about the licensing possibilities and conditions?
