Atul Lal

Tiny Face Detection and Recognition in the Wild

Introduction to the Challenge

Face detection is a fundamental, almost ubiquitous problem in computer science today. It is the first step towards many face-related applications: face parsing, face verification, auto-tagging algorithms, and robust retrieval systems.

*Remember:* Over the past few decades, we've gotten really good at detecting faces in *constrained* situations (think passport photos). But the real world is messy!

However, one of the most formidable challenges in modern computer vision remains detecting tiny faces in unconstrained environments, often referred to as "in the wild". In this post, we're going to explore how we leveraged Generative Adversarial Networks (GANs), a class of machine learning frameworks introduced by Ian Goodfellow et al. in 2014 in which two neural networks compete against each other, to enhance the accuracy of face recognition algorithms by generating super-resolution images.

Tiny Face Detection and Recognition

So, what exactly is a tiny face? In our research context, tiny faces are defined as faces that occupy less than 10% of an image's total area.

Why is it so hard? A face well below that threshold might be only a few pixels wide, so key facial landmarks (eyes, nose, mouth) often blur together into a single amorphous blob.
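To make the 10% threshold concrete, here is a minimal sketch that checks how much of an image a face bounding box covers. The function names, the width/height box representation, and the example sizes are assumptions for illustration only.

```python
def face_area_fraction(box_w, box_h, img_w, img_h):
    """Fraction of the image area covered by a face bounding box."""
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    return (box_w * box_h) / (img_w * img_h)


def is_tiny_face(box_w, box_h, img_w, img_h, threshold=0.10):
    """True when the face covers less than `threshold` of the image area."""
    return face_area_fraction(box_w, box_h, img_w, img_h) < threshold


# Hypothetical example: a 12x16 px face in a 640x480 frame.
print(face_area_fraction(12, 16, 640, 480))
print(is_tiny_face(12, 16, 640, 480))
```

A face like the one in this example sits far below the threshold, which is the regime where landmarks stop being distinguishable.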

Traditional methods for detecting faces relied heavily on sliding window approaches or Region Proposal Networks (RPNs), fully convolutional networks that simultaneously predict object bounds and objectness scores at each position. While excellent for standard faces, these techniques are computationally expensive when scaling down windows to search for tiny objects, and they often fail spectacularly on low-resolution patches.
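The cost of shrinking the window can be illustrated with a quick count. This is only a back-of-the-envelope sketch for a single-scale sliding window. The image size, the window sizes, and the quarter-window stride are hypothetical choices, and it does not model how an RPN places anchors.

```python
def count_windows(img_w, img_h, win, stride):
    """Number of square win x win windows visited at a given stride."""
    if win > img_w or win > img_h:
        return 0
    cols = (img_w - win) // stride + 1
    rows = (img_h - win) // stride + 1
    return cols * rows


# Hypothetical settings: the stride is a quarter of the window side.
for win in (128, 64, 32, 16):
    stride = max(1, win // 4)
    print(win, count_windows(1024, 768, win, stride))
```

Under these assumptions, halving the window side roughly quadruples the number of patches to classify, and each patch carries less usable detail.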

Our Two-Step Approach

To address this challenge head-on, we proposed a novel approach utilizing GANs to hallucinate high-resolution details onto low-resolution tiny faces. The architecture works in a two-stage pipeline:

  1. Super-Resolution Generation: We use a GAN to upscale and generate high-resolution image patches from low-resolution inputs.
  2. Detection & Recognition: We pass these newly generated, high-fidelity images into a pre-trained face recognition model to accurately detect and classify the faces.
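The two stages above can be sketched as a simple function composition. This is not our actual model code: `upscale_nearest` is a nearest-neighbour placeholder standing in for the GAN generator, `dummy_recognizer` is a hypothetical stand-in for a pre-trained recognition model, and patches are plain nested lists to keep the example dependency-free.

```python
from typing import Callable, List

Patch = List[List[float]]  # grayscale patch as rows of pixel intensities


def upscale_nearest(patch: Patch, factor: int = 4) -> Patch:
    """Placeholder for the GAN generator: nearest-neighbour upscaling."""
    out = []
    for row in patch:
        wide = [px for px in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def two_stage(patch: Patch,
              generator: Callable[[Patch], Patch],
              recognizer: Callable[[Patch], str]) -> str:
    """Stage 1: super-resolve the patch. Stage 2: run recognition on it."""
    high_res = generator(patch)
    return recognizer(high_res)


def dummy_recognizer(patch: Patch) -> str:
    """Hypothetical stand-in for a pre-trained model: reports input size."""
    return f"face patch {len(patch[0])}x{len(patch)}"


low_res = [[0.1, 0.2], [0.3, 0.4]]
print(two_stage(low_res, upscale_nearest, dummy_recognizer))
```

Keeping the generator and recognizer as interchangeable callables mirrors the point of the design: the second stage does not need to know that its input was synthesized.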

Our specific GAN architecture is heavily inspired by SRGAN (Super-Resolution Generative Adversarial Network), a framework capable of inferring photo-realistic natural images at large upscaling factors such as 4x.

To achieve this, SRGAN moves away from a purely pixel-wise Mean Squared Error (MSE) loss, which tends to produce overly smooth, blurry images. Instead, it uses a perceptual loss that combines an adversarial loss (pushing the generator to fool the discriminator) with a content loss (computed on feature maps of a pre-trained VGG network, so that the upscaled image matches the semantic features of the high-resolution reference image rather than its raw pixels).
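As a rough sketch of how the two terms combine, the snippet below adds a feature-space content loss to a down-weighted adversarial term. The 1e-3 weight follows the SRGAN paper; the flat feature vectors and function names are simplifications assumed for illustration, and the weighting in any particular training run may differ.

```python
import math
from typing import Sequence


def content_loss(sr_features: Sequence[float], hr_features: Sequence[float]) -> float:
    """MSE between feature vectors of the super-resolved image and the
    high-resolution reference (stand-ins for VGG feature maps)."""
    if len(sr_features) != len(hr_features) or not sr_features:
        raise ValueError("feature vectors must be non-empty and equal length")
    return sum((s - h) ** 2 for s, h in zip(sr_features, hr_features)) / len(sr_features)


def adversarial_loss(disc_probs: Sequence[float], eps: float = 1e-12) -> float:
    """Generator loss -log D(G(x)), summed over the batch."""
    return sum(-math.log(max(p, eps)) for p in disc_probs)


def perceptual_loss(sr_features, hr_features, disc_probs, adv_weight=1e-3):
    """Content loss plus a down-weighted adversarial term."""
    return content_loss(sr_features, hr_features) + adv_weight * adversarial_loss(disc_probs)


# Hypothetical values: two feature activations and one discriminator output.
print(perceptual_loss([0.2, 0.8], [0.0, 1.0], [0.5]))
```

The small adversarial weight keeps the generator anchored to the reference content while still rewarding outputs the discriminator finds realistic.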

Empirical Results

We evaluated our two-step approach on several notoriously difficult benchmark datasets for "in the wild" detection.

*Performance metrics:* We crushed it on WIDER FACE and FDDB! 🚀

Our results show that this GAN-assisted approach outperforms traditional detection methods on tiny faces:

  • On the WIDER FACE dataset, we achieved an average precision of 0.89.
  • On the FDDB dataset, we hit an average precision of 0.92.
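For readers unfamiliar with the metric, here is a minimal sketch of non-interpolated average precision over ranked detections. The official benchmark toolkits follow their own evaluation protocols and may compute it differently, and the detections below are made up purely to exercise the function.

```python
def average_precision(scored_hits, num_ground_truth):
    """Non-interpolated AP from (confidence, is_true_positive) detections."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    # Rank detections from most to least confident.
    ranked = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            # Precision at the rank of each correctly detected face.
            precision_sum += true_positives / rank
    return precision_sum / num_ground_truth


# Hypothetical detections, not results from our experiments.
detections = [(0.95, True), (0.90, True), (0.70, False), (0.60, True)]
print(average_precision(detections, num_ground_truth=4))
```

Dividing by the number of ground-truth faces, rather than the number of detections, means missed faces lower the score even if every reported detection is correct.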

Robustness Testing

We didn't stop at just top-line metrics. We conducted extensive ablation studies to evaluate the impact of different hyperparameters and environmental factors, including:

  • The absolute size of the input image crops.
  • The number of training epochs required for the GAN to converge.
  • Variations in the underlying GAN architecture.

The data shows that our approach is highly robust. It can achieve high accuracy even when fed extremely small input patches and requires a surprisingly limited number of training epochs to produce actionable super-resolution images.
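An ablation over the three factors listed above can be organized as a simple grid of configurations. The values and architecture labels below are placeholders assumed for illustration; they are not the settings we actually swept.

```python
from itertools import product

# Hypothetical values for illustration; the actual settings are not listed here.
crop_sizes = [8, 16, 32]
epoch_budgets = [10, 50]
architectures = ["baseline", "variant_a"]

# One configuration per combination of the three factors.
grid = [
    {"crop_size": c, "epochs": e, "architecture": a}
    for c, e, a in product(crop_sizes, epoch_budgets, architectures)
]
print(len(grid))
print(grid[0])
```

Each configuration would then be trained and evaluated independently, so the effect of a single factor can be read off by holding the other two fixed.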

Conclusion

In conclusion, we have proposed a highly effective new approach for tiny face detection and recognition in unconstrained, "in the wild" environments. By offloading the burden of low-resolution feature extraction to a super-resolution GAN, we allow standard, highly optimized face detection networks to operate on clear, hallucinated imagery, bridging the gap between the lab and the real world.

References & Citations

  1. Ledig, C., et al. (2017). "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network". CVPR.
  2. Goodfellow, I., et al. (2014). "Generative Adversarial Nets". NIPS.
  3. Yang, S., et al. (2016). "WIDER FACE: A Face Detection Benchmark". CVPR.

About Atul Lal

I am a software engineer with a passion for creating innovative and impactful applications that solve real-world problems. At Commvault Systems, I optimized APIs, developed distributed systems, and automated cloud environments for over two years.