Saravana Pandian SA
14. January 2024 Categories: Technology,

Face detection is the process of locating the faces of one or more persons in an image. It finds wide application in areas such as autofocus in cameras, person tracking in video conferencing, and gaming. This computer vision problem, essentially a binary classifier with localization, has been solved in many ways with different technologies, and computer scientists and engineers are still working on improvements.

While face recognition identifies the person, face detection is the stepping stone to it, as it creates the bounding box containing the face that the recognition algorithm then processes. Detection without false positives or false negatives, short detection time, detection in poor conditions, and detection of partially visible faces are some of the preferred attributes of a face detection system. This blog covers some of the popular face detection frameworks and tools based on machine learning and briefly describes how they operate.

OpenCV Haar cascade

Originally proposed by Paul Viola and Michael Jones, this method runs cascaded classifiers on Haar features extracted from an image to identify the faces in it. It is one of the most widely used and least computationally intensive algorithms. Haar features are essentially small binary filters that use the difference between the pixel sums of two or more adjacent rectangular regions as a feature. This gives good results because faces share some common characteristics; for example, the rectangular region covering both eyes is darker than the corresponding rectangle on the forehead. The speed comes from the use of integral images, in which each pixel holds the sum of all the pixels above and to the left of it. With this representation, the sum of the pixels in any rectangular region of the original image can be calculated with a few additions and subtractions on the integral image. Performance is high because all the operations reduce to sums and differences.
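The integral image idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the 4x4 toy image, and the "brighter band above a darker band" feature layout are hypothetical and are not taken from OpenCV.

```python
def integral_image(img):
    """Build a summed-area table with an extra leading row and column of zeros."""
    height, width = len(img), len(img[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def rect_sum(table, top, left, height, width):
    """Sum of the pixels in a rectangle using four lookups in the table."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def two_rect_feature(table, top, left, height, width):
    """Hypothetical two-rectangle feature: upper half sum minus lower half sum."""
    half = height // 2
    upper = rect_sum(table, top, left, half, width)
    lower = rect_sum(table, top + half, left, half, width)
    return upper - lower


# Toy 4x4 patch: a bright "forehead" band above a darker "eye" band.
image = [
    [200, 200, 200, 200],
    [200, 200, 200, 200],
    [50, 50, 50, 50],
    [50, 50, 50, 50],
]
table = integral_image(image)
total = rect_sum(table, 0, 0, 4, 4)
feature = two_rect_feature(table, 0, 0, 4, 4)
```

Once the table is built, every rectangle sum costs the same four lookups regardless of the rectangle size, which is what makes evaluating many features per window affordable.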

OpenCV Haar Cascade (Image Source: https://docs.opencv.org/4.x/d2/d99/tutorial_js_face_detection.html)

Since the number of possible features is far too high, a variant of the Adaptive Boosting algorithm (AdaBoost) is trained to choose the best features and use them. Each strong classifier is created as a weighted sum of weak classifiers, and the strong classifiers are arranged in a cascade. With training on a large set of images, it is possible to achieve good accuracy. Further, the algorithm can be run at any window size, which enables the detector to handle faces and images of varying size. OpenCV provides a Haar cascade detector implementation that can be used to run the face detection algorithm.
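The weighted sum and the cascade can be sketched as follows. This is a simplified illustration, not OpenCV's implementation: the weak classifiers are assumed to be threshold tests on single feature values, and all thresholds, weights, and feature values are made up for the example.

```python
def strong_classify(feature_values, weak_classifiers, stage_threshold):
    """Weighted vote of threshold tests, as in one boosted stage.

    Each weak classifier is (feature_index, threshold, polarity, weight).
    """
    score = 0.0
    for index, threshold, polarity, weight in weak_classifiers:
        # A weak classifier votes "face" when polarity * value < polarity * threshold.
        if polarity * feature_values[index] < polarity * threshold:
            score += weight
    return score >= stage_threshold


def cascade_classify(feature_values, stages):
    """Run the stages in order and reject the window at the first failure."""
    for weak_classifiers, stage_threshold in stages:
        if not strong_classify(feature_values, weak_classifiers, stage_threshold):
            return False
    return True


# Hypothetical two-stage cascade over three feature values.
stages = [
    ([(0, 100.0, -1, 1.0)], 0.5),                      # feature 0 must exceed 100
    ([(1, 50.0, 1, 0.8), (2, 20.0, -1, 0.6)], 0.7),    # weighted vote of two tests
]

face_like = cascade_classify([1200.0, 30.0, 45.0], stages)
background = cascade_classify([40.0, 30.0, 45.0], stages)
weak_match = cascade_classify([1200.0, 80.0, 45.0], stages)
```

Because most windows in an image are background, rejecting them in the first cheap stage means the later, more expensive stages run on only a small fraction of windows.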

OpenCV DNN

Deep neural networks (DNNs) are ML algorithms whose design is inspired by the human nervous system. A DNN has more than one hidden layer and is trained using backpropagation, which updates the weights of the neurons to reflect the learning. With repeated supervised training, DNNs can be used for many kinds of applications. They can also be built from a variety of layers such as convolutional layers, max-pooling layers, etc.

OpenCV has a face detection model called YuNet that is trained on the WIDER FACE dataset and is highly optimized for performance. It uses depthwise convolution followed by pointwise convolution in place of standard convolution. It combines a good level of accuracy with very high speed, which makes it suitable for resource-constrained edge devices. OpenCV's DNN face detector module is widely used for such applications.
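To see why replacing a standard convolution with a depthwise plus pointwise pair saves computation, the weight counts can be compared directly. The sketch below ignores bias terms, and the layer size (64 input channels, 64 output channels, 3x3 kernel) is an illustrative assumption rather than YuNet's actual configuration.

```python
def standard_conv_params(in_channels, out_channels, kernel_size):
    """Weights in a standard convolution: one k x k filter per input/output channel pair."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(in_channels, out_channels, kernel_size):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels   # one k x k filter per input channel
    pointwise = in_channels * out_channels                # 1x1 convolution that mixes channels
    return depthwise + pointwise


standard = standard_conv_params(64, 64, 3)
separable = depthwise_separable_params(64, 64, 3)
reduction = standard / separable
```

For this illustrative layer the separable form needs 4,672 weights instead of 36,864, a reduction of nearly eight times.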

OpenCV DNN (Image Source: https://opencv.org/blog/opencv-face-detection-cascade-classifier-vs-yunet/)


Pre-trained model files are also readily available for OpenCV's SSD-based DNN face detector: res10_300x300_ssd_iter_140000_fp16.caffemodel and deploy.prototxt for Caffe, and opencv_face_detector_uint8.pb and opencv_face_detector.pbtxt for TensorFlow. Otherwise, the model can be trained on the required images and then used.

HOG + Linear SVM on Dlib

Dlib is a practical toolkit for building real-world machine learning and data analysis applications. It provides a few face detection algorithms, one of which is the Histogram of Oriented Gradients (HOG) detector with a linear SVM. Essentially, the image is split into small cells, and the magnitude and direction of the gradient are computed for each pixel in a cell (the oriented gradients). The angles are then typically grouped into 9 bins: 0°, 20°, 40°, 60°, 80°, 100°, 120°, 140° and 160°. To build the histogram, the magnitude of each gradient is weighted and added to the matching angle bins.
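The histogram step for a single cell can be sketched as below. The split of each magnitude between the two nearest bins is a common convention and is assumed here; it is not necessarily how Dlib implements it, and the gradient values are invented for the example.

```python
BIN_WIDTH = 20   # degrees per bin
NUM_BINS = 9     # bins centred at 0, 20, ..., 160 degrees


def cell_histogram(gradients):
    """Build a 9-bin histogram from (magnitude, angle_in_degrees) pairs.

    Angles are treated as unsigned (0 to 180 degrees). Each magnitude is
    shared between the two nearest bins in proportion to angular distance.
    """
    hist = [0.0] * NUM_BINS
    for magnitude, angle in gradients:
        position = (angle % 180) / BIN_WIDTH
        lower = int(position) % NUM_BINS
        upper = (lower + 1) % NUM_BINS          # 160 degrees wraps around to 0
        upper_share = position - int(position)
        hist[lower] += magnitude * (1 - upper_share)
        hist[upper] += magnitude * upper_share
    return hist


# Three made-up gradients: halfway between bins, near the wrap-around, and exactly on a bin.
hist = cell_histogram([(10.0, 30.0), (4.0, 170.0), (3.0, 40.0)])
```

Here the gradient at 30° contributes 5 to the 20° bin and 5 to the 40° bin, the one at 170° is shared between the 160° and 0° bins, and the one at 40° goes entirely into the 40° bin.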

HOG + Linear SVM on Dlib (Image Source: https://www.warse.org/IJETER/static/pdf/file/ijeter244892020.pdf)

This histogram of oriented gradients captures the features of the face region. The histograms are then normalized over blocks of cells and concatenated into a feature vector. Finally, a linear Support Vector Machine is trained with both positive and negative samples to create the binary classifier. During detection, the same steps are followed over a sliding window: the HOG feature vector is extracted from each window and given to the linear SVM to decide whether the window contains a face.
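The detection loop can be sketched as follows. This is a minimal illustration, not Dlib's code: the feature extractor is a hypothetical stand-in passed in as a function, the weights and bias are made up, and a real detector would also scan the image at several scales.

```python
def l2_normalize(vector, eps=1e-6):
    """Scale a feature vector to roughly unit length."""
    norm = sum(v * v for v in vector) ** 0.5
    return [v / (norm + eps) for v in vector]


def svm_score(features, weights, bias):
    """Linear SVM decision value: dot product plus bias."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def sliding_windows(image_width, image_height, window, stride):
    """Yield the (left, top) corner of every window that fits in the image."""
    for top in range(0, image_height - window + 1, stride):
        for left in range(0, image_width - window + 1, stride):
            yield left, top


def detect(image_width, image_height, window, stride, extract, weights, bias):
    """Return boxes (left, top, width, height) whose SVM score is positive."""
    hits = []
    for left, top in sliding_windows(image_width, image_height, window, stride):
        features = l2_normalize(extract(left, top, window))
        if svm_score(features, weights, bias) > 0:
            hits.append((left, top, window, window))
    return hits


def toy_extract(left, top, window):
    """Hypothetical extractor: pretends only the window at (8, 0) looks like a face."""
    return [1.0, 0.0] if (left, top) == (8, 0) else [0.0, 1.0]


hits = detect(16, 8, window=8, stride=8, extract=toy_extract,
              weights=[1.0, -1.0], bias=0.0)
```

In practice `extract` would compute the block-normalized HOG vector for the window, and overlapping positive windows would be merged into a single box.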

Dlib MMOD CNN face detector

Dlib also offers another face detection algorithm called the MMOD CNN face detector. Based on Convolutional Neural Networks (CNNs), it implements the Max-Margin Object Detection (MMOD) algorithm for improved results. CNNs are a special type of neural network that work effectively on grid-based data such as images and run much more efficiently on GPUs than on CPUs. The detector learns its features directly from the image through the convolutional layers; the 68 facial landmarks that Dlib is also known for come from a separate shape predictor that is typically run on the detected face. Training learns a weight vector that maximizes the margin between the scores of correct detections and those of incorrect ones. The scalar product between this vector and the features computed for a window is used to decide whether it contains a face.

While this algorithm can be trained with fewer images than other ones, the Dlib library comes with a pretrained model that fares quite well for most applications. Though the algorithm is slower, it offers good accuracy.

MediaPipe Face Detection

The MediaPipe face detector is part of the MediaPipe framework of on-device machine learning solutions and is based on the BlazeFace detector. It is optimized for edge and mobile use cases. It uses an improved network based on MobileNet, with more depthwise convolutions and a fixed-time GPU dispatch model.

MediaPipe Face Detection (Image Source: https://developers.google.com/mediapipe/solutions/vision/face_detector)

The MediaPipe face detector can output face locations along with six facial key points: left eye, right eye, nose tip, mouth, left eye tragion, and right eye tragion. It has a pre-built solution for identifying faces within a given frame, along with the ability to track their movements and facial landmarks over time. It can quite accurately detect and track human faces in real-time video streams or images.

Conclusion

While these are some of the face detection algorithms in use, there are many other face detection algorithms and frameworks such as Single Shot Multibox Detector (SSD), Dual Shot Face Detector (DSFD), RetinaFace with a MobileNet backbone, MTCNN etc. In addition to these, NVIDIA offers the DetectNet_v2 detector with ResNet18 as a feature extractor, targeted at its GPU platforms.

Among these, RetinaFace and MTCNN are both good choices for face detection, but RetinaFace has higher accuracy and lower loss than MTCNN. It can also reconstruct a 3D face from a 2D image along with the landmarks. Nevertheless, the end application dictates which algorithm can be used, based on the computation power available, the presence of a CPU or a GPU, the required speed of detection, the desired accuracy etc.
