Basic Conceptual Understanding of Facial Recognition

Bashir
4 min read · Nov 6, 2020


Facial recognition is a revolutionary technology that is widely used to detect and recognize faces. Having the ability to detect and recognize human faces has many applications that can range from social networking to personal safety and criminal prosecution.

To determine who is in an image, the very first thing that must be accomplished is to detect the face in the image. After the face is detected, it must be cropped out and analyzed to determine who the person is (aka recognition). Let’s first discuss facial detection, then we’ll explore facial recognition.

Facial Detection

There are many algorithms used to detect faces, and their accuracy varies based on the methodology used.

  • Knowledge-based methods: able to detect faces based on what a human would call a face (for example, looking at facial features such as the eyes, nose, and mouth and their relationship to one another). It’s a way of detecting faces based on hard-coded rules. This is a rudimentary approach, and writing rules that reliably detect faces can be difficult and cumbersome. One main limitation of this method is its inability to detect faces in different poses or at different angles.
  • Feature invariant methods: This method detects faces by first extracting facial features. It applies second-derivative Gaussian filters to search for features that help in the detection of faces even at slightly different poses. The detection of facial features can be complicated by illumination and background.
  • Template matching methods: Faces are detected using a hard-coded template based on either edges/regions or on facial contours. This method detects faces by comparing them to the hard-coded template. It also uses average pixel values to locate the eyes, since the eyes are usually darker than the rest of the face. The main downside of this method is constructing a template that can cover the range of possible poses.
  • Appearance-based methods: This method involves training a classifier to detect faces, using statistical or machine learning techniques such as Principal Component Analysis (PCA), neural networks, and many others. Neural networks are promising and solve many problems in facial detection and recognition; they can also determine emotion, gender, and age.
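As a concrete illustration of the template-matching idea, here is a toy pure-Python sketch that slides a small hard-coded template over a grayscale image and scores each position by the sum of squared differences (SSD). The image and template values below are made up for illustration; real template matchers work on full-size images and normalize for lighting.

```python
def ssd_match(image, template):
    """Return the (row, col) where the template best matches the image,
    scored by sum of squared differences (lower is better)."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, float("inf")
    # Slide the template over every valid position in the image.
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            score = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th) for j in range(tw)
            )
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

# A 4x4 "image" with a bright 2x2 patch in the lower-right corner.
image = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
template = [[200, 200], [200, 200]]
print(ssd_match(image, template))  # -> (2, 2)
```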

Steps in facial detection (there are variations depending on the method used):

  1. Convert the image to black and white (RGB to grayscale).
  2. Analyze each pixel in conjunction with its surrounding pixels.
  3. Segment the image (contour detection).
  4. Extract facial features and normalize the images. Machine learning algorithms can be used to recognize facial features.
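Step 1 above, for example, is commonly done with a weighted average of the color channels rather than a plain mean. A minimal sketch, assuming the standard ITU-R BT.601 luminance weights:

```python
def rgb_to_gray(pixel):
    """Convert one (R, G, B) pixel to a grayscale intensity using the
    ITU-R BT.601 luminance weights (green contributes the most because
    the eye is most sensitive to it)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

print(rgb_to_gray((255, 0, 0)))      # pure red   -> 76
print(rgb_to_gray((255, 255, 255)))  # pure white -> 255
```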

Facial Recognition

Once the face is detected in an image, it can be analyzed further to determine who is in the picture. Below are the steps that would be taken for performing facial recognition:

  1. After the face is detected using the process above, it must be cropped to avoid the analysis of unnecessary objects in the image.
  2. Run the image through a convolutional neural network (CNN) which will generate unique values (embeddings) for the face (described below).
  3. Compare the embedding generated in the previous step to the embeddings of individuals in our database. If it matches an embedding closely enough, we know who the person in the image is.
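The comparison in step 3 can be sketched in a few lines. The names, 4-dimensional vectors, and distance threshold below are all invented for illustration; real CNN embeddings typically have 128 or more dimensions, and the threshold is tuned on real data.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(query, database, threshold=0.6):
    """Return the closest known identity, or None if nothing is close enough."""
    name, dist = min(
        ((n, euclidean(query, e)) for n, e in database.items()),
        key=lambda item: item[1],
    )
    return name if dist <= threshold else None

# Hypothetical database of known embeddings (made-up values).
database = {
    "alice": [0.1, 0.9, 0.3, 0.5],
    "bob":   [0.8, 0.2, 0.7, 0.1],
}

print(identify([0.12, 0.88, 0.31, 0.49], database))  # -> alice
print(identify([0.5, 0.5, 0.9, 0.9], database))      # -> None (no close match)
```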

What are embeddings and how are they generated?

A face embedding is a mathematical representation of different facial features which allows us to compare the similarity of two faces by comparing their embeddings. For example, if we compared faces by looking at the eyes only, then many different faces would match and the accuracy of our facial recognition would be dismal. But what if we analyzed many facial features at once? Then our facial recognition would be highly accurate because we are looking at many variables. The tricky part of having many variables is deciding how they are going to be represented. This is solved by representing all the variables as a continuous vector in n-dimensional space, where n is the number of variables. With this representation, highly similar items will sit close to each other in our n-dimensional space. You can read more about this by looking into nearest-neighbor models. For our purpose, the takeaway is that similar items get mapped to very similar locations in our n-dimensional space, and this lets us measure how similar two items are.
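The nearest-neighbor takeaway can be shown in miniature with cosine similarity as the comparison measure (1.0 means identical direction in the vector space). The 3-dimensional vectors below are invented for illustration; real embeddings are much longer.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 for
    near-identical directions, lower for dissimilar ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

face_a = [0.2, 0.8, 0.4]    # two hypothetical embeddings of the same person...
face_b = [0.21, 0.79, 0.42]
face_c = [0.9, 0.1, 0.3]    # ...and one of someone else

# Same-person embeddings land closer together than different-person ones.
print(cosine_similarity(face_a, face_b) > cosine_similarity(face_a, face_c))  # -> True
```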

I’m currently working on implementing a facial recognition algorithm using all the steps stated above. A detailed walkthrough of how an image gets processed and encoded into an embedding will be released at a later date.
