Leveraging Facial Recognition in your systems

Authored by Lwin Maung

Security is an important part of our lives, and we rely on various forms of it every day. To secure the things we care about, we need a trust factor. This applies to our bank accounts, our passwords, and even the vehicles we use every day. Sometimes security is complex: a bank account accessed over the internet, the rolling security codes in your car's remote start module, or the RFID chips and barcodes in your driver's license or passport.

At the end of the day, trust is what allows a person or a networked system to grant access. When you go to the bank, you have to prove who you are with some form of ID before a banker will even access your account. Now, let's think of something simpler and go back in time. Imagine that you are a newborn baby. One of the first things you learn to trust is the sound of your mother's voice. Your eyesight is nowhere near 20/20, and the only thing you know is that you might be hungry. An infant's mind has to learn to associate a sound with a parent. As you get older, you can see a little better. As your senses develop, you learn to use them to authenticate your trust factors. Within months, you learn to recognize faces. You learn to treat familiar faces as good (trustworthy) people and unknown faces as (potentially) bad, untrustworthy people.

Computers can be trained to recognize people in much the same way. Using technologies such as Microsoft's Cognitive Services, we can duplicate the learning process we went through as infants. We don't have to use 256-bit security keys to protect the contents of our digital lives. We can use our faces, voices, fingerprints and more to let computers establish trust from biological traits that are unique to each of us. The best part is that, just like infants, computers can be taught to tell trusted, identifiable features from untrusted ones through reinforced training. Let's see how that is accomplished.

Let's explore face detection first. To demonstrate it, we need an image capture device. For simple recognition, any webcam will do. If you want to make sure that a face is a true 3D object, you will need a Windows Hello enabled camera such as a Logitech 4K webcam. Once you have a camera, the rest of the process is simple. We capture an image (or a single frame from a video stream), send it to Azure Cognitive Services (the Face API service), and let the service analyze the face for us. The Face API service can perform the following:

  • Face Detection
  • Face Recognition
  • Face Verification

What does that mean?

Face detection

The Face API can detect up to 64 human faces in an image, with high-precision face location. The image can be supplied as a file (a byte stream) or as a valid URL. Each detected face is returned with a face rectangle (left, top, width, and height) that indicates its location in the image. Optionally, face detection extracts a series of face-related attributes such as gender, age, head pose, facial hair, and glasses.
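As a rough illustration of the detection call described above, the sketch below builds (but does not send) a request to the detect endpoint of the Face API v1.0 REST interface for an image referenced by URL. The region, subscription key, and image URL are placeholders that are not part of the original article, and the attribute list is just one possible selection.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Placeholders, not values from the article: replace with your own resource details.
const string region = "westus";
const string subscriptionKey = "YOUR_FACE_API_KEY";
const string imageUrl = "https://example.com/photo.jpg";

// Request face IDs plus a handful of the optional attributes.
const string query = "returnFaceId=true&returnFaceLandmarks=false" +
                     "&returnFaceAttributes=age,gender,headPose,facialHair,glasses";
var uri = new Uri($"https://{region}.api.cognitive.microsoft.com/face/v1.0/detect?{query}");

var request = new HttpRequestMessage(HttpMethod.Post, uri);
request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

// For an image referenced by URL, the body is a small JSON document.
string body = JsonSerializer.Serialize(new { url = imageUrl });
request.Content = new StringContent(body, Encoding.UTF8, "application/json");

Console.WriteLine($"{request.Method} {request.RequestUri}");
Console.WriteLine(body);

// Sending is left out so the sketch runs without a subscription:
// using var http = new HttpClient();
// HttpResponseMessage response = await http.SendAsync(request);
```

To upload a captured frame instead of a URL, the same request would carry the raw image bytes with a content type of application/octet-stream.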

Face recognition

The ability to identify human faces is important in many scenarios including security, natural user interface, image content analysis and management, mobile apps, and robotics. The Face API service provides four face recognition functions: face verification, finding similar faces, face grouping, and person identification.

Face verification

Face verification authenticates one detected face against another detected face, or one detected face against one person object. A related function, finding similar faces, takes a target detected face and a set of candidate faces to search, and returns a small set of faces that look most similar to the target face. Two working modes, matchFace and matchPerson, are supported. matchPerson mode returns similar faces after applying a same-person threshold derived from Face - Verify. matchFace mode ignores the same-person threshold and returns the top similar candidate faces.
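To make the difference between the two operations concrete, here is a minimal sketch of the JSON request bodies for the verify and findsimilars endpoints of the v1.0 REST interface. The face IDs are hypothetical stand-ins for IDs returned by earlier detection calls, and the candidate limit of 10 is an arbitrary choice for illustration.

```csharp
using System;
using System.Text.Json;

// Hypothetical face IDs, as they would come back from earlier detect calls.
string targetFaceId = "f7eda569-4603-44b4-8add-cd73c6dec644";
string[] candidateFaceIds =
{
    "c5c24a82-6845-4031-9d5d-978df9175426",
    "65d083d4-9447-47d1-af30-b626144bf0fb",
};

// Face-to-face verification: body for POST .../face/v1.0/verify
string verifyBody = JsonSerializer.Serialize(new
{
    faceId1 = targetFaceId,
    faceId2 = candidateFaceIds[0],
});

// Find similar faces: body for POST .../face/v1.0/findsimilars
string findSimilarBody = JsonSerializer.Serialize(new
{
    faceId = targetFaceId,
    faceIds = candidateFaceIds,
    maxNumOfCandidatesReturned = 10,
    mode = "matchPerson", // use "matchFace" to ignore the same-person threshold
});

Console.WriteLine(verifyBody);
Console.WriteLine(findSimilarBody);
```

Switching the mode value is the only change needed to move between the two working modes; the rest of the request stays the same.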



What does the result look like?

Face detection returns each successfully detected face as an object in a JSON array. A sample is shown below:

[
  {
    "faceId": "f7eda569-4603-44b4-8add-cd73c6dec644",
    "faceRectangle": { "top": 131, "left": 177, "width": 162, "height": 162 },
    "faceAttributes": {
      "smile": 0.0,
      "headPose": { "pitch": 0.0, "roll": 0.1, "yaw": -32.9 },
      "gender": "female",
      "age": 22.9,
      "facialHair": { "moustache": 0.0, "beard": 0.0, "sideburns": 0.0 },
      "glasses": "NoGlasses",
      "emotion": {
        "anger": 0.0, "contempt": 0.0, "disgust": 0.0, "fear": 0.0,
        "happiness": 0.0, "neutral": 0.986, "sadness": 0.009, "surprise": 0.005
      },
      "blur": { "blurLevel": "low", "value": 0.06 },
      "exposure": { "exposureLevel": "goodExposure", "value": 0.67 },
      "noise": { "noiseLevel": "low", "value": 0.0 },
      "makeup": { "eyeMakeup": true, "lipMakeup": true },
      "accessories": [ ],
      "occlusion": { "foreheadOccluded": false, "eyeOccluded": false, "mouthOccluded": false },
      "hair": {
        "bald": 0.0,
        "invisible": false,
        "hairColor": [
          { "color": "brown", "confidence": 1.0 },
          { "color": "black", "confidence": 0.87 },
          { "color": "other", "confidence": 0.51 },
          { "color": "blond", "confidence": 0.08 },
          { "color": "red", "confidence": 0.08 },
          { "color": "gray", "confidence": 0.02 }
        ]
      }
    }
  }
]

What is notable is the range of information detected in an image (or a frame of a video). The Azure Face API can detect a face and estimate whether the person is smiling, the person's gender and age, facial hair, head pose, and emotions such as happiness, sadness, and disgust. It also reports glasses, hair, makeup, and whether the forehead, eyes, or mouth are occluded.
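One way to consume a detection result like the sample above is to read it with System.Text.Json. The sketch below uses a shortened copy of the sample, prints the face rectangle, and picks the emotion with the highest score. The helper name TopEmotion is hypothetical, and the raw string literal assumes C# 11 or later.

```csharp
using System;
using System.Text.Json;

// Shortened copy of the sample detection result (raw string literal, C# 11+).
string json = """
[
  {
    "faceId": "f7eda569-4603-44b4-8add-cd73c6dec644",
    "faceRectangle": { "top": 131, "left": 177, "width": 162, "height": 162 },
    "faceAttributes": {
      "gender": "female",
      "age": 22.9,
      "glasses": "NoGlasses",
      "emotion": { "anger": 0.0, "happiness": 0.0, "neutral": 0.986, "sadness": 0.009, "surprise": 0.005 }
    }
  }
]
""";

using JsonDocument doc = JsonDocument.Parse(json);

foreach (JsonElement face in doc.RootElement.EnumerateArray())
{
    JsonElement rect = face.GetProperty("faceRectangle");
    JsonElement attrs = face.GetProperty("faceAttributes");
    var (emotion, score) = TopEmotion(attrs.GetProperty("emotion"));

    string faceId = face.GetProperty("faceId").GetString() ?? "(no id)";
    int left = rect.GetProperty("left").GetInt32();
    int top = rect.GetProperty("top").GetInt32();
    int width = rect.GetProperty("width").GetInt32();
    int height = rect.GetProperty("height").GetInt32();
    double age = attrs.GetProperty("age").GetDouble();

    Console.WriteLine($"Face {faceId} at ({left}, {top}), size {width}x{height}");
    Console.WriteLine($"  estimated age {age}, strongest emotion {emotion} ({score})");
}

// Hypothetical helper: returns the entry of the "emotion" object with the highest score.
static (string Name, double Score) TopEmotion(JsonElement emotions)
{
    string name = "";
    double best = double.NegativeInfinity;
    foreach (JsonProperty p in emotions.EnumerateObject())
    {
        double value = p.Value.GetDouble();
        if (value > best) { best = value; name = p.Name; }
    }
    return (name, best);
}
```

Attributes are only present when they were requested in the detection call, so production code would likely use TryGetProperty rather than GetProperty.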


To verify faces, we need to teach the API to associate detected faces with a person by training a model that holds those faces. Each time we detect a face, we can use the PersonGroup API calls to add the face to a person in a list of recognized people, and then train the group for verification purposes. Training runs asynchronously, so we poll its status until it is no longer running:


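// Poll until training of the person group has finished.
TrainingStatus trainingStatus = null;
while (true)
{
  trainingStatus = await faceServiceClient.GetPersonGroupTrainingStatusAsync(personGroupId);
  if (trainingStatus.Status != Status.Running)
    break; // training is no longer running

  await Task.Delay(1000); // wait one second before checking again
}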


You add a face image to a person in the group (before training) with the following call:
using (Stream imageStream = File.OpenRead(imagePath))
{
  await faceServiceClient.AddPersonFaceAsync(personGroupId, personX.PersonId, imageStream);
}
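The snippets above assume that a person group and a person already exist. As a rough guide to the overall order of operations, the sketch below lists the corresponding v1.0 REST calls in sequence. The region, the person group ID "myfriends", and the {personId} placeholder are assumptions for illustration; nothing is sent over the network.

```csharp
using System;
using System.Collections.Generic;

// Assumed values for illustration only.
const string baseUrl = "https://westus.api.cognitive.microsoft.com/face/v1.0";
const string personGroupId = "myfriends";
const string personId = "{personId}"; // returned by step 2 at run time

// The order matters: faces must be added before the group is trained.
var steps = new List<(string Method, string Path, string Purpose)>
{
    ("PUT",  $"/persongroups/{personGroupId}", "create the person group"),
    ("POST", $"/persongroups/{personGroupId}/persons", "create a person in the group"),
    ("POST", $"/persongroups/{personGroupId}/persons/{personId}/persistedFaces", "add a face image to the person"),
    ("POST", $"/persongroups/{personGroupId}/train", "start training the group"),
    ("GET",  $"/persongroups/{personGroupId}/training", "poll the training status"),
};

for (int i = 0; i < steps.Count; i++)
{
    var step = steps[i];
    Console.WriteLine($"{i + 1}. {step.Method} {baseUrl}{step.Path}");
    Console.WriteLine($"   {step.Purpose}");
}
```

Each time new faces are added to a person, the group needs to be trained again before identification reflects them.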


Once the person group is trained, we can call https://[location].api.cognitive.microsoft.com/face/v1.0/identify to identify a person from the trained list. If one or more faces in an image are identified, a result similar to the JSON below is returned so that you can verify the person.

[
  {
    "faceId": "c5c24a82-6845-4031-9d5d-978df9175426",
    "candidates": [
      { "personId": "25985303-c537-4467-b41d-bdb45cd95ca1", "confidence": 0.92 }
    ]
  },
  {
    "faceId": "65d083d4-9447-47d1-af30-b626144bf0fb",
    "candidates": [
      { "personId": "2ae4935b-9659-44c3-977f-61fac20d0538", "confidence": 0.89 }
    ]
  }
]
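A result like this still has to be turned into an access decision by your own application. The sketch below parses the sample response and accepts a candidate only when its confidence reaches a cut-off. The cut-off of 0.9 and the mapping from personId to display names are assumptions made for this example, not values recommended by the service. The raw string literal assumes C# 11 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical lookup kept by your application: personId -> display name.
var knownPeople = new Dictionary<string, string>
{
    ["25985303-c537-4467-b41d-bdb45cd95ca1"] = "Anna",
    ["2ae4935b-9659-44c3-977f-61fac20d0538"] = "Bill",
};
const double minConfidence = 0.9; // assumed application-level cut-off

// The sample identify response (raw string literal, C# 11+).
string json = """
[
  { "faceId": "c5c24a82-6845-4031-9d5d-978df9175426",
    "candidates": [ { "personId": "25985303-c537-4467-b41d-bdb45cd95ca1", "confidence": 0.92 } ] },
  { "faceId": "65d083d4-9447-47d1-af30-b626144bf0fb",
    "candidates": [ { "personId": "2ae4935b-9659-44c3-977f-61fac20d0538", "confidence": 0.89 } ] }
]
""";

var accepted = new List<string>();
using JsonDocument doc = JsonDocument.Parse(json);

foreach (JsonElement face in doc.RootElement.EnumerateArray())
{
    string faceId = face.GetProperty("faceId").GetString() ?? "(no id)";
    foreach (JsonElement candidate in face.GetProperty("candidates").EnumerateArray())
    {
        string personId = candidate.GetProperty("personId").GetString() ?? "";
        double confidence = candidate.GetProperty("confidence").GetDouble();

        // Accept only known people whose match confidence reaches the cut-off.
        if (knownPeople.TryGetValue(personId, out var name) && confidence >= minConfidence)
        {
            accepted.Add(name);
            Console.WriteLine($"Face {faceId}: accepted as {name} ({confidence})");
        }
        else
        {
            Console.WriteLine($"Face {faceId}: rejected ({confidence})");
        }
    }
}
```

With the sample data and this cut-off, the first face is accepted and the second is rejected because 0.89 falls just below 0.9. A face with an empty candidates list simply produces no decision, which an application would normally treat as unknown.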

Next steps

We can consume the data at this point and keep training the model for additional accuracy. What we have to realize is that, just like a child, we started in a tabula rasa state, and we have to fill in that blank slate with data over time for better analysis. Next, we will discuss Windows Hello and using 3D face matching for better accuracy, and more.