One of the things that's difficult in face identification is eliminating noise. In most pictures, the noise is in the background. A simple oval mask can eliminate much of the noise that's not part of the face. This mask assumes the faces are already centered in the frame.
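For anyone who wants to generate a similar mask themselves, here's a minimal sketch. The 200x200 size and the ellipse proportions are illustrative guesses, not the exact values in the attached file:

```python
import cv2
import numpy as np

# Build a white-on-black oval mask, sized to match the
# normalized face images (200x200 is an assumed size here).
w, h = 200, 200
facefilter = np.zeros((h, w, 3), dtype=np.uint8)

# Draw a filled white ellipse centered in the frame; pixels
# outside the oval stay black and will be masked away.
cv2.ellipse(facefilter, (w // 2, h // 2),
            (int(w * 0.45), int(h * 0.48)),
            0, 0, 360, (255, 255, 255), -1)

cv2.imwrite("facefilter.png", facefilter)
```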

GroG

8 years 1 month ago

Cool scruffy,

Victorian even ;)

Can you elaborate on the details in the post?
For example, with OpenCV you can dynamically create a Mat; wouldn't that be more advantageous?
What algorithm are you currently using? What is the plan for the future?
Fill us in!

I'm working with kwatters on improved facial recognition, and I needed a way to get him the filter file I'm using. A blog posting seemed the easiest way to attach a file so he could use it.

Ultimately, the algorithm I'm using stores a library of learned images. Each image is "normalized" prior to saving. In this process, I locate the primary facial features (first the face itself, then the eyes, nose and mouth within each face). If you know where both eyes are, you can infer whether the image is tilted (unless you're using a picture of Quasimodo).
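The feature-location step is the standard Haar cascade approach. A rough sketch of what I mean, using the stock cascades that ship with OpenCV (the exact cascades and parameters I use may differ):

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

img = cv2.imread("input.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# First find the face, then search for eyes only inside it.
for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
    face_roi = gray[y:y + h, x:x + w]
    eyes = eye_cascade.detectMultiScale(face_roi)
    # Eye centers in whole-image coordinates
    centers = [(x + ex + ew // 2, y + ey + eh // 2)
               for (ex, ey, ew, eh) in eyes]
```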

If you look at the position of the mouth relative to the eyes, you can also determine if the head is turned (skewed). If the mouth sits more under one eye than the other, the head is turned to one side or the other.
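To make the geometry concrete, here's a small illustration (my own sketch, not lifted from the actual code): tilt is the angle of the line through the eye centers, and skew is how far the mouth sits from the midpoint between them.

```python
import math

def tilt_and_skew(left_eye, right_eye, mouth):
    """left_eye, right_eye, mouth are (x, y) pixel coordinates."""
    # Tilt: angle of the line through the eyes vs. horizontal.
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    tilt_deg = math.degrees(math.atan2(dy, dx))

    # Skew: how far the mouth is from the midpoint between the
    # eyes, as a fraction of the eye-to-eye distance. Zero means
    # the face is square to the camera.
    mid_x = (left_eye[0] + right_eye[0]) / 2
    eye_dist = math.hypot(dx, dy)
    skew = (mouth[0] - mid_x) / eye_dist
    return tilt_deg, skew
```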

If the head is skewed or tilted, facial recognition is much more difficult, because most algorithms use "least differences" to identify when a face is most like an already-learned face. Whether you're using LDA or PCA, neither works well on faces that don't all look similar in size and features. A standard Haar cascade can do fairly well at detecting a face that is half or double the normal size, but a face identifier can't deal with those variances at all. In other words, if an incoming face is the wrong size, it's often virtually impossible to compare it to a saved image successfully.

Fortunately, there are ways to correct for these things. In my code, I use an affine transformation to normalize the images. This handles tilt, skew and resizing in a single conversion. In layman's terms, all you do is pick three points in an image (like the centers of the two eyes and the center of the mouth). If you think about a normal face, connecting these three points will create a triangle. If the face is tilted, skewed, or the wrong size, the triangle will be distorted or the wrong size.

However, if you know where you EXPECT these three points to be (you want the eyes level with each other and about 1/3 of the way down the picture, and the mouth centered between the eyes in the lower third of the picture), you can apply a single transformation that remaps the entire picture from the incoming image to the normalized image.
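In OpenCV terms, that's cv2.getAffineTransform plus cv2.warpAffine. A minimal sketch, where the target coordinates are illustrative values I picked for a 200x200 output, not necessarily the ones in my code:

```python
import cv2
import numpy as np

def normalize_face(img, left_eye, right_eye, mouth, size=200):
    # Where the three landmarks were detected in the input image.
    src = np.float32([left_eye, right_eye, mouth])

    # Where we EXPECT them in the normalized image: eyes level,
    # about 1/3 of the way down; mouth centered, in the lower
    # third. (These exact fractions are illustrative.)
    dst = np.float32([
        (size * 0.33, size * 0.33),   # left eye
        (size * 0.67, size * 0.33),   # right eye
        (size * 0.50, size * 0.75),   # mouth
    ])

    # One affine transform corrects tilt, skew, and scale at once.
    M = cv2.getAffineTransform(src, dst)
    return cv2.warpAffine(img, M, (size, size))
```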

Now, since you know that the eyes are in exactly the same place in every picture, and since you know the mouth is in the same place in all the pictures, you can apply the mask (essentially just an oval) to mask out the background. You can't do this if you don't know where the face is in the picture.

I just do all these things to improve the facial recognition capability.    This is where my implementation does better than many.    Since I pick up where the eyes, nose and mouth are, I can do a rudimentary remapping to normalize them as I learn them.  

Now, the oval is used to mask out the part of the image that is least likely to contain facial features. In most algorithms, what's NOT part of the face can prevent a detection, even if the images are almost identical in all other respects. For instance, take two pictures of someone, but turn on a light behind them between shots; the bright area around the sides of the head in one picture could prevent the face from being recognized. The oval mask attempts to eliminate as much of the "non-face" in the picture as possible, so that only facial features are used in the comparison.

Since I want to keep the images as "pure" as possible, when I read them back in, I apply the oval mask to filter out the unwanted parts of each picture. I could do this prior to saving them in the first place, but I want to keep the images intact (so I can show them once I make the identification). I do the mask right as I'm reading them in, so a bitwise "AND" is all that's needed.
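Roughly, the read-in step looks like this (the file layout is illustrative; the plain & works because the mask is pure black and white):

```python
import cv2
import glob

facefilter = cv2.imread("facefilter.png")

library = []
for path in glob.glob("faces/*.png"):
    # Saved images are already normalized to the mask's size.
    img = cv2.imread(path)
    # Bitwise AND with the white oval keeps the face pixels and
    # zeroes out the background; the file on disk stays intact.
    library.append(img & facefilter)
```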

Also, assume you have an image where the positions of the eyes aren't exactly where my transformation places them (again, the Quasimodo example above). The affine transformation will distort the image as it's learned, and the result may even be unrecognizable as the subject in question. However, since the test images are also transformed before comparison, Quasimodo can still be identified as easily as anyone else. [I've tested it on my dog and it detects her as easily as it detects me.]

Now, to answer why I didn't use OpenCV Mat functionality: assuming "im" is the image and "facefilter" is the oval, "im_mask = im & facefilter" is about as simple as you can get to apply the mask. I'm fairly new to Python, so the bitwise AND seemed as efficient as anything else I knew about when I was coding it. This is why I've uploaded my code to github; I welcome those with more experience to pick apart my stuff and integrate it into MRL (as I learn how to do more on my own). Since there was some discussion a few days ago, I figured this might help others improve their own stuff. In looking at the existing face recognition code and my own code, I think we've already identified opportunities for improvement in both areas.
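For what it's worth, the Mat-level equivalent would be cv2.bitwise_and; with a pure black-and-white mask the result should be identical to the & version:

```python
import cv2

im = cv2.imread("face.png")
facefilter = cv2.imread("facefilter.png")

# OpenCV-native equivalent of "im & facefilter"; produces the
# same result for a binary (black/white) mask of the same size.
im_mask = cv2.bitwise_and(im, facefilter)
```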

- scruffy-bob

 

One other reason for the separate mask file... When I was coding this, I didn't actually know if it would work, and I still don't know whether there are other masks that would work better than this one. Making it a separate file that's AND'ed with the image dynamically gives anyone else the opportunity to change it or eliminate it easily.

- scruffy-bob

GroG

8 years 1 month ago

Thanks Scruffy for the clear and well-described strategy! This is really very exciting. I have only delved into face recognition and a little TLD.

It seems you've worked out a lot of great strategies for normalizing the comparison between two images, based on one (or many?) training images. I'd really like to see your github code. I'd also love to see a video example of your identification, if possible.

My small experience with ideas on face identification went along these lines:

  • Use a pre-trained Haar cascade (face detection) from OpenCV; with this you've potentially identified the face and the most prominent feature locations
  • Switch to TLD with an LK tracking point set at the appropriate location from the step above (see the sketch after this list)
  • TLD collects a growing set of input images, which is used both for current tracking and, in the background, to create a more robust set of Haar cascades
  • Rinse, repeat
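Something like this for the first two steps, just to sketch the idea (the TLD learning loop itself is a much bigger piece):

```python
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Step 1: Haar face detection gives a region to track
# (assumes at least one face is found in the first frame).
(x, y, w, h) = face_cascade.detectMultiScale(prev_gray, 1.3, 5)[0]

# Step 2: seed LK tracking points inside the detected face only.
roi_mask = np.zeros_like(prev_gray)
roi_mask[y:y + h, x:x + w] = 255
pts = cv2.goodFeaturesToTrack(prev_gray, 50, 0.01, 5, mask=roi_mask)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade pushes the points into the new frame.
    new_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, pts, None)
    # Keep only the points that were successfully tracked.
    pts = new_pts[status.flatten() == 1].reshape(-1, 1, 2)
    prev_gray = gray
```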