Machine Vision

This is a general overview of machine vision. To start with, in the most basic sense, we as humans do not see the same way a machine does. So what do we as humans see, and what do machines see? The first thing I think of when posed with this question is, "Humans see objects, machines do not."

This is very true. We now have cameras which have more capabilities than our eyes. There are cameras with a 360 degree field of view, cameras that capture 22000 frames per second, and gigapixel cameras with telephoto lenses. All this represents a huge amount of accurate data from the camera, but data without meaning. When we see a single digital picture, we can extrapolate a large amount of information. We see "objects" which have "meaning". The machine sees a large array of arbitrary numbers.
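To make the "array of numbers" point concrete, here is a minimal sketch in plain Python. The 4x4 picture and the variable names are made up for illustration. A person looking at it sees a white square on a black background. The program only has brightness values to work with.

```python
# A tiny, made-up 4x4 grayscale "image".
# Each number is one pixel's brightness: 0 = black, 255 = white.
image = [
    [0,   0,   0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0,   0,   0, 0],
]

height = len(image)
width = len(image[0])

# Simple statistics are easy for the machine to compute...
brightest = max(max(row) for row in image)
mean = sum(sum(row) for row in image) / (width * height)

# ...but nothing here says "there is a square in the picture".
print(width, height, brightest, mean)
```

A real camera frame is the same thing at a much larger scale, usually with three numbers (color channels) per pixel instead of one.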

The difficult part is getting machines to see objects. Below are some of the techniques used to do it.







  • Haar detection - the process of detecting objects with Haar cascades. Haar cascades are sets of data which represent the "distilled essence" of what an object looks like. Haar detection is implemented in OpenCV (open source) and is extremely common in face detection. The data is usually stored in XML files and is the result of long and intensive training.
  • Haar training - the activity of creating Haar cascades: training with many pictures and capturing that training in a series of XML files. OpenCV's Haar training is at the moment a difficult and very time-consuming manual activity. Almost all vision detection I have seen has used data that was accumulated before the detection runs. "Predator", I believe, is one of the few which learns at the same time as it detects (the way of the future). I am working on an implementation of this in MyRoboLab (MRL).
  • Lucas Kanade Optical Tracking (LKOptical Track) - tracking by means of matching the surrounding pixels of a point through multiple frames in memory. It's very effective and low on CPU power. It does not "detect", it only tracks. A version of this is implemented in OpenCV and works well. Once a track is lost, it's lost for good. I can tell that Predator uses this to initialize the area to do training (this is the red dot). When the track is lost, the program (I suspect) switches to detection.
  • Template Matching - is fairly straightforward. A small image is used as a template and moved along a larger image until the closest match is found. It takes more processing than LKOptical Track, and less than Haar detection. Unlike Haar, it is prone to error on scale and lighting changes. Of the techniques commonly used in optical detection and tracking, this is the one I am least familiar with, since I just recently got it working in MRL.
  • Foreground Background Motion Detector - I have used this to set an LKOptical track. It's a method I found fairly successful: when something moves, it finds the center of the movement and sets an LKOptical track to follow the "moving" object. Motion detection is good for initially setting a point, but you can't use it to track, since the entire background is in motion once the robot or platform begins to move. It is available in OpenCV.
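Two of the ideas in the list above are simple enough to sketch in a few lines of plain Python. This is only an illustration under stated assumptions, not how OpenCV or MRL implements them. The function names (match_template, motion_center, make_frame) are hypothetical. The template matcher scores each position with a sum of squared differences. The motion finder uses plain frame differencing, which is a much simpler stand-in for a real foreground/background model.

```python
def match_template(image, template):
    """Slide template over image; return ((row, col), score) of the best match.

    The score is the sum of squared differences, so lower is better.
    """
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best_pos, best_score = None, None
    for r in range(img_h - tpl_h + 1):
        for c in range(img_w - tpl_w + 1):
            score = 0
            for dr in range(tpl_h):
                for dc in range(tpl_w):
                    diff = image[r + dr][c + dc] - template[dr][dc]
                    score += diff * diff
            if best_score is None or score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


def motion_center(prev_frame, curr_frame, threshold=30):
    """Return the (row, col) centroid of pixels that changed, or None."""
    moved = [
        (r, c)
        for r, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame))
        for c, (p, q) in enumerate(zip(prev_row, curr_row))
        if abs(p - q) > threshold
    ]
    if not moved:
        return None
    row = sum(r for r, _ in moved) / len(moved)
    col = sum(c for _, c in moved) / len(moved)
    return row, col


def make_frame(top, left, size=5, block=2, value=200):
    """Build a made-up size x size black frame with one bright square."""
    frame = [[0] * size for _ in range(size)]
    for r in range(top, top + block):
        for c in range(left, left + block):
            frame[r][c] = value
    return frame


# The bright square moves one pixel down and one pixel right between frames.
frame_a = make_frame(1, 1)
frame_b = make_frame(2, 2)
template = [[200, 200], [200, 200]]

pos_a, score_a = match_template(frame_a, template)
pos_b, score_b = match_template(frame_b, template)
center = motion_center(frame_a, frame_b)
print(pos_a, pos_b, center)
```

In practice the center returned by the motion step is the kind of point you would hand to a tracker as its starting location. The template matcher shows why the technique is sensitive to scale and lighting: the score compares raw pixel values, so a brighter or larger copy of the same object no longer matches well.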