Position Tracking, Creating a room map

I have a cart for my InMoov and wanted to add a bit of autonomy to it. I already have a cart control app and I am able to identify an Aruco marker and extract its distance and orientation.

My current steps are now moving the cart to a position in front of the marker, and I would like to keep track of the distance travelled and possibly - with the help of the Kinect camera - build a kind of map of my room.

My cart has a BNO055 mounted on it, so I can get its current heading and, theoretically, its accelerations. I do not have an odometer or wheel rotation sensor. I looked around for examples of how to use the BNO055 for some kind of odometry but was not successful.
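For reference, here is a minimal sketch of what IMU-only odometry would look like. It assumes the sensor delivers a heading in degrees and gravity-free acceleration in the cart frame; the function name and the sample format are made up for illustration. Because the acceleration is integrated twice, even a tiny constant bias grows quadratically into a position error, which is probably why usable examples are hard to find.

```python
import math

def dead_reckon(samples, dt):
    """Integrate (heading_deg, ax, ay) samples taken every dt seconds.

    ax/ay are cart-frame accelerations in m/s^2 with gravity already removed
    (an assumption about the sensor output). Returns (x, y) in metres.
    """
    x = y = vx = vy = 0.0
    for heading_deg, ax, ay in samples:
        h = math.radians(heading_deg)
        # rotate the cart-frame acceleration into the room frame
        ax_room = ax * math.cos(h) - ay * math.sin(h)
        ay_room = ax * math.sin(h) + ay * math.cos(h)
        # first integration: acceleration -> velocity
        vx += ax_room * dt
        vy += ay_room * dt
        # second integration: velocity -> position
        x += vx * dt
        y += vy * dt
    return x, y

# Cart standing still, but the sensor reports a 0.05 m/s^2 bias for 1 s at 100 Hz
bias_only = [(0.0, 0.05, 0.0)] * 100
drift_x, drift_y = dead_reckon(bias_only, 0.01)
```

After one second the stationary cart already appears to have moved about 2.5 cm, and the error keeps growing with the square of the elapsed time.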

Adding sensors to track wheel rotations could help, but because mecanum wheels involve a lot of sliding I am a bit uncertain whether it is worth the rather big effort to add them.

I thought that a sensor similar to the ones used in optical mice, but working with an air gap of at least 1 cm, would be perfect, but I was not able to find such a device either.

Anybody able to point me to the right places to accomplish my goals?

kyle.clinton's picture

Indoor "gps"


I have tried a couple of things for indoor localization. One came from my Rovio robot webcams. They put a set of infrared dots on the ceiling and then the bot can navigate with those.

The other path I have started down is the use of a Pixy cam and an arrangement of colored squares on the device you want to track. It would be similar to this video on YouTube by James Bruton: https://youtu.be/POG1p8Y_MgA

juerg's picture

Thanks Kyle

Uhh, long video, so adding an observation cam could be a solution, but it would restrict me to my own rooms or would make it rather difficult to set up in another environment. I could mount a fisheye cam at the top of my room (it has a high ceiling as it is an under-the-roof space) and have a bunch of LEDs marking Marvin's head.

I could also add a number of Aruco tags on my walls and scan these from time to time - but I found that even trying to locate a single marker comes with a price: scanning 360 degrees with enough stop time to get an unblurred image at each 10 to 15 degree step, finding a better observation point if nothing is found, avoiding obstacles while rotating or moving, etc.
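Once a tag with a known wall position is in view, turning the measured distance and angle into a cart position is the easy part. A minimal sketch, assuming the tag's map coordinates are known, the bearing is measured counter-clockwise from the cart's forward direction, and the heading is already expressed in the map frame (for the BNO055 that would need a fixed offset); all names are hypothetical:

```python
import math

def cart_position_from_marker(marker_xy, distance, bearing_deg, heading_deg):
    """Return the cart's (x, y) in the map frame.

    marker_xy   : known map position of the tag (assumed to be surveyed by hand)
    distance    : measured distance cart -> tag, same unit as marker_xy
    bearing_deg : angle of the tag relative to the cart's forward direction
    heading_deg : cart heading in the map frame
    """
    # direction from the cart to the tag, expressed in the map frame
    angle = math.radians(heading_deg + bearing_deg)
    mx, my = marker_xy
    # step back from the tag along that direction to find the cart
    return (mx - distance * math.cos(angle), my - distance * math.sin(angle))

# Tag at (3.0, 2.0), seen straight ahead at 2 m while heading along the map's x axis
x, y = cart_position_from_marker((3.0, 2.0), 2.0, 0.0, 0.0)
```

In this example the cart ends up at (1.0, 2.0). A single sighting like this could be used to reset whatever drift the other position estimates have accumulated.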

Maybe YOLO v2 would be a more elegant solution, making it detect windows, doors, furniture, stairs? I would still need to find a good way to represent my floor map. I thought I would be able to find Python libraries that bring me closer to a working solution with proven map representation architectures and a bunch of useful functions.
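One common way to represent a floor map is an occupancy grid: the room is divided into small square cells and each cell is marked unknown, free or occupied. A minimal sketch in plain Python, with a made-up class name and a cell size picked only for illustration:

```python
import math

class OccupancyGrid:
    """Floor map as a grid of cells; coordinates are in metres from one room corner."""
    UNKNOWN, FREE, OCCUPIED = -1, 0, 1

    def __init__(self, width_m, height_m, cell_m=0.1):
        self.cell_m = cell_m
        self.cols = int(round(width_m / cell_m))
        self.rows = int(round(height_m / cell_m))
        # every cell starts out unknown until a sensor has looked at it
        self.cells = [[self.UNKNOWN] * self.cols for _ in range(self.rows)]

    def _to_cell(self, x_m, y_m):
        return math.floor(y_m / self.cell_m), math.floor(x_m / self.cell_m)

    def _inside(self, row, col):
        return 0 <= row < self.rows and 0 <= col < self.cols

    def mark(self, x_m, y_m, state):
        row, col = self._to_cell(x_m, y_m)
        if self._inside(row, col):
            self.cells[row][col] = state

    def state_at(self, x_m, y_m):
        row, col = self._to_cell(x_m, y_m)
        # anything outside the mapped area counts as unknown
        return self.cells[row][col] if self._inside(row, col) else self.UNKNOWN

# 4 m x 3 m room with 10 cm cells; an obstacle was seen at (1.25, 0.55)
grid = OccupancyGrid(4.0, 3.0, cell_m=0.1)
grid.mark(1.25, 0.55, OccupancyGrid.OCCUPIED)
```

Detected objects such as doors or furniture could be stored as an extra label per cell, and path planners generally work directly on a grid like this.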

I read that optical mice use a dedicated DSP to translate the pictures into x-y movements. My mouse, at least, gives up positioning at even a minimal distance from the surface, so I would need different optics - the positioning would however probably be rather accurate and could inherently handle the different possible moves of my mecanum cart.

I appreciate your input and will update this thread when I am able to make some progress.

GroG's picture

I plan to work on WORK-E's autonomous navigation, but currently I'm tied up in the Pixie version of mrl with its new maven build & dependency management. 


There are OpenCV filter strategies... One is "Floor Finder" - this is the concept of taking a small sample of the floor in front of the robot, using the sample as a template for matching, and drawing a contour around the matching region.  The contour becomes the "safe" area in which the robot can proceed.  This is not really mapping, but a relative strategy to get around obstacles.

RoboRealm has a pretty good example & explanation

OpenCV can also do stereo-based depth perception, which can provide a form of obstacle avoidance.

I've experimented with this some in the past.
Here's a Stack Overflow outline of the types of stereo depth processing used for navigation:


  1. Ground plane approaches determine the ground plane from the stereo data and assume that all points that are not on the plane are obstacles. If you assume that the ground is the dominant plane in the image, then you may be able to find it simply by fitting a plane to the reconstructed point cloud using a robust model-fitting algorithm (such as RANSAC).

  2. Disparity map approaches skip converting the stereo output to a point cloud. The most popular algorithms I've seen are called v-disparity and uv-disparity. Both look for the same attributes in the disparity map, but uv-disparity can detect some types of obstacles that v-disparity alone cannot.

  3. Point cloud approaches project the disparity map into a three-dimensional point cloud and process those points. One example is "inverted cone algorithm" that uses a minimum obstacle height, maximum obstacle height, and maximum ground inclination to detect obstacles on arbitrary, non-flat, terrain.
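To make the ground plane approach (item 1) a bit more concrete, here is a minimal RANSAC sketch in plain Python. It is an illustration under simple assumptions (points as (x, y, z) tuples in metres, hypothetical function names, threshold and iteration count chosen arbitrarily), not the code from the Stack Overflow post or from OpenCV:

```python
import random

def plane_from_points(p1, p2, p3):
    """Plane a*x + b*y + c*z + d = 0 with unit normal, or None if the points are collinear."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # normal = cross product of the two edge vectors
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-9:
        return None
    nx, ny, nz = nx / norm, ny / norm, nz / norm
    return nx, ny, nz, -(nx * p1[0] + ny * p1[1] + nz * p1[2])

def distance_to_plane(plane, p):
    a, b, c, d = plane
    return abs(a * p[0] + b * p[1] + c * p[2] + d)

def ransac_ground_plane(points, threshold=0.02, iterations=200, seed=0):
    """Return (plane, obstacles): the dominant plane and every point not lying on it."""
    rng = random.Random(seed)
    best_plane, best_count = None, 0
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        count = sum(1 for p in points if distance_to_plane(plane, p) < threshold)
        if count > best_count:
            best_plane, best_count = plane, count
    if best_plane is None:
        return None, list(points)
    obstacles = [p for p in points if distance_to_plane(best_plane, p) >= threshold]
    return best_plane, obstacles

# Synthetic scene: a flat floor (z = 0) plus a thin obstacle standing on it
floor = [(x * 0.1, y * 0.1, 0.0) for x in range(10) for y in range(10)]
box = [(0.5, 0.5, 0.1 + 0.04 * k) for k in range(10)]
plane, obstacles = ransac_ground_plane(floor + box)
```

In the synthetic scene the floor wins as the dominant plane and the ten raised points come back as obstacles. With a real point cloud the threshold would have to match the sensor noise.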

Most important (as the post says) are the ground plane approaches... my experimentation was more with the disparity map approaches.

There are more approaches, but OpenCV, or computer vision in general, works with "passive" sensors, and that is more challenging than what can be easily accomplished with "active" sensors.


The best active sensor in my opinion would be the Kinect, as it gives you a quick and very detailed point cloud.  You can process the ground plane from the Kinect and even mesh the point cloud if you want navigation information above the ground plane.

Sensor Fusion

Ultrasonic ping sensors, infrared distance sensors, Lidar, whiskers, bumpers, encoders - all of these can give some data.  If properly filtered and processed together with other forms of input, they can potentially provide accurate localization information.
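As a small illustration of the idea, two noisy readings of the same quantity can be combined by weighting each one with the inverse of its variance (this is also the measurement update of a one-dimensional Kalman filter). The variances below are invented numbers, not specs of real sensors:

```python
def fuse(estimate_a, variance_a, estimate_b, variance_b):
    """Combine two independent estimates; the less noisy one gets more weight."""
    weight_a = variance_b / (variance_a + variance_b)
    fused = weight_a * estimate_a + (1.0 - weight_a) * estimate_b
    # the fused variance is always smaller than either input variance
    fused_variance = (variance_a * variance_b) / (variance_a + variance_b)
    return fused, fused_variance

# Assumed example: ultrasonic says 2.10 m (variance 0.04), infrared says 2.00 m (variance 0.01)
distance, variance = fuse(2.10, 0.04, 2.00, 0.01)
```

The result (about 2.02 m with variance 0.008) sits closer to the more trustworthy reading and is more certain than either sensor alone.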

You mentioned an optical mouse.  I have done this with a smaller robot and it worked reasonably well.  The mouse was actually in physical contact with the floor, so it was being used as the manufacturer expected.  If you move the mouse away from the floor it does not work properly, because of its optics.

If it were me (I'd be interested in trying this with WALL-E), I would take a regular webcam and aim it down at the floor.  I would then use OpenCV in mrl with an LKOptical Track filter.  This is the one everyone uses for tracking non-faces.   It has the capability of setting "many" dots to track. After these spots are set, they are tracked, and the relative displacement of their positions represents the amount and direction of movement of your bot.   Potentially, aiming the camera directly down makes the math easier ;) .. but you could extrapolate the details of movement just by experimenting if it wasn't aimed exactly 90 degrees to the floor.
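The math for the straight-down case could look roughly like this: take the tracked points before and after a frame and fit the rotation and translation that best map one set onto the other (a least-squares rigid fit). This is a sketch in plain Python with hypothetical names. It assumes pixel coordinates relative to the image centre and leaves out the pixels-to-centimetres scale, which would have to be calibrated for the camera height:

```python
import math

def estimate_floor_motion(before, after):
    """Best-fit rotation and translation mapping `before` points onto `after`.

    Points are (x, y) pixel positions relative to the image centre (assumed).
    Returns (tx, ty, rotation_deg) of the floor pattern in the image; the
    cart itself moved the opposite way.
    """
    n = len(before)
    cbx = sum(p[0] for p in before) / n
    cby = sum(p[1] for p in before) / n
    cax = sum(q[0] for q in after) / n
    cay = sum(q[1] for q in after) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(before, after):
        # compare the points relative to their centroids
        ax, ay = px - cbx, py - cby
        bx, by = qx - cax, qy - cay
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # translation left over after rotating about the image centre
    tx = cax - (cos_t * cbx - sin_t * cby)
    ty = cay - (sin_t * cbx + cos_t * cby)
    return tx, ty, math.degrees(theta)

# Four tracked spots that all slid 3 px right and 2 px up between two frames
spots_before = [(10, 0), (0, 10), (-10, 0), (0, -10)]
spots_after = [(x + 3, y - 2) for x, y in spots_before]
tx, ty, rotation = estimate_floor_motion(spots_before, spots_after)
```

Because translation and rotation are recovered together, sideways and diagonal moves of a mecanum cart would show up the same way as forward motion.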

Sounds like fun :)

GroG's picture

This is an impressive demonstration of what a Kinect and some clever developers can do:

There is a site that has a group of SLAM algorithms from an "open" competition.

One of them, RGBDSLAM, uses "a Kinect style camera".

I have meant to port this code to mrl ... just haven't had the time, but WORK-E has a Kinect mounted on it, so I'm getting closer ;)

juerg's picture

Looks like a whole bunch of things to look at. Thanks for all the links and your insight!