Above is a diagram of a "dumb" robot in a room with an overhead webcam. The computer does the obstacle detection, object identification, speech recognition, and other higher-level processing tasks, and it can direct the robot over some wireless link such as RF, infrared, Bluetooth, or other means. The robot itself carries only a small, inexpensive microcontroller (Arduino, Propeller, Basic Stamp, etc.). This allows the robot to act as an extension of the desktop computer.
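To give a feel for the PC side of that wireless link, here is a minimal sketch in Java, assuming the Bluetooth or RF adapter shows up as an ordinary serial port and the third-party jSerialComm library is on the classpath. The one-character drive protocol ('F' for forward, 'S' for stop) is purely hypothetical and would have to match whatever the microcontroller sketch expects.

import com.fazecast.jSerialComm.SerialPort;
import java.io.OutputStream;

public class RobotLink {
    public static void main(String[] args) throws Exception {
        // Open whatever serial port the Bluetooth/RF adapter exposes (the name is an assumption)
        SerialPort port = SerialPort.getCommPort("/dev/rfcomm0");
        port.setBaudRate(9600);
        if (!port.openPort()) {
            System.err.println("Could not open the serial port");
            return;
        }
        OutputStream out = port.getOutputStream();

        // Hypothetical one-character protocol the microcontroller sketch understands:
        // 'F' = forward, 'L' = left, 'R' = right, 'S' = stop
        out.write('F');
        out.flush();
        Thread.sleep(1000);   // let the robot drive forward for a second
        out.write('S');
        out.flush();

        port.closePort();
    }
}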
Although many hobbyists have built robots out of laptops and even desktop computers, the average person would probably not want to dedicate their powerful "house" computer to driving a robot around. Even if the robot did utilitarian things like sweeping, mopping, or putting things away, they would not want to give up their computer for it. With appropriate software and wireless communication, the desktop can be "shared": while the robot is busy exploring, cleaning, mapping, or doing some other task, a person can still use the desktop for daily activities.
Webcam - one of the most cost-effective sensors to date. I believe this is because people are very visually oriented; we demand and expect a lot from this sense, and we demand and expect just as much from visual displays and sensors. So the technology packed into this little guy is amazing, and furthermore, this electronic eye costs only about $38.
Experiment - take a picture, give it to someone else, and ask what they see. The saying "a picture is worth 1000 words" has real meaning. Then of course a video stream would be 1000 words * 30 fps = 30,000 words per second :)
Although the sensor is amazing, something is still missing before a machine can understand a picture. To a machine it's just a bunch of 1's and 0's. Like the rods and cones in our eyes firing synapses, the 0's and 1's of a picture or video stream are meaningless without a brain or visual cortex to process them into meaningful things. Most microcontrollers lack the necessary software, and sometimes the processing capability, to make sense of all that information. But regular PCs are now starting to be able to tackle some of these problems. The hardware is there; what is lacking is the soft "visual cortex" to derive meaning. An open source project called OpenCV is one of the more popular computer vision packages, and it implements some of the rudimentary functions of that visual cortex.
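As a rough sketch of that soft visual cortex, here is what a first pass at obstacle detection from the overhead webcam might look like with OpenCV's Java bindings: grab a frame, threshold it, and treat the larger blobs as candidate obstacles. The camera index, native library setup, and blob-size cutoff are assumptions for a particular setup.

import org.opencv.core.*;
import org.opencv.imgproc.Imgproc;
import org.opencv.videoio.VideoCapture;

import java.util.ArrayList;
import java.util.List;

public class ObstacleSketch {
    public static void main(String[] args) {
        // Load the OpenCV native library (must be on java.library.path)
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        VideoCapture cam = new VideoCapture(0);   // 0 = first webcam; adjust for your setup
        Mat frame = new Mat();
        if (!cam.read(frame)) {
            System.err.println("Could not read a frame from the webcam");
            return;
        }

        // Grayscale -> blur -> threshold, then find contours as crude "obstacle" blobs
        Mat gray = new Mat();
        Imgproc.cvtColor(frame, gray, Imgproc.COLOR_BGR2GRAY);
        Imgproc.GaussianBlur(gray, gray, new Size(5, 5), 0);
        Mat mask = new Mat();
        Imgproc.threshold(gray, mask, 0, 255, Imgproc.THRESH_BINARY_INV + Imgproc.THRESH_OTSU);

        List<MatOfPoint> contours = new ArrayList<>();
        Imgproc.findContours(mask, contours, new Mat(), Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);

        for (MatOfPoint c : contours) {
            Rect box = Imgproc.boundingRect(c);
            if (box.area() > 500) {   // ignore tiny specks; the cutoff is arbitrary
                System.out.println("Possible obstacle at " + box.x + "," + box.y
                        + " size " + box.width + "x" + box.height);
            }
        }
        cam.release();
    }
}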
Speech Recognition
Another good open source speech recognition package is Sphinx 4, which will run on a regular PC.
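A bare-bones Sphinx 4 listener looks roughly like this, assuming the sphinx4-core and sphinx4-data jars (which bundle the default US English models) are on the classpath. The loop just prints each hypothesis and quits when it hears "stop".

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;

public class ListenSketch {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        // Default US English models shipped with the Sphinx 4 data jar
        config.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        config.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        config.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(config);
        recognizer.startRecognition(true);   // true = clear any buffered audio first

        // Print whatever the microphone hears until the word "stop" shows up
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            String heard = result.getHypothesis();
            System.out.println("Heard: " + heard);
            if (heard.contains("stop")) {
                break;
            }
        }
        recognizer.stopRecognition();
    }
}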
Text To Speech
FreeTTS is an open source text-to-speech engine. AT&T has a nice one called Natural Voices, but it is restricted to non-commercial use.
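Getting FreeTTS to talk takes only a few lines. This sketch assumes the FreeTTS jars are on the classpath and uses the bundled "kevin16" voice.

import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

public class SpeakSketch {
    public static void main(String[] args) {
        // "kevin16" is one of the voices bundled with FreeTTS
        Voice voice = VoiceManager.getInstance().getVoice("kevin16");
        if (voice == null) {
            System.err.println("Voice not found - check that the FreeTTS jars are on the classpath");
            return;
        }
        voice.allocate();
        voice.speak("Obstacle detected. Turning left.");
        voice.deallocate();
    }
}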
All of these software packages are great, but there needs to be a way to glue them all together in a way that is not too messy. Here is where I'll bring up the analogy of Legos. They are uniform and have well-defined interfaces. A Lego by itself is pretty boring. Some are different colors, and some are a little special in that they provide a wheel, a hinge, or some other simple functionality. But the strength is in the ease with which you can put them together to build more elaborate and exciting projects. MyRobotLab works by putting a Lego-like exterior over these different software packages, which allows them to interconnect more easily.
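To make the Lego analogy concrete, here is roughly what that glue looks like using MyRobotLab's Runtime.createAndStart call. The service type names ("OpenCV", "Sphinx", "Speech", "GUIService") are taken from one MyRobotLab release and may differ in others, so treat this as a sketch rather than copy-and-paste code.

import org.myrobotlab.service.Runtime;

public class GlueSketch {
    public static void main(String[] args) {
        // Each service is a "Lego brick" wrapping one of the packages above.
        // The first argument is an arbitrary name, the second is the service type.
        Runtime.createAndStart("eye", "OpenCV");      // computer vision (OpenCV)
        Runtime.createAndStart("ear", "Sphinx");      // speech recognition (Sphinx 4)
        Runtime.createAndStart("mouth", "Speech");    // text to speech (FreeTTS)
        Runtime.createAndStart("gui", "GUIService");  // GUI for wiring the services together
    }
}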