I am working on a social robot to wander around the house, find people, and annoy them.
Some of its features are:
- Autonomous wandering, looking for humans
- Face recognition and face tracking
- Recognising people by name
- Following people by tracking their faces
- Speech synthesis
- Speech recognition
- Home automation
- Google Now integration
I am using this base, which is available from various suppliers, and comes with two motors.
The base is powered by two 18650 3.7 V rechargeable lithium-ion batteries. Battery packs for these are available cheaply on eBay and elsewhere.
I reprogrammed the firmware in the ESP8266 using the Arduino IDE. My version connects to my Wi-Fi access point and uses MQTT messages to drive the robot, read sensor values, and so on.
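As a rough illustration of what driving the robot over MQTT could look like, here is a minimal Python sketch. It is not the firmware or the app code. The topic names, the JSON payload layout and the -100 to 100 speed range are all assumptions made for the example, and the `publish` argument stands in for any function that takes a topic and a payload, such as the `publish` method of a Paho MQTT client.

```python
import json

# Hypothetical topic names; the real ones are not given in this post.
DRIVE_TOPIC = "robot/cmd/drive"
DISTANCE_TOPIC = "robot/sensor/distance"


def clamp(value, low, high):
    """Limit value to the range [low, high]."""
    return max(low, min(high, value))


def drive_message(left, right):
    """Build a (topic, payload) pair for a two-motor drive command.

    Speeds are assumed to run from -100 (full reverse) to 100 (full forward).
    """
    payload = json.dumps({"left": clamp(left, -100, 100),
                          "right": clamp(right, -100, 100)})
    return DRIVE_TOPIC, payload


def parse_distance(payload):
    """Decode an assumed sensor payload such as '{"cm": 42.5}'."""
    return float(json.loads(payload)["cm"])


def send_drive(publish, left, right):
    """Send a drive command through any publish(topic, payload) callable."""
    topic, payload = drive_message(left, right)
    publish(topic, payload)


if __name__ == "__main__":
    sent = []  # stand-in for a real MQTT connection
    send_drive(lambda topic, payload: sent.append((topic, payload)), 60, 60)
    print(sent)
```

Keeping the message building separate from the transport means the same functions can be exercised without a broker, which is handy when the robot is switched off.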
I added an HC-SR04 ping sensor to stop the robot bumping into things. As the ESP8266 has 3.3 V logic, I used an HC-SR04 that works at 3.3 V. These are slightly rarer and more expensive than the 5 V-only ones.
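For reference, the HC-SR04 reports distance as the width of an echo pulse, and the conversion is simple arithmetic: the sound travels to the obstacle and back, so the distance is half the pulse time multiplied by the speed of sound. The Python sketch below shows the calculation only. The 20 cm stop distance is an assumed value for illustration, not the one the robot uses.

```python
SPEED_OF_SOUND_CM_PER_US = 0.0343  # about 343 m/s in air at 20 degrees C
STOP_DISTANCE_CM = 20.0  # assumed safety margin, not a value from this post


def echo_to_cm(echo_us):
    """Convert an HC-SR04 echo pulse width (microseconds) to distance in cm.

    The pulse covers the trip to the obstacle and back, hence the halving.
    """
    return echo_us * SPEED_OF_SOUND_CM_PER_US / 2.0


def should_stop(echo_us, limit_cm=STOP_DISTANCE_CM):
    """Return True when the obstacle is closer than the limit."""
    return echo_to_cm(echo_us) < limit_cm


if __name__ == "__main__":
    for pulse in (500, 1000, 2000):
        print(pulse, "us ->", round(echo_to_cm(pulse), 2), "cm,",
              "stop" if should_stop(pulse) else "ok")
```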
I have used a couple of different camera modules, including one based on the OpenMV. The one in the picture is using an Android phone, on a pan-and-tilt phone holder. Using an Android phone rather than the OpenMV allows me to provide a lot more features.
I am currently just using the tilt function with a single HS-422 servo. The HC-SR04 ultrasonic sensor and the servo are driven by the ESP8266 motor shield.
The Android App that I have written uses:
- JavaCV for face tracking and recognition
- Google Text to Speech for speech output
- Google Speech Recognition for speech input
- Eclipse Paho MQTT for communication with the base module and other services
I am using JavaCV rather than the OpenCV Java interface, as JavaCV seems simpler and more complete for this application. (I tried both, and a combination of the two.)
On the phone display, I show either a robot face or a camera preview from the front camera. The app recognises faces and sends MQTT commands to the base to keep the face in the middle of the screen. If the face is too small, the robot moves forward; if it is too big, it moves back. If it is on the left of the screen, the robot turns right; if on the right, it turns left. If it is high on the screen, the camera tilts up; if low, it tilts down. In this way, the robot follows and tracks the person.
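The rules above can be sketched roughly as follows. This is an illustrative Python sketch rather than the app's JavaCV code: the command names, the size thresholds and the dead zone around the centre are made up for the example. The left and right mapping is taken from the description as written.

```python
def follow_commands(face, frame_w, frame_h,
                    min_frac=0.15, max_frac=0.35, dead_zone=0.1):
    """Map a detected face rectangle to a list of robot commands.

    face is (x, y, w, h) in pixels, with the origin at the top-left
    of the preview. All thresholds are illustrative guesses.
    """
    x, y, w, h = face
    commands = []

    # Distance: compare the face width with the frame width.
    frac = w / frame_w
    if frac < min_frac:
        commands.append("forward")      # face too small, move closer
    elif frac > max_frac:
        commands.append("back")         # face too big, back away

    # Horizontal offset of the face centre from the frame centre (-0.5..0.5).
    dx = (x + w / 2) / frame_w - 0.5
    if dx < -dead_zone:
        commands.append("turn_right")   # face on the left of the screen
    elif dx > dead_zone:
        commands.append("turn_left")    # face on the right of the screen

    # Vertical offset; y grows downwards, so negative means high on screen.
    dy = (y + h / 2) / frame_h - 0.5
    if dy < -dead_zone:
        commands.append("tilt_up")
    elif dy > dead_zone:
        commands.append("tilt_down")

    return commands


if __name__ == "__main__":
    print(follow_commands((300, 220, 40, 40), 640, 480))
    print(follow_commands((0, 0, 300, 300), 640, 480))
```

The dead zone is there so that the robot does not twitch when the face is already close to the centre; without it, small detection jitter would produce a constant stream of corrections.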
When the robot first recognises a face, it asks the person for their name using Google speech recognition. It then goes into a short training session. When it has enough data, it attempts to predict who a new person is. This works reasonably well, but is a bit dependent on lighting. Google speech recognition is a little slow.
If you touch the phone screen, you get a small pop-up menu of options. One option is to switch between face and camera mode. The others are different varieties of speech commands:
- Phone commands
- Robot commands
- Home automation commands
The phone commands are the Google Now ones that you get by touching the microphone on Android phones. This includes opening apps, asking questions, setting reminders, playing music, etc.
Robot commands are ones I have implemented myself. They include driving the robot and managing the face recognition data. For example, you can list the recognised people, and delete or rename them. The data is kept in an external directory in the phone memory.
Home automation commands are sent to my home automation system via MQTT. So I can switch the lights, television, heating etc. on and off.
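To give an idea of how recognised speech might be turned into an MQTT message for the home automation system, here is a small Python sketch. The device names, the `home/.../set` topic layout and the `ON`/`OFF` payloads are hypothetical; the real topics depend on how the home automation system is set up.

```python
import re

# Hypothetical device names and topic layout.
DEVICES = {
    "lights": "home/lights/set",
    "television": "home/television/set",
    "heating": "home/heating/set",
}

# Match "on" or "off" as whole words only, so "television" does not count.
_STATE = re.compile(r"\b(on|off)\b")


def parse_home_command(text):
    """Turn recognised speech into an MQTT (topic, payload) pair, or None."""
    words = text.lower()
    state = _STATE.search(words)
    if state is None:
        return None
    for device, topic in DEVICES.items():
        if device in words:
            return topic, state.group(1).upper()
    return None


if __name__ == "__main__":
    print(parse_home_command("Switch the lights on"))
    print(parse_home_command("Turn the television off"))
    print(parse_home_command("Play some music"))
```

Anything that does not mention both a known device and a state returns `None`, so the app can ask the user to repeat the command instead of sending a guess.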
I might add Alexa commands to this.
I think my social robot can do most of the things that the commercial versions can do, although not quite as smoothly as the videos for those products suggest.
I could quite easily add extra functions, like taking pictures or videos and uploading them to Facebook or YouTube.
It doesn’t have the storytelling, game playing, education apps or recipe following of the commercial social robots, but these could be added by integrating with other apps.
The Aido has an optional video projector, which is a nice feature, but expensive.
I need to add a cliff sensor to stop the robot falling down stairs. A few more sonar sensors would help too.
It would be good to add navigation capability but that would need Lidar or a 3D camera, or perhaps a Google Tango phone.
Continuous listening for a trigger word is also possible. It currently only listens when it asks for the name of an unrecognised person, or when you touch the screen and select a speech command type.
Some animation of the face, like blinking, eye movement and moving the lips when talking, would be good.
Other things I could add include motion tracking, emotion detection, object recognition, and telepresence.
I will describe the ESP8266 and Android software in separate posts.