Index

  1. Overview
  2. Hardware of Home Service Robot
  3. Object Recognition System
  4. Experiment Results
  5. Future Work
  6. Appendix

Overview

 This thesis mainly discusses the design and implementation of object recognition and visual perception for a home service robot. Nowadays, robotics research is moving toward robots that serve people in the home environment, such as ASIMO, PR2, and ISSAC. These robots can not only interact with family members but also help people with household chores.

 Unlike outdoor environments, a house contains many items, such as daily supplies and food. This poses a significant challenge if we want the robot to provide services related to these items. To address it, we construct a visual perception system and a real-time object recognition system. With these capabilities, the home-service robot can distinguish different objects and search for a specific item. As a result, it can perform further tasks such as throwing away garbage or delivering the correct medicine to its user, and these functions greatly enhance the home-service robot.

Hardware

 All the systems described in this research are implemented on a home service robot, May, designed by the aiRobot lab. Fig. 1 shows its appearance, and Fig. 2 shows the hardware architecture.

Figure 1: The appearance of our home-service robot, May.
Figure 2: Hardware architecture.

 The hardware of the vision system consists of three parts: a computer, a graphics card, and a Kinect, as shown in Fig. 3. The Kinect includes a depth sensor that measures the distance between the robot and the objects in front of it. All information received by the camera is processed by a single computer.

Figure 3: Hardware of vision system.

Object Recognition System

 An object recognition system consists of three steps: detection, description, and matching. For keypoint detection, we use CUDA SURF as the detector. First, we introduce the SURF detector algorithm, which is more repeatable than FAST and more efficient than SIFT. After locating the interest points, we use BRISK as our descriptor. Unlike the popular SIFT and SURF descriptors, which generate a vector of floating-point values for each keypoint, BRISK stores the feature as a binary string. Computing the Hamming distance between binary strings is much faster than computing the usual Euclidean distance between floating-point vectors. In the end, we compare our system with other well-known approaches. The flow chart of our object recognition system is shown in Fig. 4.
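
 The following is a minimal sketch of this detect/describe/match pipeline using OpenCV, assuming the xfeatures2d contrib module is available; the Hessian threshold, image names, and other parameters are illustrative and not the exact settings used in this thesis.

    // Minimal OpenCV sketch of the detect / describe / match pipeline.
    // Assumes OpenCV built with the xfeatures2d contrib module; the 400
    // Hessian threshold and file names are illustrative placeholders.
    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/features2d.hpp>
    #include <opencv2/xfeatures2d.hpp>
    #include <vector>

    int main() {
        cv::Mat object = cv::imread("object.png", cv::IMREAD_GRAYSCALE);
        cv::Mat scene  = cv::imread("scene.png",  cv::IMREAD_GRAYSCALE);

        // 1. Detection: SURF keypoints (repeatable, faster than SIFT).
        cv::Ptr<cv::Feature2D> detector = cv::xfeatures2d::SURF::create(400.0);
        std::vector<cv::KeyPoint> kpObject, kpScene;
        detector->detect(object, kpObject);
        detector->detect(scene,  kpScene);

        // 2. Description: BRISK binary strings at the detected keypoints.
        cv::Ptr<cv::Feature2D> descriptor = cv::BRISK::create();
        cv::Mat descObject, descScene;
        descriptor->compute(object, kpObject, descObject);
        descriptor->compute(scene,  kpScene,  descScene);

        // 3. Matching: brute-force Hamming distance on the binary descriptors.
        cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true);
        std::vector<cv::DMatch> matches;
        matcher.match(descObject, descScene, matches);
        return 0;
    }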

Figure 4: Flow chart of object recognition system.
Figure 5: Detecting time with different resolution.

 The SURF detector provides repeatability and accuracy. SIFT is also known for its stability, but compared to SURF it takes more time to compute keypoints, as shown by the comparison in Fig. 5.
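
 As a rough illustration of how such a timing comparison can be produced, the sketch below times SIFT and SURF keypoint detection on a single image with OpenCV's TickMeter; it assumes OpenCV 4.4 or later with the xfeatures2d contrib module, and the test image and thresholds are placeholders rather than the thesis's benchmark setup.

    // Rough timing sketch comparing SIFT and SURF keypoint detection.
    // Assumes OpenCV >= 4.4 (cv::SIFT in the main module) plus xfeatures2d.
    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/features2d.hpp>
    #include <opencv2/xfeatures2d.hpp>
    #include <iostream>
    #include <vector>

    static double timeDetect(const cv::Ptr<cv::Feature2D>& det, const cv::Mat& img) {
        std::vector<cv::KeyPoint> kp;
        cv::TickMeter tm;
        tm.start();
        det->detect(img, kp);   // Time only the keypoint detection step.
        tm.stop();
        return tm.getTimeMilli();
    }

    int main() {
        cv::Mat img = cv::imread("scene.png", cv::IMREAD_GRAYSCALE);
        std::cout << "SIFT: " << timeDetect(cv::SIFT::create(), img) << " ms\n";
        std::cout << "SURF: " << timeDetect(cv::xfeatures2d::SURF::create(400.0), img) << " ms\n";
        return 0;
    }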

 Calonder, Lepetit, Strecha, and Fua proposed a novel method, Binary Robust Independent Elementary Features (BRIEF), in 2010. It demonstrated that binary strings are very effective for describing the feature around an interest point. The BRIEF descriptor compares the intensities of many pairs of points around a keypoint and generates the binary string directly, without computing the gradient orientation histograms that are a standard step in other state-of-the-art descriptors such as SIFT.
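
 The core idea can be sketched in a few lines: each bit of the descriptor is the outcome of one intensity comparison between a pre-chosen pair of smoothed pixels around the keypoint. The sketch below is a toy illustration rather than the exact BRIEF pattern; the patch, the point pairs, and the 256-bit length are assumptions.

    // Toy illustration of a BRIEF-style binary test (not the exact BRIEF pattern).
    // 'patch' is a smoothed grayscale patch centred on the keypoint, and 'pairs'
    // holds the pre-chosen point pairs; both are assumptions for this sketch.
    #include <opencv2/core.hpp>
    #include <bitset>
    #include <cstddef>
    #include <utility>
    #include <vector>

    std::bitset<256> briefLikeDescriptor(
            const cv::Mat& patch,
            const std::vector<std::pair<cv::Point, cv::Point>>& pairs) {
        std::bitset<256> desc;
        for (std::size_t i = 0; i < pairs.size() && i < 256; ++i) {
            // One binary test: is the first sample darker than the second?
            uchar a = patch.at<uchar>(pairs[i].first);
            uchar b = patch.at<uchar>(pairs[i].second);
            desc[i] = (a < b);
        }
        return desc;
    }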

 However, BRIEF still has some flaws: it is unreliable under scale changes and in-plane rotation. In our system, we therefore choose BRISK. This descriptor dramatically reduces the time needed to compute the feature description, and it also reduces the memory consumption per keypoint.

 BRISK uses a uniform pattern to sample the surroundings of a keypoint. Fig. 6 shows the sampling pattern for scale k = 1. The small blue circles mark the locations of the chosen samples; the red dashed circle of radius σ around each blue circle indicates that the sample is smoothed by a Gaussian filter with standard deviation σ. Finally, the description stage generates a binary string to represent the keypoint.

 Comparing two binary strings is simple and fast. Table 1 shows the matching time for the 1000 keypoints in Fig. 7 with different descriptors.
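
 Because the descriptors are bit strings, the distance between two of them is just the number of differing bits, computed with XOR and a population count. The sketch below assumes the 512-bit descriptor is packed into eight 64-bit words; OpenCV's BFMatcher with NORM_HAMMING performs the same computation for every candidate pair.

    // Hamming distance between two 512-bit binary descriptors (BRISK's default
    // length), computed with XOR and popcount. The uint64_t packing is an
    // assumption for this sketch; OpenCV stores descriptors as rows of bytes.
    #include <array>
    #include <bitset>
    #include <cstddef>
    #include <cstdint>

    int hammingDistance(const std::array<std::uint64_t, 8>& a,
                        const std::array<std::uint64_t, 8>& b) {
        int dist = 0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            // Differing bits show up as 1s in the XOR; count them.
            dist += static_cast<int>(std::bitset<64>(a[i] ^ b[i]).count());
        }
        return dist;
    }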

Figure 6: The BRISK sampling pattern with scale k = 1.
Table 1: Matching time for two images.
Figure 7: Matching two sets of keypoints.

 To respond to the environment quickly, the home-service robot needs a real-time vision system, and the solution above is not fast enough. Hence, we use CUDA SURF to reduce the time spent detecting interest points. CUDA gives developers access to the virtual instruction set of NVIDIA GPUs and enables massively parallel computation. As the experimental results in Fig. 8 show, CUDA SURF dramatically reduces the calculation time.
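
 A minimal sketch of GPU-side detection is given below, using OpenCV's cv::cuda::SURF_CUDA as a stand-in for the CUDA SURF implementation used in this thesis: the frame is uploaded to GPU memory, keypoints are detected there, and only the results are copied back to the CPU.

    // Sketch of GPU-side SURF detection with cv::cuda::SURF_CUDA, assumed here
    // as a stand-in for the thesis's CUDA SURF. Requires an NVIDIA GPU and
    // OpenCV built with CUDA and the xfeatures2d contrib module.
    #include <opencv2/core.hpp>
    #include <opencv2/core/cuda.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/xfeatures2d/cuda.hpp>
    #include <vector>

    int main() {
        cv::Mat frame = cv::imread("scene.png", cv::IMREAD_GRAYSCALE);

        cv::cuda::SURF_CUDA surf(400.0);          // Hessian threshold (illustrative).
        cv::cuda::GpuMat frameGpu, keypointsGpu, descriptorsGpu;

        frameGpu.upload(frame);                   // Copy the frame to GPU memory.
        surf(frameGpu, cv::cuda::GpuMat(), keypointsGpu, descriptorsGpu);

        // Bring the keypoints back to the CPU for the BRISK description step.
        std::vector<cv::KeyPoint> keypoints;
        surf.downloadKeypoints(keypointsGpu, keypoints);
        return 0;
    }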

Figure 8: Speed of CPU-SURF and CUDA SURF.

Experiment Results

 The robustness and effectiveness of the object recognition system have been demonstrated both in the laboratory and in international competition. Our home service robot, May, can perform many tasks with the assistance of the vision system, such as searching for a requested drink or throwing out garbage. Here are videos illustrating these tasks.

Future Work

 Several problems remain unsolved in this thesis. The proposed object recognition system cannot identify items with a blank, textureless appearance; an object must carry some visible pattern to serve as features. To address this, one might adopt 3D object recognition methods. However, such methods usually consume a lot of memory and are time-consuming, so a trade-off between accuracy and efficiency is required. In addition, CUDA SURF can only run on graphics cards made by NVIDIA. Another parallelization approach should be explored so that the system can be implemented on any platform.

Appendix

  1. H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded up robust features,” in Proceedings of the European Conference on Computer Vision, pp. 404–417, 2006.
  2. S. Leutenegger, M. Chli, and R. Siegwart, “BRISK: Binary robust invariant scalable keypoints,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 2548–2555, 2011.