Object Detection and Path Finding Using Monocular Vision

Abstract – This project consists of a prototype of an autonomous robot that picks up a desired object (a red can) based solely on camera vision. A robotic clamp and a camera are mounted on the robot. All information is transferred wirelessly over distances of up to 100 ft. The image is processed on an external computer using software such as OpenCV, Python and Microsoft Visual Studio. Using samples and regression analysis, the distance of any pixel and the width of any object can be found. After obstacle detection, a suitable path is chosen. All movement is controlled by a PIC microcontroller with the help of RF transmitter-receiver modules. The robot is best suited for non-textured, flat surfaces with little or no movement in the foreground.
Keywords – Image processing, Monocular Vision, Autonomous robot, Path Finding, Computer Vision
Introduction
Autonomous mobile robots need techniques to guide them past the various stationary and moving objects that are present in outdoor environments. A number of algorithms have been proposed based on sensor technologies such as infrared sensing and sonar. The inaccuracy of the data retrieved from these sensors is one of the main reasons to explore other domains for robot navigation and functioning.
The use of computer vision has not been explored thoroughly because of expensive cameras, the lack of powerful computers and an inadequate understanding of human vision. This project aims to use vision alone as a sensor for autonomous mobile robots to handle real-life tasks.
Vision is a vastly complicated and immensely useful sense for humans. We are capable of viewing objects in almost any kind of light, at most distances and in any orientation. We have the ability to notice small details, which helps in facial recognition, reading text and similar tasks. Vision is the primary sensor for path finding, location mapping, object detection, and collision avoidance.
The goal of computer vision is to help robots replicate human vision and surpass it.
Assumptions
The following assumptions were made in this project:
There is only one object in the frame.
The object is a red can.
There are no moving obstacles in the image.
The background does not merge with the colour of the object.
The can should be at least 50% visible in the image.
The ground is flat and evenly-textured.
The object is in a standing position.
The image size used is 2048 × 1152 pixels.
Definitions
The following terms are used in this paper and are defined below.
Hue Distance – Hue is defined as the degree to which a stimulus can be described as similar to or different from stimuli that are described as red, green, blue, and yellow. The hue distance operation returns an image representing the distance of each pixel's hue from the given hue of a specific colour. The hue is ‘wrapped’ at 180, so we have to take the shorter of the two distances around the hue circle; this gives a maximum hue distance of 90, which is scaled into a 0-255 grayscale image.
Blobs – Blobs, also called objects or connected components, are regions of similar pixels in an image. Blobs are valuable in machine vision because many things can be described as an area of a certain colour or shade in contrast to a background.
Background – The term background in computer vision can mean any part of the image that is not the object of interest. It may not necessarily describe what is closer or more distant to the camera.
RS232 [2] – In telecommunications, RS-232 is a standard for serial transmission of data. It formally defines the signals connecting a DTE (data terminal equipment), such as a computer terminal, and a DCE (data circuit-terminating equipment), such as a modem.
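The hue distance and blob definitions above can be illustrated with a small, library-free Python sketch. It is not SimpleCV's implementation: the function names, the choice of hue 0 for red, the threshold value and the use of 4-connectivity are all assumptions made for illustration.

```python
from collections import deque


def hue_distance(hue, target_hue=0):
    """Shortest distance between two hues on a circle wrapped at 180, scaled to 0-255.

    target_hue=0 (red) is an assumed default, not a value from the paper.
    """
    diff = abs(hue - target_hue) % 180
    shortest = min(diff, 180 - diff)  # never more than 90
    return round(shortest * 255 / 90)


def threshold(gray_rows, limit):
    """Binary mask: 1 where the hue distance is below `limit`, otherwise 0."""
    return [[1 if value < limit else 0 for value in row] for row in gray_rows]


def find_blobs(mask):
    """Group 4-connected pixels of value 1 into blobs (connected components).

    Returns one dict per blob with its pixel area and bounding box.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            pixels = []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            xs = [p[0] for p in pixels]
            ys = [p[1] for p in pixels]
            blobs.append({
                "area": len(pixels),
                "x": min(xs),
                "y": min(ys),
                "width": max(xs) - min(xs) + 1,
                "height": max(ys) - min(ys) + 1,
            })
    return blobs
```

Applying `hue_distance` to every pixel, thresholding the result and passing the mask to `find_blobs` yields the candidate red regions; hues just below 180 come out as close to red as hues just above 0 because of the wrap-around.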
Overview
The project is divided into two parts:
Software – It includes the processing of the image, controlling the robot, and transmitting information wirelessly.
Hardware – It includes the camera, camera mount, robotic clamp, and movement mechanism.
In this paper, we will mainly focus on the software aspect of the project.
Fig 1: Algorithm showing the basic steps
Fig 2: Block diagram showing basic robot design
Algorithm
The algorithm comprises three basic subdivisions: image processing, data transmission, and instruction decoding for robot movement. SimpleCV handles the image processing, Visual Basic the data transmission, and Embedded C the instruction decoding.
SimpleCV [1]
The camera, a smartphone, sends an image to the laptop via an app called IP Webcam. Using SimpleCV, this image is saved for further processing. A text file is opened to store all the instructions for the robot. The image is then cleaned, and the hue distance of the image from the colour red is calculated. The resulting image is converted to a black and white image using a threshold, so that all red objects become white and everything else becomes black. Blobs of a certain size are located and sorted according to their distance from the colour white, so that the red objects are analysed first. Blobs which are rectangular and of a certain size are located and tagged; these blobs are the objects of interest. If nothing is found, the program exits. Next, the position, pixel width, actual width, pixel height, distance from the robot and degree offset from the center are calculated for the object. The equations used to calculate the distance and width of the object are as follows:
Distance from the robot (in 10^-2 m) =
747+((Pixel Distance-516)*(3.5714))+((Pixel Distance-516)*(Pixel Distance-341)*(0.0131))+((Pixel Distance-516)*(Pixel Distance-341)*(Pixel Distance-266)*(0.000072147))+((Pixel Distance-516)*(Pixel Distance-341)*(Pixel Distance-266)*(Pixel Distance-387)*(0.00000087017))
Fig 3: Graph of Distance Pixel vs. Actual Distance
Fig 4: Actual Width as a function of Pixel width and Actual distance
Width of an object (in 10^-2 m) =
(0.0007490827925006831*(Actual Distance)^2) + (0.00011737831030326815*(Pixel Width)^2) + (0.0011054886895917884*(Actual Distance)*(Pixel Width)) - (0.6916574082810046*(Actual Distance)) - (0.3949146181777995*(Pixel Width)) + 160.42288828223246
where Actual Distance is the actual distance of the object from the robot and Pixel Width is its width in pixels.
The above equations were obtained by sampling and mathematical regression.
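The two fitted equations can be transcribed directly into code. The sketch below copies the coefficients from the text; the function and parameter names are our own, and the results are in units of 10^-2 m (centimetres). As with any fitted function, the values are only meaningful inside the range that was sampled.

```python
def distance_cm(pixel_distance):
    """Distance from the robot in 10^-2 m, using the polynomial given in the text."""
    p = pixel_distance
    return (747
            + (p - 516) * 3.5714
            + (p - 516) * (p - 341) * 0.0131
            + (p - 516) * (p - 341) * (p - 266) * 0.000072147
            + (p - 516) * (p - 341) * (p - 266) * (p - 387) * 0.00000087017)


def width_cm(actual_distance, pixel_width):
    """Width of an object in 10^-2 m, using the quadratic surface given in the text."""
    d, w = actual_distance, pixel_width
    return (0.0007490827925006831 * d ** 2
            + 0.00011737831030326815 * w ** 2
            + 0.0011054886895917884 * d * w
            - 0.6916574082810046 * d
            - 0.3949146181777995 * w
            + 160.42288828223246)
```

Because the distance polynomial is written as nested products of (Pixel Distance - constant) terms, every term after the first vanishes at a pixel distance of 516, where the function returns exactly 747.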
The degree offset from the center is written to the file in the format ‘C (degrees)’. Degrees are calculated using the formula:
Degree offset =
(Pixels from the center * Angle span of the camera)/((Total pixel width of the image)/2)
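As a small worked example of this formula, the sketch below computes the offset and formats it as a ‘C’ instruction. The angle span of 30 degrees used in the usage note is a made-up value, the default image width is taken from the assumptions section, and rounding to a whole degree is an assumption based on the integer values that appear in the results.

```python
def degree_offset(pixels_from_center, angle_span_deg, image_width_px=2048):
    """Degree offset from the image center, following the formula in the text."""
    return pixels_from_center * angle_span_deg / (image_width_px / 2)


def center_instruction(pixels_from_center, angle_span_deg, image_width_px=2048):
    """Format the offset as a 'C (degrees)' line; rounding is an assumption."""
    degrees = degree_offset(pixels_from_center, angle_span_deg, image_width_px)
    return "C {}".format(round(degrees))
```

With an assumed angle span of 30 degrees, an object 512 pixels from the center of a 2048-pixel-wide image gives 512 * 30 / 1024 = 15 degrees. Note that under this formula an object at the very edge of the image has an offset equal to the full angle span, so the span is measured from the center of the image to its edge.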
Using the position of the object in the image, a new frame is created within the image. It is made wide enough to accommodate three robots side by side, which leaves space for tank turning. Within this frame, obstacles of a certain size are found and sorted according to their distance from the robot. For each obstacle, its distance, width, relative position, and the horizontal and vertical shift needed within the frame to reach it are calculated and subjected to the following rules:
If it is cut by the right side of the frame, then the robot will go left to avoid it.
If it is cut by the left side of the frame, then the robot will go right to avoid it.
If it is in the center of the frame, then the robot will go right to avoid it.
This information is written in the file in the format ‘F (distance)’ or ‘L (degrees to turn)’ or ‘R (degrees to turn)’. After traversing the entire path, using the object distance and its position, the remaining horizontal and vertical traversal needed to reach the object is calculated and written to the file in the above format. The last instruction written to the file is ‘K 1’ which tells the robot to pick up the object using the clamp.
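The three avoidance rules can be sketched as a single function that returns the turn letter used in the instruction file. The representation of an obstacle and of the frame by their left and right pixel columns is an assumption, as is the handling of two cases the text does not cover: an obstacle outside the frame is ignored, and an obstacle that crosses both sides is resolved by the first matching rule.

```python
def avoidance_direction(obs_left, obs_right, frame_left, frame_right):
    """Return 'L' or 'R' for an obstacle inside the frame, or None if it lies outside.

    All arguments are pixel columns; the names are hypothetical.
    """
    if obs_right < frame_left or obs_left > frame_right:
        return None  # obstacle does not intersect the frame (assumed behaviour)
    if obs_left <= frame_right < obs_right:
        return "L"   # cut by the right side of the frame: go left
    if obs_left < frame_left <= obs_right:
        return "R"   # cut by the left side of the frame: go right
    return "R"       # fully inside the frame (center): go right
```

For a frame spanning columns 100 to 300, an obstacle from 250 to 350 gives ‘L’, one from 50 to 150 gives ‘R’, and one from 180 to 220 also gives ‘R’.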
Visual Basic [3]
Visual Basic is used to control the overall functioning of the program. It controls when to run a script and what data to send to the robot. First, an RS232 port is opened to communicate with the first microcontroller, and a text file is created for the instructions. A loop is then started, which terminates once the instruction to pick up the object has been sent. Inside the loop, the SimpleCV script is run and the text file is read. The letter at the beginning of each line is handled according to the following cases:
F – The distance is converted into time and sent via RS232 in the form F(time)$
L/R – The degrees are converted into time and sent via RS232 in the above format
K – It is sent as it is in the format K1$
C – The angle is converted into time and sent in the above format
N – Checks whether no object or no obstacle has been found
If no object has been found, the next iteration of the loop is initiated
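The conversion of instruction lines into serial frames can be sketched as follows. This is written in Python rather than Visual Basic to illustrate the logic only. The conversion rates `cm_per_time_unit` and `deg_per_time_unit` are hypothetical, since the paper does not give the robot's speed, and the N case is left out because its handling depends on the surrounding loop.

```python
def to_serial_frame(line, cm_per_time_unit=10.0, deg_per_time_unit=6.0):
    """Convert one instruction line (e.g. 'F 150') into a frame such as 'F15$'.

    The two rate parameters are assumed values, not figures from the paper.
    """
    letter, value = line.split()
    if letter == "K":
        return "K1$"  # pick-up instruction is sent as it is
    amount = float(value)
    if letter == "F":
        time_units = round(amount / cm_per_time_unit)   # distance to time
    elif letter in ("L", "R", "C"):
        time_units = round(amount / deg_per_time_unit)  # degrees to time
    else:
        raise ValueError("unsupported instruction: " + line)
    return "{}{}$".format(letter, time_units)
```

With the assumed rates, the line ‘F 150’ becomes ‘F15$’ and ‘L 90’ becomes ‘L15$’. The trailing ‘$’ marks the end of each instruction.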
PIC – Embedded C [4]
Embedded C is used to receive the instructions from the computer and send them to the robot. It is also used to control the movement of the robot. The instructions are received from Visual Basic one at a time (for example ‘F15$’). Each instruction is broken up into individual characters, which are then converted to binary according to the table below (Table 1). The binary code is then sent to the second microcontroller, on the robot, via an RF transmitter-receiver pair. The second microcontroller decodes these instructions using the ‘$’ character as a terminating symbol. The instructions are then put into a queue in the form of an array. Once these instructions have been executed, the order of the instructions is reversed so that the robot returns to its original location and drops the object.
Character/ Instruction Binary equivalent
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
8 1000
9 1001
F 1010
L 1011
R 1100
C 1101
K 1110
$ 1111
Table 1: Character to binary conversion
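Table 1 can be expressed as a lookup table, with decoding that splits the received stream into instructions at the ‘$’ terminator. The sketch below is in Python rather than Embedded C and only illustrates the encoding; the function names are our own, and the RF link and the queue handling on the microcontroller are not modelled.

```python
# Character to 4-bit code, as listed in Table 1.
CODES = {str(digit): format(digit, "04b") for digit in range(10)}
CODES.update({"F": "1010", "L": "1011", "R": "1100",
              "C": "1101", "K": "1110", "$": "1111"})
CHARACTERS = {bits: char for char, bits in CODES.items()}


def encode(frame):
    """Convert a frame such as 'F15$' into a list of 4-bit strings."""
    return [CODES[char] for char in frame]


def decode(nibbles):
    """Rebuild the instruction queue from 4-bit strings, splitting at '$'."""
    queue = []
    current = ""
    for bits in nibbles:
        char = CHARACTERS[bits]
        if char == "$":
            queue.append(current)  # terminator reached: instruction complete
            current = ""
        else:
            current += char
    return queue
```

For example, `encode("F15$")` gives the codes 1010, 0001, 0101 and 1111, and decoding the concatenated codes for ‘F15$’ and ‘K1$’ returns the queue `["F15", "K1"]`.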
Results
The algorithm discussed above was tested using three different images from an outdoor environment. The results of the testing are shown below.
Object found
C 2
F 115
R 90
F 11
L 90
F 56
R 90
F 21
L 90
F 69
L 90
F 32
K 1
Fig 5: Top – Original photo. Bottom left – Locating the object. Bottom right – Locating obstacles
Object found
C 4
F 151
K 1
Fig 6: Top – Original photo. Bottom left – Locating the object. Bottom right – Locating obstacles
Object found
C 1
F 90
L 90
F 27
R 90
F 356
R 90
F 27
K 1
Fig 7: Top – Original photo. Bottom left – Locating the object. Bottom right – Locating obstacles
Processing Time
The total processing time, i.e. the time for the whole process to complete once and feed the robot with the path directions, can be divided into four parts:
T1 = time taken for the camera to transfer the image wirelessly over Wi-Fi to the processing unit
T2 = time taken by SimpleCV to process the image
T3 = time taken by Visual Studio to transfer the instructions serially via RS232 to microcontroller 1
T4 = time taken by the serial RF transmission
Here, only T2 is relevant to compute, as the rest are more hardware dependent and can be modified.
For Fig 5, the computational time was 10.066 seconds.
For Fig 6, the computational time was 8.639 seconds.
For Fig 7, the computational time was 10.029 seconds.
After running on a number of samples, the average computational time was found to be 9.578 seconds.
Accuracy
Object Finding
The routine performs a number of checks before it locates the desired object (red can) and is reasonably accurate provided the object size is within the given range. Here, the desired object size is 15 × 5 (in 10^-2 m). Because detection is based on colour blobs, the algorithm will locate any red object that meets the size criteria.
Path finding
The path finding is only as accurate as the fitted functions. Since the regression function derived from the samples is only a best fit within a given range, the calculated values may differ slightly from the actual values. Sampling was done over the minimum and maximum distances and widths covered by the camera frame at a time. Hence, the distance to anything within the camera frame can be estimated using the derived interpolation function.
Conclusion
This project showcases the wide possibilities for the use of vision in autonomous robots. Using only one camera, we effectively located an object and mapped a path to it. With the ever-expanding research into cameras, the possibilities for improving such a sensor are considerable. With advancements in stereoscopic vision, better and more accurate path finding algorithms can be developed using depth maps.
References
[1] K. Demaagd, A. Oliver et al., Practical Computer Vision with SimpleCV, Sebastopol, CA: O’Reilly Media, Inc., 2012.
[2] J. Axelson, Serial Port Complete: Programming and Circuits for RS-232 and RS-485 Links and Networks, Madison, WI: Lakeview Research, 2000.
[3] M. Snell and L. Powers, Microsoft Visual Studio 2010 Unleashed, Upper Saddle River, NJ: Pearson Education, Inc., 2010.
[4] R. Barnett, L. O’Cull and S. Cox, Embedded C Programming and the Microchip PIC, Clifton Park, NY: Delmar Learning, 2004.