HAND GESTURE RECOGNITION: A LITERATURE REVIEW

ISSUES IN HAND GESTURE RECOGNITION

Extraction method and pre-processing

Segmentation is the first step in recognizing hand gestures. It divides the input image (here, the hand gesture image) into regions separated by boundaries. The segmentation process depends on the kind of gesture: a dynamic gesture requires the hand to be located and tracked, whereas a static gesture (posture) only requires the input image to be segmented. The most common cue for separating the hand is skin colour, since it is easy to use and invariant to scale, translation, and rotation. Different tools and methods are used to distinguish skin from non-skin pixels in order to model the hand. These techniques are either parametric or non-parametric: the Gaussian Model (GM) and Gaussian Mixture Model (GMM) are parametric techniques, while histogram-based techniques are non-parametric.
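The parametric approach can be sketched as a single Gaussian model over two chrominance components, with a pixel labelled as skin when its squared Mahalanobis distance from the skin mean falls below a threshold. The mean, covariance, threshold, and the choice of a (Cb, Cr) colour space below are placeholder assumptions for illustration, not values from the reviewed literature; in practice they would be estimated from labelled skin pixels.

```python
# Single-Gaussian skin model in a 2-D chrominance space (assumed: Cb, Cr).
# MEAN, COV and THRESHOLD are illustrative assumptions, not published values.

MEAN = (110.0, 150.0)                 # assumed mean (Cb, Cr) of skin pixels
COV = ((80.0, 10.0), (10.0, 60.0))    # assumed 2x2 covariance matrix
THRESHOLD = 6.0                       # assumed squared Mahalanobis cut-off


def mahalanobis_sq(pixel, mean=MEAN, cov=COV):
    """Squared Mahalanobis distance of a (Cb, Cr) pixel from the skin mean."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = pixel[0] - mean[0], pixel[1] - mean[1]
    # x^T * inverse(cov) * x, written out for the 2x2 case.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def segment(image):
    """Return a binary mask: 1 where the pixel is classified as skin."""
    return [[1 if mahalanobis_sq(p) <= THRESHOLD else 0 for p in row]
            for row in image]


# Tiny 2x2 "image" of (Cb, Cr) pairs: the left column is close to the assumed skin mean.
image = [[(110, 150), (30, 200)],
         [(112, 148), (200, 40)]]
mask = segment(image)
```

A Gaussian Mixture Model would replace the single mean and covariance with several weighted components, while a histogram-based (non-parametric) model would replace the distance test with a lookup in a colour histogram built from training pixels.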

Features Extraction

A good segmentation process supports reliable feature extraction, which in turn helps recognition succeed. The feature vector of the segmented image can be extracted in different ways depending on the application, and various methods have been applied to represent the features. Some methods use the shape of the hand, such as the hand contour and silhouette [6], while others use fingertip positions, the palm centre, etc.
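As a rough illustration of such features, the sketch below takes a binary hand mask and computes the palm centre as the centroid of the hand pixels and one fingertip candidate as the hand pixel farthest from that centroid. Both definitions are simplifying assumptions made for this example; the reviewed systems rely on more elaborate contour analysis.

```python
import math


def hand_pixels(mask):
    """List the (row, col) coordinates of all pixels set to 1."""
    return [(r, c) for r, row in enumerate(mask)
            for c, v in enumerate(row) if v]


def palm_centre(mask):
    """Centroid (row, col) of the hand pixels; assumes the mask is not empty."""
    points = hand_pixels(mask)
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)


def fingertip_candidate(mask):
    """Hand pixel farthest from the centroid (a crude fingertip guess)."""
    centre = palm_centre(mask)
    return max(hand_pixels(mask), key=lambda p: math.dist(p, centre))


# Toy mask: a 3x3 "palm" with one "finger" extending upward.
mask = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]
centre = palm_centre(mask)
tip = fingertip_candidate(mask)
# A simple feature vector: palm centre followed by the fingertip position.
features = [centre[0], centre[1], float(tip[0]), float(tip[1])]
```

The resulting feature vector can then be passed to whichever classifier the application uses.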

Gestures Classification

After the input hand image has been modelled and analysed, a gesture classification method is used to recognize the gesture. Recognition depends on the proper selection of features and parameters and on a suitable classification algorithm. For example, edge detection cannot be used for gesture recognition, since many hand postures are generated and this could produce misclassification. A Euclidean distance metric can be used to differentiate the gestures. Many statistical tools can be used for gesture classification, such as the Hidden Markov Model (HMM), Finite State Machine (FSM), Learning Vector Quantization, and Principal Component Analysis (PCA). Neural networks have been widely applied to extracting the hand shape and to hand gesture recognition. Other soft computing tools are effective in this field as well, such as Fuzzy C-Means clustering (FCM) and Genetic Algorithms (GAs).
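The Euclidean distance approach can be sketched as a nearest-template classifier: the gesture whose stored feature vector lies closest to the input vector is chosen. The gesture names and template values below are hypothetical and serve only to make the example runnable.

```python
import math

# Hypothetical reference feature vectors, one per gesture class.
TEMPLATES = {
    "fist": [0.1, 0.1, 0.1],
    "open_palm": [0.9, 0.8, 0.9],
    "point": [0.9, 0.1, 0.1],
}


def classify(features, templates=TEMPLATES):
    """Return the label whose template is nearest in Euclidean distance."""
    return min(templates,
               key=lambda label: math.dist(features, templates[label]))


label = classify([0.8, 0.2, 0.0])
```

This design needs no training beyond storing one template per gesture, but it is sensitive to how the features are scaled, since every dimension contributes equally to the distance.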

APPLICATION AREAS OF HAND GESTURES SYSTEM

Hand gesture recognition systems have been applied in many domains, including sign language translation, virtual environments, smart surveillance, robot control, and medical systems. Some hand gesture application areas are:

1. Sign Language Recognition

2. Robot Control

3. Graphic Editor Control

4. Virtual Environments (VEs)

5. Numbers Recognition

6. Television Control

7. 3D Modelling

DRAWBACKS

The orientation histogram method has some problems: similar gestures might have different orientation histograms, and different gestures could have similar orientation histograms. In addition, the method works well for any object that dominates the image, even if it is not the hand gesture. The Neural Network classifier applied to gesture classification is time consuming, and as the amount of training data increases, the time needed for classification also increases.
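The first drawback can be seen in a small sketch. The function below builds an orientation histogram from central-difference gradients; the gradient operator, the folding of directions into 0 to 180 degrees, and the four-bin layout are assumptions for illustration rather than the exact method of the reviewed work. The same bar shape, rotated by 90 degrees, ends up in a different bin.

```python
import math


def orientation_histogram(img, bins=4):
    """Histogram of gradient orientations over the interior pixels of img."""
    width = 180.0 / bins
    hist = [0] * bins
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            gx = img[r][c + 1] - img[r][c - 1]   # horizontal central difference
            gy = img[r + 1][c] - img[r - 1][c]   # vertical central difference
            if gx == 0 and gy == 0:
                continue                          # flat region: no orientation
            angle = math.degrees(math.atan2(gy, gx))
            # Fold into [0, 180) and shift by half a bin so that bins are
            # centred on 0, 45, 90 and 135 degrees (for bins=4).
            index = int(((angle + width / 2) % 180.0) / width)
            hist[min(index, bins - 1)] += 1
    return hist


vertical_bar = [[0, 0, 1, 0, 0] for _ in range(5)]
# Rotate the shape by 90 degrees: rows become columns.
horizontal_bar = [list(row) for row in zip(*vertical_bar)]

h_vertical = orientation_histogram(vertical_bar)
h_horizontal = orientation_histogram(horizontal_bar)
```

Here the vertical bar places all of its votes in the bin centred on 0 degrees and the horizontal bar places them in the bin centred on 90 degrees, so a classifier comparing the two histograms directly would treat the rotated posture as a different gesture.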

CONCLUSIONS

In this paper, different methods for gesture recognition are discussed, such as Neural Networks, HMM, and fuzzy c-means clustering, in addition to the use of orientation histograms for feature representation. For dynamic gestures, HMM tools are well suited and have shown their efficiency, especially for robot control. NNs are used as classifiers and for capturing the shape of gestures. Feature extraction requires dedicated methods and algorithms, even just to capture the shape of the hand. The choice of a specific recognition algorithm depends on the target application. The application areas of gesture systems are also explained in this work.

Using Convolutional Neural Networks for Image Recognition

By Samer Hijazi, Rishi Kumar, and Chris Rowen, IP Group, Cadence

They proposed a CNN algorithm for hand gesture recognition. By stacking multiple different layers in a CNN, complex architectures are built for classification problems. Four types of layers are most common: convolution layers, pooling/subsampling layers, non-linear layers, and fully connected layers.

The convolution operation extracts different features of the input. The first convolution layer extracts low-level features such as edges, lines, and corners. Higher-level layers extract higher-level features. Figure 6 illustrates the process of 3D convolution used in CNNs. The input is of size N x N x D and is convolved with H kernels, each of size k x k x D, separately. Convolution of an input with one kernel produces one output feature, and with H kernels independently produces H features. Starting from the top-left corner of the input, each kernel is moved from left to right, one element at a time. Once the top-right corner is reached, the kernel is moved one element downward, and again the kernel is moved from left to right, one element at a time. This process is repeated until the kernel reaches the bottom-right corner. For the case where N = 32 and k = 5, there are 28 unique positions from left to right and 28 unique positions from top to bottom that the kernel can take. Corresponding to these positions, each feature in the output will contain 28×28 (i.e., (N-k+1) x (N-k+1)) elements. For each position of the kernel in a sliding window process, k x k x D elements of the input and k x k x D elements of the kernel are multiplied element by element and accumulated. So, to create one element of one output feature, k x k x D multiply-accumulate operations are required.
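The sliding-window procedure can be written out directly as nested loops. The sketch below is a plain-Python illustration and not the implementation from the cited paper; the depth D = 3, the kernel count H = 2, and the all-ones data are arbitrary choices, while N = 32 and k = 5 follow the text. It also counts multiply-accumulate operations so that the k x k x D figure can be checked.

```python
def conv3d_valid(inp, kernels):
    """Slide each k x k x D kernel over an N x N x D input (stride 1, no padding).

    inp[r][c][d] is the input and kernels[h][i][j][d] is kernel h.
    As is usual in CNNs, the kernel is applied without flipping.
    Returns (outputs, macs); outputs[h] has (N-k+1) x (N-k+1) elements.
    """
    n, depth, k = len(inp), len(inp[0][0]), len(kernels[0])
    out_size = n - k + 1
    outputs, macs = [], 0
    for kernel in kernels:
        feature = [[0.0] * out_size for _ in range(out_size)]
        for r in range(out_size):            # top to bottom
            for c in range(out_size):        # left to right
                acc = 0.0
                for i in range(k):
                    for j in range(k):
                        for d in range(depth):
                            acc += inp[r + i][c + j][d] * kernel[i][j][d]
                            macs += 1        # one multiply-accumulate
                feature[r][c] = acc
        outputs.append(feature)
    return outputs, macs


N, D, K, H = 32, 3, 5, 2   # D and H are arbitrary choices for this example
inp = [[[1.0] * D for _ in range(N)] for _ in range(N)]
kernels = [[[[1.0] * D for _ in range(K)] for _ in range(K)] for _ in range(H)]
outputs, macs = conv3d_valid(inp, kernels)
```

With these sizes each of the H output features is 28×28, and the total operation count is H × 28 × 28 × (k × k × D) multiply-accumulates, which matches the per-element cost described above.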

POOLING:

The pooling/subsampling layer reduces the resolution of the features. It makes the features robust against noise and distortion. There are two ways to do pooling: max pooling and average pooling. In both cases, the input is divided into non-overlapping two-dimensional regions. For example, in Figure 4, layer 2 is the pooling layer. Each input feature is 28×28 and is divided into 14×14 regions of size 2×2. For average pooling, the average of the four values in the region is calculated. For max pooling, the maximum of the four values is selected. As a smaller example, consider an input of size 4×4. For 2×2 subsampling, the 4×4 image is divided into four non-overlapping matrices of size 2×2. In the case of max pooling, the maximum of the four values in the 2×2 matrix is the output. In the case of average pooling, the average of the four values is the output. Note that for the output with index (2,2), the result of averaging is a fraction that has been rounded to the nearest integer.
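Both pooling variants can be sketched on a 4×4 input as follows. The input values are made up for illustration and are not those of the figure referred to in the text; rounding the average to an integer follows the description above.

```python
def pool2x2(feature, mode="max"):
    """2x2 non-overlapping pooling of a 2-D feature with even side lengths."""
    out = []
    for r in range(0, len(feature), 2):
        row = []
        for c in range(0, len(feature[0]), 2):
            block = [feature[r][c], feature[r][c + 1],
                     feature[r + 1][c], feature[r + 1][c + 1]]
            if mode == "max":
                row.append(max(block))
            else:
                # Average of the four values, rounded to an integer.
                # (Python's round() sends exact halves to the even neighbour;
                # no such tie occurs in this example.)
                row.append(round(sum(block) / 4))
        out.append(row)
    return out


# Illustrative 4x4 input (values are assumptions, not from the source figure).
feature = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [3, 5, 2, 4],
]
max_pooled = pool2x2(feature, "max")
avg_pooled = pool2x2(feature, "average")
```

In this example the bottom-right 2×2 block averages to 1.75, a fraction that is rounded to 2 in the output, while the other three blocks average to whole numbers or round up from 4.75.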
