1.1 Background to the Study

A character is the basic building block of any language and is used to build the different structures of the language (Iorundu and Esiefarinrhe, 2015). Characters are the alphabets, and the structures are the words, strings and sentences (Lecun et al., 1990; Kader and Deb, 2012). Recognizing characters with diacritical (accent) marks requires optimization methods to maximize the recognition rate and minimize the error rate. Optimized back propagation algorithms are therefore essential for achieving an optimal recognition rate for handwritten Yoruba characters.

Character recognition, as an area in the field of pattern recognition, has been an interesting and intriguing area of research over the last couple of decades. It has been one of the most important and widely researched areas in the fields of artificial intelligence and machine learning, especially in applications of artificial neural networks.

In fact, character recognition has found application in several areas, such as reading bank cheques, deciphering zip codes, and document analysis and retrieval.

In character recognition, printed documents are transformed into ASCII files for the purposes of editing, compact storage and fast retrieval by computer. According to Mahmood et al. (2012), character recognition can be divided into two types: (1) online character recognition, in which text is converted automatically as it is written on a digitizer such as a PC tablet, where a sensor picks up the pen velocity as characters are scripted; the signals obtained are transformed into a letter code usable by computer and text-processing applications; and (2) offline character recognition, in which handwritten characters on a paper document are scanned, processed and converted to binary or grayscale form and made available to a recognition system. However, most research efforts have focused on the recognition of English and Latin characters; little work has been done on African characters.

Back propagation is one of the most widely used learning algorithms for solving classification problems, using the concept of multilayer perceptron (MLP) training and testing. However, it has its shortcomings, namely slow convergence and getting trapped in local minima. Many solutions have been proposed by neural network researchers to overcome the problem of slow convergence and to escape local minima in order to reach the optimal solution (the global minimum).

To this end, many powerful numerical optimization algorithms have been devised, most of them based on gradient descent, as explained by Shashank and Tripatti (2013); these include the conjugate gradient algorithms, Quasi-Newton, Resilient Propagation, Levenberg-Marquardt, etc.

Meanwhile, there is a need to perform an experimental evaluation of some of the most promising optimization algorithms to decide which training algorithm is best in practice for this research.

The main goal of the research is to evaluate the performance of different optimized back propagation algorithms for the recognition of handwritten Yoruba characters. Four training algorithms, namely Scaled Conjugate Gradient, Resilient Propagation, Quasi-Newton BFGS and Levenberg-Marquardt, were investigated. Performance was evaluated based on Mean Square Error (MSE), epochs (iterations), speed (time) and accuracy.

1.2 Problem Statement

The presence of accent marks or diacritics on some Yoruba characters has made their recognition more difficult than that of their English counterparts and some Asian languages. Although Yoruba orthography is a variant of the English alphabetic system, its tonal marks (diacritics) introduce additional difficulty into the recognition task if they are ignored, especially in handwriting. Removal of any of these marks leads to misinterpretation of the character or gives a different meaning when the characters are combined as text. Meanwhile, researchers have taken bold steps to address this issue, and various publications have appeared based on the different methodologies used to tackle the problem of Yoruba character recognition, most especially handwritten character recognition.

Thus, there is a need to independently evaluate the various methods to determine the best and also identify which features of the methods enhance better performance.

1.3 Aim and Objectives

The aim of the research is to evaluate the performance of optimized back propagation algorithms for recognizing Yoruba characters. This aim will be achieved through the following objectives:

i. extract the features that can enhance the recognition accuracy of Yoruba characters;

ii. train a Neural Network with the Scaled Conjugate Gradient, Resilient Propagation, Quasi-Newton BFGS and Levenberg-Marquardt algorithms;

iii. measure the performance of each algorithm on its recognition capability; and

iv. select the appropriate training algorithm for classification.

1.4 Significance of the Study

Yoruba is one of the tonal languages, in which meanings are determined by combining the appropriate consonants and vowels with tonal marks. Representing tones in written form presents an orthographic challenge, particularly for the computer, as a result of inadequate foresight when the Yoruba orthography was developed; today we are faced with that reality. Even though Yoruba orthography is based on the Latin script (which has enjoyed appreciable attention in computing), the need to indicate tones by applying diacritical signs in positions not normally supported by popular computing platforms presents a fundamental problem for Yoruba literature in the digital age. Nonetheless, researchers have been working tirelessly to bring the Yoruba language to the forefront of the field of pattern recognition.

1.5 Scope

The research covers the performance evaluation of four training algorithms in the context of handwritten Yoruba characters with diacritics. Only the uppercase characters of the Yoruba orthography that bear diacritics (except characters with a macron) are considered in this work. The training algorithms considered are Levenberg-Marquardt, Quasi-Newton BFGS, Resilient Propagation and Scaled Conjugate Gradient.

1.6 Methodology of Study

Several articles, paper presentations, journals and relevant past theses and dissertations were consulted to provide ideas and direction for this research. The aim is to develop a model that evaluates the performance of some selected optimization algorithms on the recognition of Yoruba characters. The model was simulated in the MATLAB environment.

1.7 Thesis Organisation

The thesis is divided into five chapters. The first chapter is the introduction and background of the study. The second chapter is the literature review, in which the previous work reviewed is divided into two groups of models; it also gives an overview of characters, character recognition, Artificial Neural Networks, back propagation and Neural Network optimization algorithms. Chapter three introduces the methodology employed in carrying out the research and describes the proposed model and the program flowchart of the system.

The fourth chapter presents the various experiments, and the results of each experiment are discussed; comparisons are made through the use of graphs. The fifth and last chapter summarises all the chapters, draws conclusions based on the findings and makes recommendations for future work and further improvement.

CHAPTER TWO

LITERATURE REVIEW

2.1 Introduction

This chapter presents a review of the literature relevant to the research. The chapter begins with the character as the basic building block of any language, followed by descriptions of the Yoruba orthography, character recognition, Neural Network training algorithms and related previous studies.

2.2 Character

A character is the basic building block of any language and is used to build the different structures of the language (Iorundu and Esiefarinrhe, 2015). Characters are the alphabets, and the structures are the words, strings and sentences (Lecun et al., 1990; Kader and Deb, 2012).

2.3 Yoruba Orthography

The Yoruba orthography is a variant of the English alphabet system. This is shown below:

À A Á B D È E É Ẹ̀ Ẹ Ẹ́ F G GB H Ì I Í J K L M N Ò O Ó Ọ̀ Ọ Ọ́ P R Ṣ T Ù U Ú W Y.

The marks on top of some of the characters are called accent (tonal) marks. There are basically three (3) types of tonal marks: the grave (`), which represents a low tone; the macron (¯), which represents a mid tone; and the acute (′), which represents a high tone.

The macron is the “understood default” and is more often than not omitted in both written and printed documents. It should be noted that the grave mark slopes downwards from left to right, while the acute mark slopes upwards from left to right. In addition, the marks on a few other Yoruba characters are called under-dots (or, more loosely, tails). Acute marks, grave marks, macrons and under-dots are all referred to as diacritics (Abdulrahman and Odetunji, 2011).

The fifteen uppercase characters of the Yoruba orthography that bear diacritical marks, with the exception of characters with a macron, are:

À Á È É Ẹ̀ Ẹ́ Ì Í Ò Ó Ọ̀ Ọ́ Ṣ Ù Ú

2.4 Character Recognition

Character recognition is a subset of pattern recognition which gives a specific symbolic identity to an offline printed or written image of a character (Sitamahalakshmi et al., 2010).

Character recognition can be divided into two types: (1) online character recognition, in which text is converted automatically as it is written on a digitizer such as a PC tablet, where a sensor picks up the pen velocity as characters are scripted; the signals obtained are transformed into a letter code usable by computer and text-processing applications; and (2) offline character recognition, in which handwritten characters on a paper document are scanned, processed and converted to binary or grayscale form and made available to a recognition system.

2.5 Artificial Neural Network (ANN)

An ANN is a parallel computational platform that simulates the structure and function of the biological nervous system. The brain, which is the central organ of the human nervous system, is made up of neurons. Neurons are polarized cells: they receive signals through the dendrites, which comprise highly branched extensions, and send information along unbranched extensions called axons. The human brain contains approximately 10^14 to 10^15 interconnections of neurons. The way neurons process information is similar: neuronal information is transmitted in the form of electrical signals called action potentials along the axons to other neuron cells. When an action potential arrives at the axon terminal, the neuron releases a chemical neurotransmitter which effects inter-neuron communication at specialized connections called synapses. In biological processes, learning has to do with adjustments to the synaptic connections between these neurons (Adetiba, 2013). Figure 2.1 shows the anatomy of a biological neuron.

Figure 2.1: Anatomy of Biological Neuron (Source: Monge and Tomassini, 1998)

The artificial neuron was inspired principally by the structure and functions of the biological neuron. Figure 2.2 depicts the structure and components of an artificial neuron.

Figure 2.2: The structure of an Artificial Neuron (Source: Adetiba et al., 2013)

An artificial neuron, as shown in Figure 2.2, has a set of n synapses associated with the inputs (b1, …, bn), and each input i has an associated weight wi. A signal at input i is multiplied by the weight wi, the weighted inputs are added together, and a linear combination of the weighted inputs (w1b1 + … + wnbn) is obtained. A bias (w0), which is not associated with any input, is added to the linear combination and a weighted sum x is obtained as:

x =w0 + w1b1 + …+ wnbn. (2.1)

Afterward, a nonlinear activation or transfer function f is applied to the weighted sum in (2.1) and this produces the artificial neuron’s output y shown in (2.2);

y = f(x). (2.2)

The flexibility and ability of an artificial neuron to approximate the functions to be learned depend on its activation or transfer function. Some examples of activation or transfer functions are linear, sigmoid and radial functions. Linear activation functions are mostly applied in the output layer and have the form:

f(x) = x (2.3)

The sigmoid activation functions are S-shaped; the ones mostly used are the logistic and the hyperbolic tangent functions (equations (2.4) and (2.5) respectively):

f(x) = 1/(1 + e^(-ax)), (2.4)

f(x)=(e^x-e^(-x))/(e^x+e^(-x) ) (2.5)

There are different types of radial activation functions, but the one that is usually adopted uses the Gaussian function:

f(x)=e^(-x^2/b^2 ) (2.6)
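For concreteness, the activation functions in equations (2.3) to (2.6) can be written directly in code. The following is an illustrative sketch in Python with NumPy (the experiments in this work were carried out in MATLAB; Python is used here purely for illustration), with the slope a and width b as free parameters:

```python
import numpy as np

def linear(x):
    """Linear activation, f(x) = x, as in (2.3); typical for output layers."""
    return x

def logistic(x, a=1.0):
    """Logistic sigmoid, f(x) = 1 / (1 + e^(-a*x)), as in (2.4)."""
    return 1.0 / (1.0 + np.exp(-a * x))

def tanh_act(x):
    """Hyperbolic tangent, f(x) = (e^x - e^-x) / (e^x + e^-x), as in (2.5)."""
    return np.tanh(x)

def gaussian(x, b=1.0):
    """Radial (Gaussian) activation, f(x) = e^(-x^2 / b^2), as in (2.6)."""
    return np.exp(-(x ** 2) / (b ** 2))
```

The two sigmoids squash any input into (0, 1) and (-1, 1) respectively, while the Gaussian responds most strongly near x = 0; these shapes are what give a neuron its non-linear approximation ability.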

A neuron learns through an iterative process of adjustment of its synaptic weights, and it becomes more knowledgeable after each iteration of the learning process. Assume O is the desired output of a neuron of an ANN for a certain input vector and Y is the actual output.

If O = Y, the implication is that the neuron has nothing to learn. However, if O ≠ Y, the neuron must learn so as to ensure that, after the weight adjustments (wi), its actual output (Y) matches the desired output (O). The difference generates an error signal (ε), which is represented in (2.7):

ε=O-Y (2.7)

The main aim of learning by the neuron is to adjust the weights and update the output so that the new actual output (Ỹ) coincides with the desired output O. This approach to weight adjustment is called the error-correction learning rule and is represented as:

Ỹ = Y + δ = O. (2.8)

Given some weights and input vectors as;

W = (w0,w1,…,wn), (2.9)

X = (x1,…,xn) (3.0)

The weights (wi) are adjusted in consideration of the error (δ):

ŵ0 = w0 + αδ, (3.1)

ŵi = wi + αδxi ; i = 1,…,n. (3.2)

where α is the learning rate; it is equal to 1 for the threshold neuron called the Perceptron.

The Perceptron is a computational model of a neuron, similar to the model shown in Figure 2.2 but with the threshold activation function in equation (3.3). It was proposed by Frank Rosenblatt, an American psychologist.

Y(x) = sign(x) = {1, if x ≥ 0; -1, if x < 0} (3.3)

The capability of a single artificial neuron, which is the basic unit of an ANN system, is very limited. For instance, the Perceptron cannot learn non-linearly separable functions. To learn functions that cannot be learned by a single neuron, an interconnection of multiple neurons, called a Neural Network (NN) or Artificial Neural Network (ANN), must be employed.
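The error-correction learning rule of equations (3.1) and (3.2), with the threshold activation of (3.3), can be sketched in code. The following is an illustrative Python/NumPy example (not the implementation used in this work, which was carried out in MATLAB) that trains a single Perceptron on the logical AND function, which, unlike XOR, is linearly separable:

```python
import numpy as np

def sign(x):
    """Threshold activation of equation (3.3): +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def train_perceptron(samples, targets, epochs=20, alpha=1.0):
    """Error-correction rule: w0 <- w0 + a*d (3.1), wi <- wi + a*d*xi (3.2)."""
    w = np.zeros(len(samples[0]) + 1)          # w[0] is the bias weight w0
    for _ in range(epochs):
        for x, o in zip(samples, targets):
            y = sign(w[0] + np.dot(w[1:], x))  # weighted sum, then threshold
            d = o - y                          # error signal, as in (2.7)
            w[0] += alpha * d                  # bias update (3.1)
            w[1:] += alpha * d * np.asarray(x) # weight update (3.2)
    return w

# The logical AND function is linearly separable, so one Perceptron suffices.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = [-1, -1, -1, 1]
w = train_perceptron(X, T)
preds = [sign(w[0] + np.dot(w[1:], x)) for x in X]
```

Once the weights converge, the error signal d is zero for every pattern and the weights stop changing; for XOR, no such weight vector exists, which motivates the Multi-Layer Perceptron discussed next.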

Figure 2.3 shows the simplest ANN.

Figure 2.3: Single-layer perceptron

Apart from the basic processing units of an ANN, there is the pattern of connections between the neurons and the propagation of data, called the network topology. There are two main types of ANN topology: feed-forward and recurrent (feedback) network topologies.

In feed-forward networks, data flows from input to output strictly in the forward direction and there are no feedback connections, while in recurrent (feedback) networks there are feedback connections. Apart from the single-layer perceptron for learning linearly separable patterns, another commonly used feed-forward network topology is the Multi-Layer Perceptron (MLP).

2.5.1 Multilayer Perceptron

The Multilayer Perceptron (MLP) is an ANN topology that caters for the learning of non-linear functions; Figure 2.4 shows a diagrammatic illustration of its topology.

Figure 2.4: Multi-Layer Perceptron (MLP) Topology (Source: Adetiba, 2013)

MLP networks are typically trained with the Back Propagation (BP) algorithm, a supervised learning algorithm that maps the process inputs to the desired outputs by minimizing the errors between the desired outputs and the calculated outputs (Huang, 2009). BP is an application of the gradient method, or other numerical optimization methods, to an ANN with a feed-forward architecture in order to minimize the error function. The algorithm is the most popular method for performing supervised learning (Werbos, 1994).

2.5.2 Back Propagation

The back propagation algorithm is one of the most widely used supervised learning algorithms because it is simple and effective. A back propagation network is made up of nodes arranged in three layers: the input layer, the hidden layer and the output layer. The input and output layers are mainly buffers for the system, with the actual computation of the network taking place in the hidden layer. The number of input nodes is the actual amount of data to be processed, and the number of output nodes is the expected number of output results. There is no fixed number of hidden nodes; although some mathematical formulas claim to calculate the number of nodes needed, trial and error is basically used to obtain the number of nodes required in the hidden layer.

Before any data are processed, the weights of the nodes are random, and back propagation is expected to adjust these weights in order for the network to learn. When a set of input data is fed to the network, each input node attribute is fed to the hidden layer, where each node multiplies each attribute value by its weight and adds them together, as shown in Figure 2.5.

Figure 2.5: Back propagation Forward Calculation (Lim, 2009)

Y1=f1 (W(X1)1X1 + W(X2)1X2) (3.4)

Y=f3 (W13Y1 + W23Y2) (3.5)

The output is compared with the desired output value in the training data to generate a mean-squared error signal “d”. This signal is “back propagated” through the network, causing the weights of the nodes to be adjusted in order to achieve a low error signal, as shown in Figure 2.6.

Figure 2.6: Back Propagation Weight Adjust (Lim, 2009)

d1=W13d (3.6)

d2=W23d (3.7)

W′(x1)1 = W(x1)1 + ηd1(df1(e)/de)X1 (3.8)

W′(x2)1 = W(x2)1 + ηd1(df1(e)/de)X2 (3.9)

W′13 = W13 + ηd(df3(e)/de)Y1 (4.0)

W′23 = W23 + ηd(df3(e)/de)Y2 (4.1)

The training data is repeatedly compared with the output, and the weights of the nodes are adjusted, until the overall error is below a predetermined tolerance.

The back-propagation algorithm consists of the following steps:

Step 1: Initialize the weights randomly.

Step 2: Present an input vector pattern to the network.

Step 3: Evaluate the outputs of the network by propagating signals forward.

Step 4: For all output neurons, calculate:

δj = (yj – dj),

where dj is the desired output of neuron j and yj is its current network output.

yj = g(Σi wi xi) = (1 + e^(-Σi wi xi))^(-1) (assuming a log-sigmoid activation function).

Step 5: For all other neurons (from last hidden layer to first) compute:

δj = Σk wjk g′(x) δk,

where δk is the δj of the succeeding layer and g′(x) = yk(1 − yk).

Step 6: Update the weights according to;

W_ij (t+1) = W_ij (t) – α ∂E/∂W(t)

Where;

α is a parameter called the learning rate.

W_ij (t+1) is the new weight and W_ij (t) is the current weight.

∂E/∂W(t) is the derivative of the error function with respect to the weight.

E is the error function, which can be the sum of squared errors or the mean squared error, and it is a function of δj.

Step 7: Go to Step 2 for a certain number of iterations, or until the error is less than a pre-specified value.

Figure 2.7: Backpropagation algorithm
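The seven steps above can be sketched in code. The following Python/NumPy example is illustrative only (the present work uses MATLAB): it trains a tiny network with one hidden layer on the XOR function, which a single perceptron cannot learn. The hidden-layer size, learning rate and epoch count are arbitrary choices for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    """Log-sigmoid activation, as assumed in Step 4."""
    return 1.0 / (1.0 + np.exp(-x))

def mse(W1, b1, W2, b2):
    """Mean squared error of the network over all patterns."""
    preds = np.array([sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2) for x in X])
    return np.mean((preds - D) ** 2)

# XOR: the classic non-linearly separable function.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
D = np.array([0., 1., 1., 0.])

# Step 1: initialise weights randomly (here in (-0.5, 0.5), per Fausett, 1994).
W1 = rng.uniform(-0.5, 0.5, (3, 2)); b1 = rng.uniform(-0.5, 0.5, 3)
W2 = rng.uniform(-0.5, 0.5, 3);      b2 = rng.uniform(-0.5, 0.5)

alpha = 0.5
err_init = mse(W1, b1, W2, b2)
for epoch in range(10000):            # Step 7: repeat until the error is small
    for x, d in zip(X, D):            # Step 2: present an input pattern
        h = sigmoid(W1 @ x + b1)      # Step 3: forward propagation
        y = sigmoid(W2 @ h + b2)
        delta_out = (y - d) * y * (1 - y)         # Step 4, with g'(x) = y(1-y)
        delta_h = h * (1 - h) * (W2 * delta_out)  # Step 5: hidden deltas
        W2 -= alpha * delta_out * h; b2 -= alpha * delta_out  # Step 6
        W1 -= alpha * np.outer(delta_h, x); b1 -= alpha * delta_h
err_final = mse(W1, b1, W2, b2)
```

Each pass through the four patterns performs the forward sweep of Step 3 and the backward sweep of Steps 4 to 6; the error falls steadily as the weight updates accumulate.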

In practice, α is chosen between 0 and 1 to ensure that the BP algorithm converges without oscillation on the error surface (Plagianakos, 1999). Many authors and researchers have suggested that an ANN should be trained several times with varying random initializations of the weights, and the network with the best outcome selected. However, Fausett (1994) holds that the optimal weight interval is (-1, 1) or (-0.5, 0.5), while Thimm and Fiesler (1997) affirm that the (-0.77, 0.77) interval gives the most optimal result over several trials. There are different variants of the BP algorithm, which include conjugate gradient, Levenberg-Marquardt (LM), gradient descent, quasi-Newton, etc. After several experiments with the four selected BP training algorithms, the optimal algorithm adopted for this work is the Levenberg-Marquardt BP algorithm.

2.5.3 Artificial Neural Network Training Algorithms

2.5.3.1 Levenberg-Marquardt (LM)

Levenberg-Marquardt is an enhanced Newton’s method designed to minimize functions that are sums of squares of other non-linear functions. The method is very well suited to Neural Network training where the performance measure is the Mean Square Error (MSE) (Hagan et al., 1996), and it generally deals with optimizing MSE-like cost functions (Basterrech et al., 2011). As a second-order algorithm, LM performs very well when applied to Artificial Neural Networks; benefitting from naturally estimating the learning rate, in contrast to the random values used in first-order algorithms, makes LM one of the most efficient learning algorithms (Wilamowski and Yuh, 2010).

However, the drawback of LM is its memory requirement (Hagan et al., 1996; Ampazis and Perantonis, 2000): calculating the Jacobian matrix of the samples in each iteration demands excessive memory when dealing with large-scale databases.
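For illustration, the standard LM update is w ← w − (JᵀJ + μI)⁻¹Jᵀe, where J is the Jacobian of the residuals e and μ is adapted between gradient-descent-like (large μ) and Gauss-Newton-like (small μ) behaviour. The Python/NumPy sketch below applies the rule to a hypothetical two-parameter curve-fitting problem rather than a neural network, purely to show the update; the model, data and constants are illustrative assumptions:

```python
import numpy as np

def residuals_and_jacobian(w, x, t):
    """Model y = a*exp(b*x); returns residuals e = y - t and Jacobian J."""
    a, b = w
    y = a * np.exp(b * x)
    e = y - t
    J = np.column_stack([np.exp(b * x),          # de/da
                         a * x * np.exp(b * x)]) # de/db
    return e, J

def levenberg_marquardt(w, x, t, mu=1e-3, iters=100):
    """Minimise sum(e^2) by w <- w - (J^T J + mu*I)^(-1) J^T e, adapting mu."""
    e, J = residuals_and_jacobian(w, x, t)
    for _ in range(iters):
        step = np.linalg.solve(J.T @ J + mu * np.eye(len(w)), J.T @ e)
        w_new = w - step
        e_new, J_new = residuals_and_jacobian(w_new, x, t)
        if np.sum(e_new ** 2) < np.sum(e ** 2):
            w, e, J = w_new, e_new, J_new   # accept: act like Gauss-Newton
            mu /= 10
        else:
            mu *= 10                        # reject: move toward gradient descent
    return w

# Hypothetical example: recover a = 2, b = -1 from noiseless samples.
x = np.linspace(0, 2, 20)
t = 2.0 * np.exp(-1.0 * x)
w = levenberg_marquardt(np.array([1.0, 0.0]), x, t)
```

The n×n matrix solved here is small; in network training the corresponding JᵀJ has one row and column per weight, which is the source of the memory cost noted above.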

2.5.3.2 Quasi-Newton BFGS

The BFGS method is a classical Quasi-Newton method and also one of the most effective algorithms for unconstrained optimization problems (Arka and Mriganka, 2012). The BFGS algorithm was developed independently by Broyden, Fletcher, Goldfarb and Shanno (Nocedal and Wright, 1999). The basic principle of Quasi-Newton methods is that the direction of search is based on an n×n direction matrix S which serves the same purpose as the inverse Hessian in Newton’s method. This matrix is generated from available data and is contrived to be an approximation of the inverse Hessian (H^-1); furthermore, as the number of iterations increases, S becomes a progressively more accurate representation of H^-1.

However, computing the Hessian that many times for a given objective function is very expensive in terms of memory and time.
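The build-up of the direction matrix S as an approximation of H⁻¹ can be illustrated with the standard BFGS inverse-Hessian update. The Python/NumPy sketch below minimises a small quadratic, for which an exact line search is available in closed form; the matrix A and vector b are arbitrary illustrative choices, not values from this research:

```python
import numpy as np

# Minimise the quadratic f(w) = 0.5*w^T A w - b^T w, whose gradient is A w - b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda w: A @ w - b

w = np.zeros(2)
S = np.eye(2)                      # initial approximation of H^-1
g = grad(w)
for _ in range(10):
    d = -S @ g                     # search direction, as in Newton's method
    step = -(g @ d) / (d @ A @ d)  # exact line search (available for quadratics)
    s = step * d                   # displacement s_k = w_{k+1} - w_k
    w = w + s
    g_new = grad(w)
    y = g_new - g                  # change in gradient y_k
    rho = 1.0 / (y @ s)
    I = np.eye(2)
    # BFGS update: S <- (I - rho*s*y^T) S (I - rho*y*s^T) + rho*s*s^T
    S = (I - rho * np.outer(s, y)) @ S @ (I - rho * np.outer(y, s)) \
        + rho * np.outer(s, s)
    g = g_new
    if np.linalg.norm(g) < 1e-8:
        break
```

Note that S is built only from the displacements s and gradient changes y; no Hessian is ever formed explicitly, which is precisely the Quasi-Newton idea described above.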

2.5.3.3 Resilient Propagation (RP)

A multilayer network often applies sigmoid functions as the transfer functions in the hidden layers. These functions are often called “squashing” functions, since they compress an infinite input range into a finite output range. Sigmoid functions are characterized by the fact that their slopes approach zero as the input gets large. This causes a problem when steepest descent algorithms are used to train a multilayer network with sigmoid functions, since the gradient can have a very small magnitude and therefore cause only small changes in the weights and biases, even though the weights and biases are far from their optimal values (Ozgur and Erdal, 2005). The purpose of the Resilient Propagation (RP) training algorithm is to eliminate these harmful effects of the magnitudes of the partial derivatives: only the sign of the derivative is used to determine the direction of the weight update, and the magnitude of the derivative has no effect on it. The size of the weight change is determined by a separate update value (Ozgur and Erdal, 2005).
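The sign-only update can be sketched as follows. This is an illustrative Python/NumPy version of the basic Rprop update (without the weight-backtracking refinement found in some variants); the growth and shrink factors 1.2 and 0.5, and the bounds on the update value, are the commonly quoted defaults, and the demo function is a hypothetical choice:

```python
import numpy as np

def rprop_update(w, grad, grad_prev, delta, eta_plus=1.2, eta_minus=0.5,
                 delta_max=50.0, delta_min=1e-6):
    """One Rprop step: only the SIGN of the gradient sets the direction.
    Each weight keeps its own update value delta, grown while the gradient
    sign is consistent and shrunk when the sign flips (overshoot)."""
    same_sign = grad * grad_prev
    delta = np.where(same_sign > 0, np.minimum(delta * eta_plus, delta_max), delta)
    delta = np.where(same_sign < 0, np.maximum(delta * eta_minus, delta_min), delta)
    w = w - np.sign(grad) * delta
    return w, delta

# Hypothetical demo: minimise f(w) = sum(w^2), whose gradient is 2w.
w = np.array([4.0, -3.0])
delta = np.full(2, 0.1)       # initial per-weight update values
g_prev = np.zeros(2)
for _ in range(200):
    g = 2 * w
    w, delta = rprop_update(w, g, g_prev, delta)
    g_prev = g
```

Because the raw gradient magnitude never enters the step, the tiny derivatives produced by saturated sigmoids no longer stall training; the per-weight delta alone controls the step size.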

2.5.3.4 Scaled Conjugate Gradient (SCG)

Scaled Conjugate Gradient is a second-order conjugate gradient algorithm that minimizes goal functions of several variables. Its theoretical foundation was established by Moller (1993): whereas first-order techniques such as standard back propagation use only first derivatives, second-order techniques also use second derivatives to find a better path to a local minimum.

SCG uses a step-size scaling mechanism to avoid a time-consuming line search in each learning iteration, which makes the algorithm faster than other second-order algorithms. According to Moller (1993), the SCG method shows super-linear convergence on most problems.
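The conjugate-direction idea underlying SCG can be illustrated with plain (linear) conjugate gradient on a quadratic, where an exact line search is available; SCG's contribution is precisely to replace that line search with a step-size scaling mechanism. The matrix A and vector b below are arbitrary illustrative values:

```python
import numpy as np

# Plain conjugate gradient on the quadratic f(w) = 0.5*w^T A w - b^T w.
# (SCG replaces the exact line search below with a scaling mechanism so
# that no line search is needed per iteration.)
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

w = np.zeros(2)
r = b - A @ w            # negative gradient (residual)
d = r.copy()             # first search direction: steepest descent
for _ in range(2):       # converges in at most n steps on an n-d quadratic
    alpha = (r @ r) / (d @ A @ d)     # exact line search along d
    w = w + alpha * d
    r_new = r - alpha * (A @ d)
    beta = (r_new @ r_new) / (r @ r)  # Fletcher-Reeves conjugacy factor
    d = r_new + beta * d              # next direction, conjugate to the last
    r = r_new
```

Each new direction is conjugate (A-orthogonal) to the previous ones, so progress made along earlier directions is never undone; this is why the method finishes an n-dimensional quadratic in at most n steps.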

2.6 Related Work

The literature reviewed in this research is divided into two basic groups: character recognition models and comparative studies.

2.6.1 Literature Review of Character Recognition Models

Ebenezer et al. (2014) proposed a system for the recognition of Igbo vowel characters using an Artificial Neural Network. Standard back propagation with an adaptive learning rate and adaptive momentum was used, and a recognition rate of 90.2% was obtained after testing the neural network with 30% of the dataset. Nonetheless, their research was restricted to the 9 vowel characters of the Igbo orthography.

Abdulrahman and Odetunji (2011) developed a technique for the classification of diacritically marked uppercase Yoruba letters in offline mode. The system involves six stages; the Bayesian stage was built using 40 samples per Bayesian class. The system was tested in two ways. First, it was tested on eight independent samples of each of the seventeen classes of diacritical letters, and a recognition rate of 91.18% was obtained. Second, it was tested on three non-independent samples of each of the six Bayesian classes, and a recognition rate of 94.4% was recorded. The work presented an approach combining Bayes’ rule and a decision tree. However, their work did not cover the entire Yoruba orthography, which also includes lowercase letters; the single digraph (GB) of the Yoruba orthography was also not considered.

Oyebade et al. (2015) investigated the tolerance of neural network based recognition systems to some common pattern variances that occur in pattern recognition. The research investigated the performance of deep networks on common problems associated with pattern recognition systems, such as translational invariance, rotational invariance, scale mismatch and noise. Handwritten Yoruba vowel characters were used to evaluate the performance of the deep learning networks considered. The research focused on comparing error rates under noise for the network architectures employed, and it was observed that the pre-trained deep networks outperformed the untrained shallow Back Propagation Neural Networks (BPNN). However, research has shown that such deep learning models do not work as classification algorithms per se; instead, they are used for pre-training, that is, for learning transformations from a low-level, hard-to-consume representation (like pixels) to a high-level one. Once a deep (or even a shallow) network is pre-trained, input vectors are transformed into a better representation, and the resulting vectors are finally passed to a real classifier (such as an SVM, ANN or logistic regression).

In contrast, Jorder and Aziz (1998) presented a neural based invariant character recognition system using a double back propagation algorithm. Their system was tested with English numeric digits; the test involved rotated, scaled and translated versions of exemplar patterns, and the system successfully recognized 97% of the tested patterns. Nonetheless, the system was not tested on English alphabetic characters, either printed or handwritten, or on other similar applications.

Iorundu and Esiefarienrhe (2015) developed an Artificial Neural Network model for Tiv character recognition (ANNTCR). The system was developed in the Java programming language and implemented using the Encog framework. Both training and testing were done using a feed-forward resilient propagation neural network. Furthermore, the system was tested with characters of different font styles. The results showed that the average recognition rate of the system was 99.4%, while the system’s rejection rate was below 1%.

Mahmood et al. (2012) developed an Artificial Neural Network model for Yoruba Character Recognition (ANNYCR). They employed an edge-end pixel extraction algorithm to extract numerical data for character analysis, and used the Discrete Cosine Transform algorithm to transform the features of each character into vectors, which served as input for Artificial Neural Network training. The ANNYCR was implemented in the Java programming language on the Encog framework, together with a well-known Artificial Neural Network model for English characters (ANNECR). The ANNYCR was trained using the feedforward Resilient supervised Back Propagation algorithm, while supervised BP was used for the ANNECR. The results showed that the ANNYCR recognized all Yoruba alphabets, including characters with dots, ligatures and tonal signs, while the ANNECR could not recognize Yoruba characters with dots, ligatures and tonal signs. During training, it was observed that the BP algorithm was not converging when used to train Yoruba characters, whereas feedforward Resilient Propagation converged when used to train both Yoruba and English characters. The results showed that BP cannot be applied to Yoruba character recognition in the way it is used for the recognition of English characters.

2.6.2 Literature Review of Comparative Studies

Arka and Mriganka (2012) presented a hybrid optimized back propagation learning algorithm for the successful training of a multilayer perceptron. Their learning algorithm, utilizing an Artificial Neural Network with the Quasi-Newton method, was proposed for design optimization of function approximation. They used the proposed network to approximate two particular functions, namely the Beale function and the Booth function. The performance of the proposed hybridization was measured based on CPU time and mean square error, and they presented two modifications to the classical Quasi-Newton approach.

Also, Zulhadi et al. (2010) studied the results of Neural Network algorithms using numerical optimization techniques for multi-face detection in static images. The training algorithms used in the study were the conjugate gradient algorithms, the Quasi-Newton algorithms and the Resilient Back Propagation algorithm. From their experiments, it was found that scaled conjugate gradient back propagation had the best performance of the algorithms in terms of accuracy and processing time; as a result, scaled conjugate gradient BP was used as the training algorithm for their proposed system.

In the same vein, Bharna and Venugopalan (2014) presented a comparative analysis of neural network training functions for hematoma classification of brain CT images using Gradient Descent BP, Gradient Descent with Momentum (GDM), Resilient BP, Conjugate Gradient BP algorithms and Quasi-Newton based algorithms. Their work compared the training algorithms on the basis of Mean Squared Error (MSE), recognition accuracy, rate of convergence and correctness of classification. No significant differences were found among the correct classification percentages for Resilient BP, Scaled Conjugate BP and Levenberg-Marquardt; all were in the acceptable range. Meanwhile, the convergence speeds of the Levenberg-Marquardt and Scaled Conjugate algorithms were found to be higher than those of the other training functions, and based on the epochs (iterations) and MSE parameters, the Levenberg-Marquardt and Scaled Conjugate algorithms outperformed the other training functions.

Bhavani et al. (2011) described a method for the classification of respiratory states based on four significant features using an Artificial Neural Network (ANN). They analysed the performance of five back propagation training algorithms, namely Levenberg-Marquardt, One-Step Secant, Powell-Beale Restarts, Quasi-Newton BFGS and Scaled Conjugate Gradient, for the classification of respiratory states. In their experiments, it was observed that the Levenberg-Marquardt algorithm was correct in approximately 99% of the test cases.

Hesamian et al. (2015) performed tests on some well-known training algorithms (Levenberg-Marquardt, Resilient Propagation and Scaled Conjugate Gradient) to evaluate their performance for scene illumination classification. The results of their study showed that Levenberg-Marquardt was the most accurate, with a recognition rate of 94.41%, and Resilient Propagation the fastest method, with a response time of 0.426 seconds.

CHAPTER THREE

RESEARCH METHODOLOGY

3.1 Introduction

The overview of the methodology is shown in figure 3.1. The method employed involves five major phases: image acquisition, preprocessing, segmentation, feature extraction and classification.

3.2 Overview of the methodology

Figure 3.1: Overview of the Methodology

3.2.1 Image Acquisition

This is the first stage in the character recognition process. Handwritten characters were collected from volunteers and indigenous writers, and the collected sheets were scanned to produce the input images for the preprocessing stage. The scanned images were saved in JPEG format and stored in a database. In this research, 20 samples were collected for each of the 15 classes (that is, 20 samples per character, giving 300 samples in total).

Figure 3.2: Scanned Image

Begin

Read the scanned image

Convert the image to grayscale

Convert the image to binary

Remove Noise

Detect edges of the image

Perform morphological operations (that is, dilate and fill holes)

Normalise the image size

Save the pre-processed image

End

Figure 3.3 Pseudocode for image preprocessing

3.2.2 Preprocessing

The output of the image acquisition phase serves as input to the preprocessing stage. The scanned image is loaded into the MATLAB development environment for processing. Colour is irrelevant to building a usable input vector of character features, so the acquired image is first converted to grayscale, discarding all colour and hue while retaining the luminosity.

Binarisation is then applied to convert the grayscale image into binary form (0s and 1s) using Otsu's method.

Otsu's algorithm assumes the existence of two pixel classes, foreground and background, and computes a global threshold by choosing the value that minimizes the intra-class (within-class) variance of the thresholded black and white pixels, which is equivalent to maximizing the variance between the two classes (Akintola, 2016). Converting to binary is necessary to aid character segmentation.
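The threshold computation described above can be sketched as follows. This is an illustrative Python version of Otsu's method, not the MATLAB toolbox routine used in this work; it sweeps all 256 candidate thresholds and maximizes the between-class variance.

```python
import numpy as np

def otsu_threshold(gray):
    """Compute Otsu's global threshold for an 8-bit grayscale image.

    Otsu's method assumes two pixel classes (foreground and background)
    and picks the threshold that maximizes the between-class variance,
    which is equivalent to minimizing the within-class variance.
    """
    hist, _ = np.histogram(gray.ravel(), bins=256, range=(0, 256))
    p = hist / hist.sum()                  # normalised histogram
    omega = np.cumsum(p)                   # class-0 probability up to t
    mu = np.cumsum(p * np.arange(256))     # cumulative mean up to t
    mu_t = mu[-1]                          # global mean
    # Between-class variance for every candidate threshold t.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b2 = np.nan_to_num(sigma_b2)
    return int(np.argmax(sigma_b2))

def binarise(gray):
    """Binarise: dark ink pixels become 1 (foreground), paper 0."""
    return (gray <= otsu_threshold(gray)).astype(np.uint8)
```

Note that the foreground convention (ink mapped to 1) is an assumption for dark writing on light paper.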

In the noise reduction step, noise introduced by the scanner, paper quality and image conversion is detected and removed, and spurious connected components are reduced. In this research, an adaptive filtering method, the Wiener filter, was implemented using the MATLAB image processing toolbox, as it preserves edges and high-frequency detail rather than blurring the image as a linear filter would. Median filtering was then applied to remove any remaining noise. Finally, the edges of the image were detected to support the labeling process used in segmentation.
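The Wiener-plus-median filtering step can be sketched as below. This is an illustrative Python version using SciPy's equivalents of the MATLAB toolbox functions, not the implementation used in this work; window sizes of 3 × 3 are assumed.

```python
import numpy as np
from scipy.signal import wiener, medfilt2d

def denoise(binary_img):
    """Adaptive Wiener filtering followed by median filtering.

    wiener() estimates the local mean and variance in a sliding window
    and attenuates noise adaptively, preserving edges better than a
    plain linear smoother; medfilt2d() then removes residual
    salt-and-pepper speckle.  The result is re-binarised at 0.5.
    """
    img = binary_img.astype(float)
    img = wiener(img, mysize=(3, 3))      # adaptive local filtering
    img = medfilt2d(img, kernel_size=3)   # remove isolated noise pixels
    return (img > 0.5).astype(np.uint8)
```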

3.2.3 Segmentation

In this stage, an image of characters is divided into sub-images of individual characters (Pradeep et al., 2011). In this research, the preprocessed input image is segmented into isolated characters by assigning a number to each character through a labeling process, which also yields the number of characters in the image. For each character, the minimum and maximum rows and columns are used to determine its starting pixel and extent, and these values are stored as a row vector (Lim, 2009). Each individual character is then uniformly resized to 70 × 50 pixels.
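The labeling-based segmentation described above can be sketched as follows. This is an illustrative Python version (the actual implementation was in MATLAB); `scipy.ndimage.label` stands in for the labeling step, and a simple nearest-neighbour index sampling stands in for the resize.

```python
import numpy as np
from scipy import ndimage

def segment_characters(binary_img, out_shape=(70, 50)):
    """Label connected components and crop each character.

    ndimage.label numbers each connected foreground region; the minimum
    and maximum row/column of each region give its bounding box, and
    every crop is resized to a uniform 70 x 50 grid by nearest-neighbour
    index sampling.
    """
    labels, n = ndimage.label(binary_img)
    chars = []
    for k in range(1, n + 1):
        rows, cols = np.where(labels == k)
        crop = binary_img[rows.min():rows.max() + 1,
                          cols.min():cols.max() + 1]
        # Nearest-neighbour resize to out_shape.
        r = np.arange(out_shape[0]) * crop.shape[0] // out_shape[0]
        c = np.arange(out_shape[1]) * crop.shape[1] // out_shape[1]
        chars.append(crop[np.ix_(r, c)])
    return chars
```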

3.2.4 Feature Extraction

In this research, a diagonal-based approach was adopted to extract features from the characters.

The normalized character of size 70 × 50 pixels is sub-divided into 35 equal zones, each of size 10 × 10 pixels, and features are extracted by moving along the diagonals of each zone. Each 10 × 10 zone has 19 diagonal lines; the foreground pixels along each diagonal line are summed to give a single sub-feature, so 19 sub-feature values are obtained per zone. These 19 values are averaged to form the single feature value of that zone (figure 3.4). This procedure is repeated sequentially for all zones, leading to the extraction of 35 features per character; a zone whose diagonals contain no foreground pixels contributes a feature value of zero. In addition, 7 and 5 features can be obtained by averaging the zone values row-wise and column-wise respectively. In this work, every character is represented by the 35 zone features, which were used to train a multilayer feed-forward neural network (Pradeep et al., 2011).

Z1 Z2 Z3 Z4 Z5

Z6 Z7 Z8 Z9 Z10

Z11 Z12 Z13 Z14 Z15

Z16 Z17 Z18 Z19 Z20

Z21 Z22 Z23 Z24 Z25

Z26 Z27 Z28 Z29 Z30

Z31 Z32 Z33 Z34 Z35

Figure 3.4: Diagonal Based Feature Extraction
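The zone-wise diagonal procedure above can be sketched as follows. This is an illustrative Python version, not the MATLAB implementation used in this work; zones are walked in the row-major order Z1..Z35 of figure 3.4.

```python
import numpy as np

def diagonal_features(char_img):
    """Extract 35 diagonal features from a 70 x 50 binary character.

    The image is split into a 7 x 5 grid of 10 x 10 zones.  In each
    zone the foreground pixels along each of the 19 diagonals are
    summed, and the 19 sums are averaged into one feature per zone.
    A zone with no foreground pixels contributes a feature of zero.
    """
    assert char_img.shape == (70, 50)
    feats = np.empty(35)
    idx = 0
    for zr in range(7):
        for zc in range(5):
            zone = char_img[zr * 10:(zr + 1) * 10, zc * 10:(zc + 1) * 10]
            # zone.diagonal(k) walks the 19 diagonals, k = -9..9.
            sums = [zone.diagonal(k).sum() for k in range(-9, 10)]
            feats[idx] = np.mean(sums)
            idx += 1
    return feats
```

On an all-foreground zone the 19 diagonal sums add up to all 100 pixels, so the zone feature is 100/19.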

3.2.5 Classification

In this work, an Artificial Neural Network (ANN) was used to classify the characters. Optimized back propagation algorithms were used to train the Multilayer Feed Forward Neural Network (MFFNN) in order to minimize the limitations identified with standard back propagation. The optimized algorithms considered were Levenberg-Marquardt, Scaled Conjugate Gradient, Resilient Propagation and Quasi-Newton BFGS, and their performance was evaluated in terms of Mean Squared Error (MSE), number of epochs, recognition accuracy and training speed.

Each training algorithm was tested over five trials on the dataset, and the results were evaluated by recording the classification accuracy, MSE, response time and R² (as a measure of goodness of fit in linear regression), averaged over the trials.

During the experiments, the network parameters were kept constant for each training algorithm, in order to ensure equal opportunity for evaluating the algorithms.
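The evaluation metrics used to compare the algorithms can be sketched as below. This is an illustrative Python version of the metric computation only (MSE, classification accuracy and R²); the layout of the target and output matrices is an assumption (classes in rows, samples in columns, matching the batch convention described later).

```python
import numpy as np

def evaluate(targets, outputs):
    """Compute MSE, classification accuracy and R^2.

    `targets` and `outputs` are (classes x samples) matrices of one-hot
    targets and raw network outputs; the predicted class of a sample is
    the output neuron with the largest activation.
    """
    mse = np.mean((targets - outputs) ** 2)
    acc = np.mean(targets.argmax(axis=0) == outputs.argmax(axis=0))
    # R^2 as 1 - SS_res / SS_tot, treating the targets as the data.
    ss_res = np.sum((targets - outputs) ** 2)
    ss_tot = np.sum((targets - targets.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return mse, acc, r2
```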

3.2.5.1 Design of the Artificial Neural Network (ANN) Classification Model

The design of an ANN is a complex task: a network that is too simple provides limited learning capability, while an overly complex one induces generalization loss (i.e. overfitting). For the classifier in this research, a feed forward ANN was experimentally designed to perform pattern recognition, classifying input vectors into 15 classes of tonal characters of the Yoruba orthography. The architecture of a typical 2-layer ANN is shown in figure 3.5.

Figure 3.5: Architecture of a typical 2-layer Feedforward ANN

The symbols in figure 3.5 are defined as follows:

p = the input vector of dimension R × 1, where R is the number of rows; for batch processing, p is a matrix.

W = the weight matrix of dimension S × R, where S is the number of neurons in the layer.

b = the bias vector, a weight whose input is fixed at 1.

n = the weighted input to the transfer function.

a = the hidden layer output vector.

y = the output vector of the network.

The network designed in this research has 35 input neurons (corresponding to the 7 × 5 grid of zone features) and 15 output neurons, one for each of the 15 target classes associated with the input vectors. When an input vector of a given class is presented to the network, the corresponding output neuron should produce a 1 and all others 0. The target vector for each character class is shown in table 3.1.
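A forward pass through such a 2-layer network, using the symbols of figure 3.5, can be sketched as below. This is an illustrative Python version with randomly initialised weights; the hidden layer size of 30 and the tanh/softmax transfer functions are assumptions for the sketch, not the experimentally determined configuration reported in chapter four.

```python
import numpy as np

rng = np.random.default_rng(0)
R, S1, S2 = 35, 30, 15   # inputs, hidden neurons (assumed), outputs

# Weight matrices (S x R) and bias vectors, as defined for figure 3.5.
W1, b1 = rng.standard_normal((S1, R)) * 0.1, np.zeros((S1, 1))
W2, b2 = rng.standard_normal((S2, S1)) * 0.1, np.zeros((S2, 1))

def forward(p):
    """Forward pass: n = Wp + b, a = f(n) for each layer.

    tanh hidden units and a softmax output layer are assumed, so the
    15 outputs form class scores summing to one per sample.
    """
    n1 = W1 @ p + b1
    a = np.tanh(n1)                     # hidden layer output vector a
    n2 = W2 @ a + b2
    e = np.exp(n2 - n2.max(axis=0))     # numerically stable softmax
    y = e / e.sum(axis=0)
    return y

# For batch processing p is a 35 x N matrix, so y is 15 x N.
p = rng.standard_normal((R, 4))
y = forward(p)
```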

Table 3.1: Target vectors of the 15 character classes (the first row is the target vector of À)

1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 1 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 1 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 1 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 1 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 1 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 1 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 1 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 1 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 1 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

These vectors were combined into a target matrix for batch training of the ANN. The procedures for the design and configuration of the ANN in this work are reported in the subsequent subsections.
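Building the combined target matrix for batch training can be sketched as below. This is an illustrative Python version: each column is the one-hot target vector of one training sample, with a 1 at the row of its class index, as in table 3.1.

```python
import numpy as np

def target_matrix(labels, n_classes=15):
    """Build the one-hot target matrix for batch training.

    `labels` is a sequence of class indices (0..n_classes-1), one per
    training sample; the result has one column per sample with a 1 at
    the row of that sample's class and 0 elsewhere.
    """
    labels = np.asarray(labels)
    T = np.zeros((n_classes, len(labels)))
    T[labels, np.arange(len(labels))] = 1.0
    return T
```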

3.2.5.2 Tuning Artificial Neural Network (ANN) for maximum performance

Wolpert (1995) formulated the "No Free Lunch Theorem", which states that "there is no such thing as the best learning algorithm". Many researchers have likewise observed that neural networks are not easy to apply in practice because of the many decisions that must be made, including the choice of architecture, activation functions and learning rate (Ethem, 2010). To address this problem, several experiments were carried out in this research to determine the appropriate parameters and the best configuration for optimal performance of the classification process. The parameters determined experimentally for the ANN are:

The appropriate backpropagation training algorithm.

The optimal number of epochs.

The type of transfer functions to use both in the hidden and output layers.

The number of hidden layer neurons.

The results and findings of various experiments performed are reported in chapter four.

3.2.5.3 Creation, Training and Testing of Feedforward ANN

This section presents the implementation of the classification process as pseudocode for creating, training and testing the network, illustrated diagrammatically by a flowchart depicting the sequence of the process. Figure 3.6 shows the pseudocode for the FFANN.
