
Essay: Enhancing Handwritten Kannada Numeral Recognition with JPEG Compression

Essay details:

  • Subject area(s): Engineering essays
  • Published: 7 June 2012
  • Last Modified: 2 September 2024
  • Words: 3,366 (approx)

INTRODUCTION

A major goal of pattern recognition is to replicate human perceptual capabilities in artificial systems. As a special aspect of visual perception, the ability to read machine-printed or handwritten text is one remarkable human ability that even today is hardly matched by machine intelligence. Since the very first efforts at Optical Character Recognition (OCR), i.e., automatically reading machine-printed text, the research field of artificial reading systems has undergone significant changes in methodology and made considerable progress towards its ultimate goal.

Optical Character Recognition (OCR) is the automatic recognition of characters from a document image. OCR systems are considered a branch of artificial intelligence as well as of computer vision. Researchers divide the OCR problem into two domains. In offline recognition, the input is an image of the character obtained by scanning. In online recognition, the writer writes directly to the system using, for example, a light pen as the input device. Figure 1 shows the block diagram of a typical OCR system. The online problem is usually easier than the offline one, since more information is available, such as the movement of the pen, which may be used as a feature of the character [1]. These two domains (offline and online) can be further divided according to the character itself: recognition of machine-printed data and recognition of handwritten data. Machine-printed characters are uniform in size, position, and pitch for any given font. In contrast, handwritten characters are non-uniform; they can be written in many different styles and sizes by different writers, and at different times even by the same writer. An OCR system is based on three main stages: pre-processing, feature extraction, and discrimination (also called the classifier or recognition engine).
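The three stages named above can be sketched as a minimal pipeline. The function bodies below are illustrative placeholders only (simple global thresholding, ink-density profiles, and nearest-template matching); they are not the method proposed in this paper:

```python
import numpy as np

def preprocess(image):
    """Stage 1: binarize the raw scan (illustrative global threshold)."""
    return (image > image.mean()).astype(np.uint8)

def extract_features(binary):
    """Stage 2: derive a compact descriptor; here, row/column ink-density
    profiles serve as a toy feature vector."""
    return np.concatenate([binary.mean(axis=0), binary.mean(axis=1)])

def classify(features, templates):
    """Stage 3: nearest template (Euclidean distance) gives the label."""
    distances = {label: np.linalg.norm(features - t)
                 for label, t in templates.items()}
    return min(distances, key=distances.get)
```

Any real system replaces each placeholder with its own algorithm; the point is only the division of labor between the three stages.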

Figure 1: Typical OCR block diagram [2]

Traditional OCR systems suffer from two main problems, one stemming from the feature extraction stage and the other from the classifier (recognition stage). The feature extraction stage extracts features from the image and passes them, as global or local information, to the next stage to support the decision that recognizes the character. This poses a trade-off: if the feature extractor extracts many features to give the classifier enough information, many computations and more complex algorithms are needed, leading to longer processing time; if, on the other hand, few features are extracted to speed up the process, insufficient information may be passed to the classifier. The second problem concerns the classifier itself: most classifiers are based on Artificial Neural Networks (ANNs), and improving the accuracy of an ANN requires huge numbers of training iterations, complex computations, and learning algorithms, which also increases processing time. Therefore, improving recognition accuracy increases the time consumed, and vice versa.

To tackle these problems, a new OCR construction is proposed in this paper, where neither a feature extractor nor an ANN is needed. The proposed construction relies on the image compression technique (JPEG). The compressor compresses the image by encoding only the main details and quantizing or truncating the remaining details (redundancy) to zero. Then it generates a unique vector (code) corresponding to the entire image. This vector can be effectively used to recognize the character since it carries the main details of the character’s image. The importance of the main details is that they are common among the same character, even when written by different writers.

Recognizing handwritten numerals is an important area of research because of its various application potentials. Applications include automating bank cheque processing, postal mail sorting, job application form sorting, automatic scoring of tests containing multiple-choice questions, and other areas where numeral recognition is necessary. A character recognition engine for any script is always a challenging problem mainly because of the enormous variability in handwriting styles. A recognition system must therefore be robust in performance to cope with the large variations arising due to different writing habits of different individuals.


1.1 Scope

The goal of this paper is to provide a comprehensive overview of the application of JPEG algorithms in the research field of offline handwritten numeral recognition. Techniques for automatic handwritten numeral recognition can be distinguished as being either online or offline, depending on the particular processing strategy applied. Online recognition is performed as the number to be recognized is written; the handwriting has to be captured online, i.e., using pressure-sensitive devices, which provide a rich sequence of sensor data, a big advantage of online approaches. In offline recognition, the recognition is performed after the text has been written. For this purpose, images of the handwriting, captured using a scanner or a camera, are processed. This paper emphasizes approaches addressing the challenging task of offline handwritten Kannada numeral recognition. We concentrate on the most widely used JPEG algorithms.

Although handwritten Kannada numeral recognition shows parallels to classical OCR, i.e., the analysis of machine-printed text, the scope of this paper is limited to handwritten Kannada numeral recognition.

1.2 History of JPEG Algorithm

The JPEG committee was formed in 1986 by the CCITT and ISO standards organizations to set worldwide standards for image compression. The work was technically complete by early 1991 and was later approved as an international standard (ISO/IEC 10918). Originally, JPEG targeted full-color still-frame applications, achieving a 15:1 average compression ratio.

The JPEG baseline system decomposes the input image into 8×8-pixel source blocks. A DCT is applied to each block to transfer it into the frequency domain, and the resulting coefficients are then quantized and entropy coded for compression. Based on an 8×8 block, the theoretical limit for the maximum achievable compression ratio would be 64:1, but in practice usable compression ratios are much lower. In the limiting case of a block consisting of only one color, the DCT yields a single nonzero value, the DC coefficient.
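The single-coefficient property in the last sentence is easy to verify: applying an orthonormal 2-D DCT to a constant 8×8 block leaves only the DC (top-left) coefficient nonzero. A minimal sketch in Python (numpy only; the helper names are ours):

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis: C[k, m] = a(k) * cos(pi * (2m+1) * k / (2n))."""
    c = np.array([[np.cos(np.pi * (2 * m + 1) * k / (2 * n))
                   for m in range(n)] for k in range(n)])
    c[0, :] *= 1.0 / np.sqrt(2.0)   # a(0) = 1/sqrt(2), a(k>0) = 1
    return c * np.sqrt(2.0 / n)

def dct2(block):
    """Separable 2-D DCT of a square pixel block."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

flat = np.full((8, 8), 128.0)   # a block of a single "color"
coeffs = dct2(flat)             # only coeffs[0, 0] is nonzero
```

For a natural-image block, most of the energy still concentrates in the low-frequency corner, which is what makes the subsequent quantization step effective.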

1.3 Structure

The remainder of this article is organized as follows. Section 2 reviews related work; Section 3 describes the characteristics of the data set; Section 4 outlines the JPEG compression technique; Section 5 presents the proposed algorithm; Section 6 reports the results and discussion; and Section 7 concludes the paper.

2. RELATED WORK

In this paper, a Devanagari numeral recognition algorithm is proposed based on the JPEG image compression algorithm. The aim of a handwritten numeral recognition (HNR) system is to classify an input numeral as one of K classes. Over the years, a considerable amount of work has been carried out in the area of HNR, and various methods have been proposed in the literature for the classification of handwritten numerals. These include Hough transformations, histogram methods, principal component analysis, support vector machines, nearest neighbor techniques, neural computing, and fuzzy-based approaches [3]-[4]. A study of different pattern recognition methods is given in [5]-[6]. Compared with HNR systems for non-Indian scripts (e.g., Roman, Arabic, and Chinese), the recognition of handwritten numerals for Indian scripts is still a challenging task, and much work remains to be done in this area. A few works on the recognition of handwritten numerals of Indic scripts can be found in the literature [7]-[10]. A brief review of work on the recognition of handwritten numerals written in Devanagari script is given below:

Many schemes for digit classification have been reported in the literature. They mostly differ in feature extraction schemes and classification strategies (Govindan & Shivaprasad, 1990; Trier et al., 1996) [11]. Features used for recognition tasks include topological features, mathematical moments, etc. Classification schemes applied include nearest neighbor schemes and feed-forward networks. In order to make their systems robust against variations in numeral shapes, researchers have also used deformable models, multiple algorithms, and learning. A survey of the techniques is provided by Amin (1997) [12] and Plamondon & Srihari (2000) [13]. Lam & Suen (1986) [14] used a fast structural classifier and a relaxation-based scheme which uses deformation for matching.

A knowledge-based system using multiple experts has been used by Mai & Suen (1990) [15]. Kimura & Shridhar (1991) [16] developed a statistical classification technique that utilized profiles and histograms of the direction vectors derived from the contours. Chen & Lieh (1990) [17] proposed a two-layer random graph-based scheme which used components and strokes as primitives. Jain & Zongker (1997) [18] proposed a recognition scheme using deformable templates. LeCun et al. (1989) [19] suggested a novel backpropagation-based neural network architecture for handwritten zip code recognition. Knerr et al. (1992) [20] suggested the use of neural network classifiers with single-layer training for the recognition of handwritten numerals. Wang & Jean (1993) [21] suggested the use of neural networks for resolving confusion between similar-looking characters. Among studies on Indian scripts, notable work has been done on the recognition of printed Devanagari characters by Sinha and others (Sinha & Mahabala, 1979 [22]; Bansal & Sinha, 2001 [23]). They also suggested contextual post-processing for Devanagari character recognition and text understanding. For handwritten Bengali character recognition, Dutta & Chaudhury (1993) [24] presented a curvature feature-based approach. Chaudhuri & Pal (1998) [25] presented a complete Bangla OCR system.

3. DATA SET CHARACTERISTICS

Devanagari script, originally developed to write Sanskrit, has descended from the Brahmi script sometime around the 11th century AD. It is adapted to write many Indic languages like Marathi, Mundari, Nepali, Konkani, Hindi, and Sanskrit itself. Marathi is an Indo-Aryan language spoken by about 71 million people mainly in the Indian state of Maharashtra and neighboring states. Since 1950, Marathi has been written with the Devanagari alphabet. Figure 2 below presents a listing of the symbols used in Marathi for the numbers from zero to nine.

Figure 2: Numerals 0 to 9 in Kannada script

The dataset of Marathi handwritten numerals 0 to 9 is created by collecting handwritten documents from writers. Data collection is done on a sheet specially designed for data collection. Writers from different professions and age groups were chosen and were asked to write the numerals. A sample sheet of handwritten numerals is shown in Figure 3.

Figure 3: Sample sheet of handwritten numerals

The collected data sheets were scanned using a flatbed scanner at a resolution of 300 dpi and stored as color images. The raw input of the digitizer typically contains noise due to erratic hand movements and inaccuracies in digitization of the actual input. The scanned images are therefore binarized, and, to bring uniformity among the numerals, each cropped numeral image is size-normalized to fit into 60×60 pixels. A total of 400 binary images representing Marathi handwritten numerals were obtained from 20 different subjects.
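The normalization described above can be sketched as follows. The fixed threshold and the nearest-neighbour resampling below are simplifying assumptions for illustration, not the exact procedure used to build the data set:

```python
import numpy as np

def normalize_numeral(gray, size=60):
    """Binarize a scanned numeral (dark ink on light paper), crop to its
    bounding box, and size-normalize to size x size pixels."""
    binary = (gray < 128).astype(np.uint8)      # assumed fixed threshold
    rows = np.flatnonzero(binary.any(axis=1))
    cols = np.flatnonzero(binary.any(axis=0))
    crop = binary[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    # nearest-neighbour index mapping onto the 60x60 target grid
    r = np.arange(size) * crop.shape[0] // size
    c = np.arange(size) * crop.shape[1] // size
    return crop[np.ix_(r, c)]
```

Cropping before scaling is what removes variation in numeral position and size, so that only shape variation remains for the recognizer to handle.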

4. JPEG COMPRESSION TECHNIQUE

JPEG may be adjusted to produce very small compressed images that are of relatively poor quality in appearance but still suitable for many applications. Conversely, JPEG is capable of producing very high-quality compressed images that are still far smaller than the original uncompressed data. JPEG is also different in that it is primarily a lossy method of compression. Most popular image format compression schemes such as RLE, LZW, or the CCITT standards are lossless compression methods. That is, they do not discard any data during the encoding process. An image compressed using a lossless method is guaranteed to be identical to the original image when uncompressed.

Figure 4(a): JPEG Encoder

Lossy schemes, on the other hand, discard useless data during encoding. This is, in fact, how lossy schemes manage to obtain superior compression ratios over most lossless schemes. JPEG was designed specifically to discard information that the human eye cannot easily see. Slight changes in color are not perceived well by the human eye while slight changes in intensity (light and dark) are. Therefore, JPEG’s lossy encoding tends to be more frugal with the grayscale part of an image and more frivolous with the color.

Figure 4(b): JPEG Decoder

In the JPEG baseline coding system, which is based on the discrete cosine transform (DCT) and is adequate for most compression applications, the input and output images are limited to 8 bits, while the quantized DCT coefficient values are restricted to 11 bits. The human vision system has some specific limitations, which JPEG takes advantage of, to achieve high rates of compression.

As can be seen in the simplified block diagram of Figure 4, the compression itself is performed in four sequential steps: 8×8 sub-image extraction, DCT computation, quantization, and variable-length code assignment, i.e., by using a symbol encoder.

The JPEG compression scheme is divided into the following stages:

  1. Transform the image into an optimal color space.
  2. Downsample chrominance components by averaging groups of pixels together.
  3. Apply a Discrete Cosine Transform (DCT) to blocks of pixels, thus removing redundant image data.
  4. Quantize each block of DCT coefficients using weighting functions optimized for the human eye.
  5. Encode the resulting coefficients (image data) using a Huffman variable word-length algorithm to remove redundancies in the coefficients.
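Steps 4 and 5 can be illustrated for a single 8×8 block. The table below is the standard JPEG luminance quantization table; the zig-zag readout that linearizes the quantized block into a coefficient vector is shown explicitly, since it is this post-quantization vector that the proposed system works with:

```python
import numpy as np

# standard JPEG luminance quantization table (quality 50)
Q50 = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(dct_block, q=Q50):
    """Step 4: divide each DCT coefficient by its table entry and round;
    most high-frequency coefficients collapse to zero."""
    return np.round(dct_block / q).astype(int)

def zigzag_vector(qblock):
    """Linearize the quantized block in zig-zag order and drop trailing
    zeros, giving the compact per-block coefficient vector."""
    n = qblock.shape[0]
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    v = [qblock[i, j] for i, j in order]
    while len(v) > 1 and v[-1] == 0:
        v.pop()
    return v
```

In a full encoder the vector would then be Huffman coded (step 5); here the un-encoded vector itself serves as the image signature.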

Since we are not concerned in this work with the reconstruction part, only the compression part is used (dashed box) and the vector will be tapped immediately after the quantization stage.

5. PROPOSED ALGORITHM

Figure 5 illustrates the sequence of the proposed algorithm’s steps based on reference [3]. After the character’s image is scanned into the system, the JPEG approximation will produce a vector. This vector is assumed to uniquely represent the input image since it carries the important details of that image. Figure 6 shows a sample for Devanagari numeral 0. Then the Euclidean distance between this vector and each vector in the codebook will be measured. Finally, the minimum distance points to the corresponding character, and then the character is recognized. To obtain higher recognition accuracy, additional data on the length of the vector produced is also used in the recognition process.

5.1 System Components

The compression stage has two main components: (1) the JPEG compressor and (2) the codebook, as shown in Figure 6. The JPEG compressor produces the vector, which is assumed to uniquely represent the input image since it carries the important details of that image. The codebook is obtained by averaging the vectors of each group of Devanagari numerals. The codebook design procedure is explained in the following section.

Figure 5: Flowchart of the proposed algorithm

Figure 6: Graph of a sample JPEG approximation vector for Kannada number 0

5.2 Codebook Building

The codebook can be built as follows:

  1. Get 400 vectors for the entire available database (our database contains 400 written numerals).
  2. Group the 400 vectors according to their represented numerals. For instance, the group of number 0 has (in our database) 40 different 0’s that were written by 40 different writers, so it will have 40 vectors.
  3. Average each group, resulting in a unique vector for each group. These are the codes located in the codebook.
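The three steps above reduce to a per-class average. In this sketch, `vectors` holds the JPEG-approximation vector of each training image and `labels` its numeral class; the names are ours:

```python
import numpy as np

def build_codebook(vectors, labels):
    """Average the JPEG-approximation vectors of each numeral class to
    obtain one representative code per class (steps 1-3 above)."""
    codebook = {}
    for digit in set(labels):
        group = np.array([v for v, l in zip(vectors, labels) if l == digit])
        codebook[digit] = group.mean(axis=0)
    return codebook
```

With 400 training vectors and 40 samples per numeral, this yields a codebook of ten codes, one per digit class.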

5.3 Classifier

After obtaining the codebook, the last step is numeral recognition, which is implemented using a Euclidean distance classifier; this classifier is also used to examine the accuracy of the designed system. The Euclidean distance (d) between two n-dimensional vectors X and Y is defined as:

d(X, Y) = sqrt( Σ_{i=1}^{n} (x_i − y_i)^2 )
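Given this distance, the classifier reduces to a nearest-code lookup over the codebook. A minimal sketch (variable names are ours):

```python
import numpy as np

def euclidean(x, y):
    """d(X, Y) = sqrt( sum_i (x_i - y_i)^2 )"""
    return np.sqrt(np.sum((np.asarray(x, float) - np.asarray(y, float)) ** 2))

def recognize(vector, codebook):
    """Return the numeral whose codebook vector is nearest to the input."""
    return min(codebook, key=lambda digit: euclidean(vector, codebook[digit]))
```

The minimum distance points to the recognized numeral, exactly as described in the proposed algorithm.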

6. RESULTS AND DISCUSSION

The JPEG compression property yields a high compression ratio, resulting in minimum image size. Every compressed image has a unique vector, which helps to identify each numeral. Using this unique vector, the proposed system recognizes the input numeral by measuring the Euclidean distance between the vector and the vectors in the codebook; the shortest distance points to the corresponding numeral. In addition to the speed advantage of using the codebook, the approach can be universal in terms of the character's nature (language, writing mode) as well as the character's image size. We used 60×60-pixel images as input.

Table 1: The recognition accuracy

The codebook is obtained with the help of the available database. The proposed algorithm is tested on input numerals, and the percentage recognition accuracy for each character is obtained. The individual and average recognition accuracies of the numerals are shown in Table 1. The system was able to recognize the characters in a short time compared to existing ANN-based systems, because it saves the time taken by a feature extractor and because it uses the codebook.

Figure 7: Confusing handwritten numerals

7. CONCLUSION

A fast and robust method is proposed in this paper for achieving better recognition rates for handwritten Devanagari numerals. It is not based on an ANN, thereby avoiding time-consuming training, but on the JPEG image compression algorithm, which generates a unique vector that helps to identify each numeral. The result was considerably high in terms of recognition rate. Our future work aims to improve the classifier to achieve even better recognition. The proposed method can be extended to the recognition of numerals of other Indic scripts.

REFERENCES

  1. Liana, M. & Venu, G. (2006). “Offline Arabic Handwriting Recognition: A Survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, No. 5, pp. 712-724.
  2. Abdurazzag, A. A., & Rehiel, S. M. A. (2007). “Off-line Omni-style Handwriting Arabic Character Recognition System Based on Wavelet Compression,” Vol. 3 No. 4, pp. 123-135.
  3. Abdurazzag, A. A., & Rehiel, S. A. (2008). “JPEG for Arabic Handwritten Character Recognition: Add a Dimension of Application,” Advances in Robotics, Automation and Control, ISBN 78-953-7619-16-9, pp. 472.
  4. Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer, New York.
  5. Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern Classification, second ed. Wiley-Interscience, New York.
  6. Hastie, T., Tibshirani, R., & Friedman, J. (2001). The Elements of Statistical Learning. Springer Series in Statistics, Springer, New York.
  7. Pal, U., & Chaudhuri, B. B. (2000). “Automatic Recognition of Unconstrained Off-line Bangla Hand-written Numerals,” Proc. Advances in Multimodal Interfaces, Springer Verlag Lecture Notes on Computer Science (LNCS-1948), pp. 371-378.
  8. Tripathy, N., Panda, M., & Pal, U. (2004). “A System for Oriya Handwritten Numeral Recognition,” SPIE Proceedings, Vol. 5296, pp. 174-181.
  9. Wen, Y., Lu, Y., & Shi, P. (2007). “Handwritten Bangla numeral recognition system and its application to postal automation,” Pattern Recognition, Volume 40, pp. 99-107.
  10. Rajput, G. G., & Hangarge, M. (2007). “Recognition of isolated handwritten Kannada numerals based on image fusion method,” PreMI07, LNCS 4815, pp. 153-160.
  11. Govindan, V. K., & Shivaprasad, A. P. (1990). “Character recognition: a review,” Pattern Recognition, Volume 23, Issue 7, pp. 671-683.
  12. Amin, A. (1997). “Off-line Arabic character recognition: Survey,” Proc. 4th Int. Conf. on Document Analysis and Recognition, Munich (IEEE Press).
  13. Plamondon, R., & Srihari, S. N. (2000). “On-line and off-line handwriting recognition: a comprehensive survey,” IEEE Trans. Pattern Anal. Machine Intel., PAMI-22, pp. 63-84.
  14. Lam, L., & Suen, C. Y. (1986). “Structural classification and relaxation matching of totally unconstrained handwritten ZIP codes,” Pattern Recognition, pp. 15-19.
  15. Mai, T., & Suen, C. Y. (1990). “A generalized knowledge-based system for recognition of unconstrained hand-written numerals,” IEEE Trans. Syst., Man Cybern., SMC-20, pp. 835-848.
  16. Kimura, F., & Shridhar, M. (1991). “Handwritten numeral recognition based on multiple algorithms,” Pattern Recognition, Volume 24, pp. 969-983.
  17. Chen, L.-H., & Lieh, J. R. (1990). “Handwritten character recognition using a two-layer random graph model by relaxation matching,” Pattern Recognition, 23, pp. 1189-1205.
  18. Jain, A. K., & Zongker, D. (1997). “Representation and recognition of handwritten digits using deformable templates,” IEEE Trans. Pattern Anal. Machine Intel., PAMI-19, pp. 1386-1391.
  19. LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). “Backpropagation applied to handwritten zip code recognition,” Neural Comput., 1, pp. 541-551.
  20. Knerr, S., Personnaz, L., & Dreyfus, G. (1992). “Handwritten digit recognition by neural networks with single layer training,” IEEE Trans. Neural Networks, 3, pp. 303-314.
  21. Wang, J., & Jean, J. (1993). “Resolving multi-font character confusion with neural networks,” Pattern Recogn., 26, pp. 175-187.
  22. Sinha, R. M. K., & Mahabala, H. (1979). “Machine recognition of Devanagari script,” IEEE Trans. Syst., Man Cybern., SMC-9, pp. 435-449.
  23. Bansal, V., & Sinha, R. M. K. (2001). “A complete OCR for printed Hindi text in Devanagari script,” Proc. 6th Int. Conf. on Document Analysis and Recognition, Washington (IEEE Press).
  24. Dutta, A. K., & Chaudhury, S. (1993). “Bengali alpha-numeric character recognition using curvature features,” Pattern Recogn., 26, pp. 1757-1770.
  25. Chaudhuri, B. B., & Pal, U. (1998). “A complete printed Bangla OCR system,” Pattern Recogn., 31, pp. 531-549.
  26. Rajput, G. G., & Mali, S. M. (2010). “Marathi Handwritten Numeral Recognition using Fourier Descriptors and Normalized Chain Code,” IJCA Special Issue on Recent Trends in Image Processing and Pattern Recognition, RTIPPR, pp. 141-147.
  27. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). “Gradient-based learning applied to document recognition,” Proc. IEEE, 86(11), pp. 2278-2324.

Source: Essay Sauce, Enhancing Handwritten Kannada Numeral Recognition with JPEG Compression. <https://www.essaysauce.com/engineering-essays/optical-character-recognition/>