Essay details:

  • Subject area(s): Engineering
  • Price: Free download
  • Published on: 7th September 2019
  • File format: Text
  • Number of pages: 2


LDA Based Video Retrieval System


With the growth of the internet and multimedia, a massive audience across all age groups is interested in watching videos. As this audience keeps growing exponentially, there is a pressing need for a new video retrieval mechanism to replace the conventional text based retrieval system. This work on efficient video indexing using Linear Discriminant Analysis introduces a content based video retrieval system that uses the visual features of a video for classification and indexing, which improves the quality of video retrieval. Interesting points are detected using Linear Discriminant Analysis (LDA), which captures the distinctive vector points of the objects in each key frame and indexes them; this forms the core of this content based video retrieval research.

Keywords: Principal Component Analysis, Linear Discriminant Analysis, Image analysis, Interesting point computation

1. Introduction

In recent years, the volume of visual content hosted on online servers has grown many times over. Because of their great entertainment value, videos attract people of all age groups who have internet access. This trend has led to the launch of a huge number of websites for browsing and viewing videos. Most video oriented websites rely on text based mining. Text based video mining worked well as long as video collections were relatively small, but the recent exponential growth of video collections on the internet has caused it to return irrelevant results. This creates a great demand for content based video retrieval systems.

2. Need for the research

Any research should address a pressing problem that people face, so it is necessary to measure the benefit and scope of the research. Content based video retrieval is not a new concept; it has existed for more than 25 years. Hence, a survey was conducted among internet users to establish the need for this research.

The survey focused on how people use the internet today: how much they use it, why they use it, the role of videos on the internet, and the quality of video search on the modern web. Nearly 32 people, including students and IT professionals, took part. The results of the survey are presented below.

Internet usage is more than 4 hours for 53% of people, between 2 and 4 hours for 31% and less than 2 hours for 15%.

Figure 1: Trends of Internet Usage

For 84% of people, the purpose of internet usage is a combination of accessing information, entertainment, banking and communication. A further 9% use the internet only for entertainment and 8% only for communication.

Figure 2: Purpose of Internet Usage – Survey results

80% of people say that they use the internet for watching videos, and 20% say that they do not watch videos over the internet.

Figure 3: Interest towards Watching Videos Online – Survey results

The most striking result of this survey is that 100% of people say they use only the YouTube video search engine.

Only 21% of people rate YouTube as the best video search engine, while 34% rate it as better, 40% as good and 1% as not bad. This clearly indicates that video search can be significantly improved.

Figure 4: Rating of YouTube – Survey Results

When people were asked about the accuracy of their search engines, only 25% said they get the expected results “always”, 68% said “mostly” and 7% said “sometimes”.

When asked whether they could retrieve a video without knowing any caption or character involved in it, 53% of people answered yes and 47% answered no. This makes content based video retrieval a necessity.

84% of people expressed their readiness to accept a new video search engine that takes an image as input.

3. Challenges in Content Based Video Retrieval

Many challenges are involved in implementing content based video retrieval, and they have strongly hindered the transformation of video retrieval.

The foremost challenge is bridging the semantic gap. A human formulates a video search in terms of high level features, but only the low level features of a video can be computed and measured easily. Translating a query posed by a human into the low level features parsed by the computer is the problem of bridging the semantic gap.

The other major challenge in content based video retrieval is the form of the input. Most video retrieval systems take a video clip as input. This means that multiple features of the video (audio and visual) must be considered when constructing the input query. The features in a video vary continuously, so they are tedious to represent. The time taken to upload the input video clip may also slow down the retrieval system. These difficulties have prevented video retrieval systems from being deployed in practice, even though search by similar images has already been implemented.

It is therefore advisable to use an image as the input to a video retrieval system.

4. Related works

Various approaches have been pursued in content based video retrieval, namely colour based retrieval, texture based retrieval, shape based retrieval, facial recognition based retrieval and multimodal retrieval.

4.1 Text Based Video Retrieval

Text based video retrieval relies on annotations that act as a title for the video (keywords, descriptions) or on collateral text found within the video (captions, subtitles, nearby text). It applies traditional text retrieval techniques to image annotations or descriptions. Text based video retrieval has two drawbacks: i) the text may not exactly describe the content of the video; ii) the textual description provided by one user may differ from that of another.

Jing Huang et al. [1] proposed a new feature called the color correlogram for image indexing and comparison.

Sim, D. G., H. K. Kim and R. H. Park [3] proposed a framework that operates in the discrete cosine transform (DCT) domain. The method takes full advantage of the DCT coefficients and uses color and texture information to retrieve JPEG formatted images. This mechanism greatly decreases retrieval complexity.

M. Flickner [5] proposed the color histogram mechanism. A color histogram can represent only the coarse characteristics of an image, so the same histogram can describe multiple images. The split histogram approach places an additional restriction on histogram matching: two images with identical color histograms can have different split histograms, so split histograms draw a finer distinction than color histograms. This is particularly important for large image databases, in which many images can have similar color histograms.
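The coarseness of a color histogram can be illustrated with a minimal Python sketch. The function name `color_histogram`, the choice of four bins per channel and the two toy 2x2 images are assumptions made for illustration; they are not taken from the cited work.

```python
from collections import Counter

def color_histogram(image, bins_per_channel=4):
    """Count pixels per quantized (r, g, b) bin; image is rows of RGB tuples."""
    step = 256 // bins_per_channel
    hist = Counter()
    for row in image:
        for r, g, b in row:
            hist[(r // step, g // step, b // step)] += 1
    return hist

RED, BLUE = (255, 0, 0), (0, 0, 255)

# Two different layouts built from the same pixels (hypothetical toy images).
left_right = [[RED, BLUE], [RED, BLUE]]
top_bottom = [[RED, RED], [BLUE, BLUE]]

# The histogram ignores where pixels sit, so both layouts compare as equal.
same_histogram = color_histogram(left_right) == color_histogram(top_bottom)
```

Because the histogram discards pixel positions, the two layouts are indistinguishable, which is the limitation that finer descriptors are meant to address.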

P. Berman [6] developed a multimodal approach that includes a diverse and expandable set of visual properties (color, texture and location) in a retrieval framework. The framework is based on a stairs algorithm that can operate in a regional query mode with only a moderate increase in computational overhead. This approach improves on many standard image retrieval algorithms by supporting queries based on subsections of images.

4.2 Texture Based Retrieval

J. Zhang [7] proposed image retrieval based on the textural information of an image, such as orientation, directionality and regularity. Texture orientation is used to construct the rotated Gabor transform, from which a rotation-invariant texture feature is extracted. The rotation-invariant texture feature, directionality and regularity are the main features used in this approach for similarity assessment.

4.3 Multimodal feature based Retrieval

R. M. Haralick [8] used multiple visual features, including a color feature (HSV color histogram), a texture feature (co-occurrence matrix), a shape feature (moment invariants based on threshold optimization) and a spatial relationship feature (based on Markov chains). Retrieval precision based on the color feature is better than that based on the texture feature. An image retrieval method that combines color and texture features is more exact and efficient than methods based on a single feature.

P. S. Hiremath et al. [9] experimented with approaches such as the multispectral approach, HSV color space and YCbCr color space, and used gray scale texture features for color texture analysis. Texture features are computed from the wavelet decomposed coefficients of the image and its complement. The Haar wavelet is more effective for texture features than other wavelets.

P. S. Hiremath and Jagadeesh Pujari [9] proposed an integrated matching scheme based on shape. The shape information was estimated using Gradient Vector Flow fields. This method was efficient in comparison with the wavelet method.

K. P. Ajitha Gladis and K. Ramar [10] proposed representing the image in terms of its statistical properties, morphological features and fuzzy cluster features to achieve more accurate results.

Son Lam Phung and A. Bouzerdoum [11] came up with a new feature, “edge density”. This method differentiates objects from non-objects using image edge characteristics. This fast object detection method performs better than colour based and texture based methods.

Linjun Yang et al. [12] proposed a framework to improve the reliability of query-by-example (QBE) based image retrieval. The improvement is achieved by using a short video clip as the query rather than a single image. Because a video clip holds the appearances of an object or scene, the rich information it contains can be used to discover a proper query representation and to improve the relevance of the retrieved results. Video-based image retrieval (VBIR) proved significantly more reliable than retrieval using a single image as the query.

5. System Architecture

The proposed LDA based video retrieval system consists of five steps: i) input key frame generation, ii) image pre-processing, iii) image to matrix conversion, iv) PCA computation and v) LDA projection. The proposed system takes a short video clip as input. The video clip is converted into a series of input images. Each input image is pre-processed to remove distortions. The objects in the images are located using principal component analysis (PCA). LDA is applied to the output of PCA to derive the best linear discriminating transform. The LDA parameters are placed as indices in the search table.

Figure 5: Storing a video in LDA based CBVR

Figure 6: Retrieving a video from CBVR

5.1 Image Pre-processing

Once an input image is fed into the video retrieval system, it has to be pre-processed. As part of pre-processing, the colour image is converted to gray scale. The contrast of the image is then improved by remapping the pixel values of the original image based on the histogram of the desired brightness.
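The two pre-processing steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' Scilab code: the gray scale conversion uses the common BT.601 luma weights, and the contrast step is shown as ordinary histogram equalization (a uniform target histogram), since the exact target histogram is not given. The names `to_gray` and `equalize` are hypothetical.

```python
def to_gray(image):
    """Convert rows of (r, g, b) tuples to 0..255 gray levels (BT.601 weights, an assumption)."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in image]

def equalize(gray, levels=256):
    """Histogram equalization: remap gray levels through the cumulative histogram."""
    pixels = [p for row in gray for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:  # flat image: there is no contrast to stretch
        return [row[:] for row in gray]
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
           for c in cdf]
    return [[lut[p] for p in row] for row in gray]

# Hypothetical low-contrast 2x2 key frame.
frame = [[(100, 100, 100), (110, 110, 110)],
         [(120, 120, 120), (130, 130, 130)]]
gray = to_gray(frame)
stretched = equalize(gray)
```

In this toy frame the gray levels 100 to 130 are spread over the full 0 to 255 range, which is the contrast improvement the step is meant to deliver.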

5.2 Image to Matrix Conversion

The input image is converted into the standard matrix by means of grid imposition.
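One possible reading of grid imposition is sketched below in Python: a fixed grid is laid over the gray scale image and each cell is reduced to its average intensity, so that every key frame yields a matrix of the same size. This interpretation, the function name `grid_matrix` and the averaging rule are assumptions; the paper does not specify the procedure.

```python
def grid_matrix(gray, rows, cols):
    """Overlay a rows x cols grid on a gray image and average the pixels in each cell.

    Assumes rows <= image height and cols <= image width, so no cell is empty.
    """
    height, width = len(gray), len(gray[0])
    matrix = []
    for i in range(rows):
        top, bottom = i * height // rows, (i + 1) * height // rows
        out_row = []
        for j in range(cols):
            left, right = j * width // cols, (j + 1) * width // cols
            cell = [gray[y][x] for y in range(top, bottom) for x in range(left, right)]
            out_row.append(sum(cell) / len(cell))
        matrix.append(out_row)
    return matrix

# Hypothetical 4x4 gray image reduced to a 2x2 standard matrix.
gray_image = [[0, 0, 10, 10],
              [0, 0, 10, 10],
              [20, 20, 30, 30],
              [20, 20, 30, 30]]
standard = grid_matrix(gray_image, 2, 2)
```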

5.3 Principal Component Analysis Projections

PCA is a five step process. Initially, a dataset X of M image matrices (the input key frames) is prepared.

$$X = \{x_1, x_2, \ldots, x_M\} \qquad (1)$$

The mean image matrix $X_m$ is computed as the average of all input key frames.

$$X_m = \frac{1}{M} \sum_{n=1}^{M} x_n \qquad (2)$$

The difference $\Delta_n(X_m)$ between the mean image and each original image is calculated.

$$\Delta_n(X_m) = X_m - x_n, \quad n = 1, 2, \ldots, M \qquad (3)$$

The covariance matrix of the differences is calculated, and its eigenvalues and eigenvectors are obtained.

$$C = \frac{1}{M} \sum_{n=1}^{M} \Delta_n(X_m)\, \Delta_n(X_m)^{T} \qquad (4)$$
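The PCA steps can be sketched in plain Python as follows. The sketch assumes that each key frame has been flattened into a vector, uses the usual outer-product form of the covariance matrix, and finds only the dominant eigenpair by power iteration; these choices and the function names are illustrative assumptions, not the authors' Scilab implementation.

```python
import math

def mean_frame(frames):
    """Element-wise mean of M flattened key frames."""
    m = len(frames)
    return [sum(frame[k] for frame in frames) / m for k in range(len(frames[0]))]

def covariance(frames):
    """Average outer product of the deviations of each frame from the mean frame."""
    mean = mean_frame(frames)
    m, d = len(frames), len(mean)
    deltas = [[mean[k] - frame[k] for k in range(d)] for frame in frames]
    return [[sum(dv[a] * dv[b] for dv in deltas) / m for b in range(d)]
            for a in range(d)]

def dominant_eigenpair(matrix, iterations=100):
    """Power iteration for the largest eigenvalue and its unit eigenvector."""
    d = len(matrix)
    vec = [float(k + 1) for k in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        nxt = [sum(matrix[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    value = sum(vec[a] * sum(matrix[a][b] * vec[b] for b in range(d))
                for a in range(d))
    return value, vec

# Three hypothetical flattened key frames that vary along one direction.
frames = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
cov = covariance(frames)
eigenvalue, principal_axis = dominant_eigenpair(cov)
```

For these toy frames all variation lies along the diagonal, so the principal axis has equal components and the dominant eigenvalue equals the total variance.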

5.4 Linear Discriminant Analysis of PCA

PCA concentrates on the unique representation of an image, whereas LDA seeks to differentiate one image from others based on specific features and characteristics. LDA can represent the feature change within an image (scatter within classes) and the difference between the features of different images (scatter between classes).

LDA projection:

$$S_A = \frac{1}{N} \sum \left(X_m - \Delta(X_m)\right) \left(X_m - \Delta(X_m)\right)^{T} \qquad (5)$$
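The idea of separating classes through scatter can be illustrated with a small two-class Python sketch. It uses the standard Fisher formulation (the within-class scatter is the sum of per-class outer products, and the direction is the inverse within-class scatter applied to the difference of the class means), which is an assumption here because the paper's formula is not fully specified. The two-dimensional toy data and all function names are hypothetical.

```python
def class_mean(points):
    """Mean of a list of 2-D feature vectors."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(2)]

def scatter(points, mean):
    """Sum of outer products (x - mean)(x - mean)^T for 2-D points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mean[0], p[1] - mean[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s

def fisher_direction(class_a, class_b):
    """Two-class Fisher direction: inverse within-class scatter times the mean difference."""
    mean_a, mean_b = class_mean(class_a), class_mean(class_b)
    sa, sb = scatter(class_a, mean_a), scatter(class_b, mean_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter is singular")
    diff = [mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]]
    return [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
            (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]

# Hypothetical PCA features for key frames of two different objects.
object_a = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
object_b = [(4.0, 0.0), (6.0, 0.0), (4.0, 2.0), (6.0, 2.0)]
w = fisher_direction(object_a, object_b)
proj_a = [w[0] * x + w[1] * y for x, y in object_a]
proj_b = [w[0] * x + w[1] * y for x, y in object_b]
```

Projecting onto `w` keeps the two objects apart on a single axis, which is what makes the projected values usable as compact index entries.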

6. Experimental Setup

The efficient LDA retrieval system was implemented using Scilab and the ORD database server. Scilab was used for image and mathematical processing, and ORD was employed as the backend database.

Scilab is free and open source software for numerical computation providing a powerful computing environment for engineering and scientific applications. Scilab is released as open source under the CeCILL license (GPL compatible), and is available for download free of charge.

The experiment was performed on a dataset of videos of size 2 kb. Video clips are fragmented into a set of images using the imread function of Scilab. The input images are converted into matrices using the dec2bin function. The mean matrices of the input key frames are computed using the mean function. From the mean matrices, the covariance matrices are obtained using the cov function. The eigenvalues are calculated from the covariance matrix using the eig function, giving the principal components of the input key frames. The PCA output is further processed to obtain the LDA.

The video indexing system of the ORD database server has a single large database. The indexing table has four fields, including S.No, LDA index and video_path. The field S.No indicates the number of videos stored and is of auto-incrementing type. The field LDA index holds the LDA parameters of the objects in the video. The field video_path holds the address where the original video is stored.
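The index table can be sketched with Python's built-in sqlite3 module, used here only as a stand-in for the ORD database server. Only the three fields named in the text are modelled. The comma-separated serialization of the LDA parameters, the nearest-neighbour lookup by squared Euclidean distance and the file paths are assumptions made for illustration.

```python
import sqlite3

def open_index():
    """Create an in-memory index table with the three fields named in the text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE video_index ("
        " s_no INTEGER PRIMARY KEY AUTOINCREMENT,"
        " lda_index TEXT NOT NULL,"
        " video_path TEXT NOT NULL)"
    )
    return conn

def store_video(conn, lda_vector, video_path):
    """Insert one video; the LDA parameters are stored as comma-separated text."""
    serialized = ",".join(f"{v:.6f}" for v in lda_vector)
    cur = conn.execute(
        "INSERT INTO video_index (lda_index, video_path) VALUES (?, ?)",
        (serialized, video_path),
    )
    return cur.lastrowid

def nearest_video(conn, query_vector):
    """Return the path whose stored LDA vector is closest to the query (assumed matching rule)."""
    best = None
    rows = conn.execute("SELECT lda_index, video_path FROM video_index")
    for serialized, path in rows:
        stored = [float(v) for v in serialized.split(",")]
        dist = sum((a - b) ** 2 for a, b in zip(stored, query_vector))
        if best is None or dist < best[0]:
            best = (dist, path)
    return best[1] if best else None

conn = open_index()
first_id = store_video(conn, [0.5, 0.0], "/videos/object_a.mp4")   # hypothetical path
second_id = store_video(conn, [2.5, 1.0], "/videos/object_b.mp4")  # hypothetical path
match = nearest_video(conn, [2.4, 0.9])
```

Storing only a short parameter vector and a path per video keeps the index small compared with the videos themselves.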

7. Experimental Results

The proposed CBVR was evaluated against several performance metrics: precision, recall, memory consumed and operation cost. The CBVR achieved a precision of 85.5% in retrieving the relevant videos and suffered a shortfall of 14.5% in recall.


Fig 7. Login Window

Fig 8. Action Window

Fig 9. Conversion of video into images

Fig 10. Input key frames

Fig 11. Converting image to binary matrix

Fig 12. Finding of LDA of PCA

The response time to retrieve the relevant videos from a video database of 700 MB is approximately 90 seconds. Query performance could be improved considerably by upgrading the system to a higher version of MATLAB and of the database software. The operation cost of computing the principal component analysis and linear discriminant analysis of an input video is nearly 60 seconds. The memory consumption of the CBVR is almost the size of the videos stored, with nearly 0% of the memory used for indexing, whereas conventional histogram based indexing takes up nearly 20 MB in addition for indexing.

$$\text{Precision} = \frac{|\{\text{relevant videos}\} \cap \{\text{retrieved videos}\}|}{|\{\text{retrieved videos}\}|}$$

$$\text{Recall} = \frac{|\{\text{relevant videos}\} \cap \{\text{retrieved videos}\}|}{|\{\text{relevant videos}\}|}$$
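The two metrics can be computed with a short Python sketch. It uses the standard set-based definitions, where the numerator is the number of videos that are both relevant and retrieved. The video identifiers are made up for illustration and are not experimental data.

```python
def precision_recall(retrieved, relevant):
    """Standard precision and recall for one query, from collections of video ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # videos that are both relevant and retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: 4 videos returned, 3 videos are actually relevant, 2 overlap.
precision, recall = precision_recall(["v1", "v2", "v3", "v4"], ["v1", "v2", "v5"])
```

Here two of the four retrieved videos are relevant and two of the three relevant videos are retrieved, so the precision is one half and the recall is two thirds.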

S.No   Size of Video (MB)   No. of Frames Extracted   Operational Speed (sec)
1      1.5                  237                       163
2      2.8                  312                       207
3      4.1                  398                       268

Table 1. Operational speed

Fig 13: Operational Speed Chart

S.No   No. of Objects Indexed   Precision (%)   Recall (%)
1      8                        78.5            19.5
2      16                       82.5            17.2
3      24                       84              14.8

Table 2: Precision Vs Recall

Fig 14: Chart of Precision Vs Recall

9. Conclusion

The LDA based video retrieval system performed well in comparison with the existing text based, color based and texture based mining techniques. The system functions well as long as the input video query stays short. The response time of the system increases as the size of the input video query grows. Handling large video queries, or changing the form of the query from a video to an image, can be treated as further improvements of this research.

10. References

[1] Jing Huang, “Content based video retrieval system”, IEEE Computers, 1995.

[2] M. Flickner, “Query by image and video content: The QBIC system”, IEEE Computer, 1995.

[3] D. G. Sim, H. K. Kim and R. H. Park, “Fast texture description and retrieval of DCT-based compressed images”, Electronic Letters, 2001.

[4] S. K. Chang and A. Hsu, “Image information systems: Where do we go from here?”, IEEE Trans. on Knowledge and Data Engineering 4(5), 1992.

[5] M. Flickner et al., “Query by image and video content: The QBIC system”, IEEE Computer, September 1995.

[6] P. Berman and L. G. Shapiro, “Efficient content based retrieval: Experimental results”, IEEE Workshop on Content Based Access of Image and Video Libraries, 1999.

[7] J. Zhang and T. Tan, “Brief review of invariant texture analysis methods”, Pattern Recognit. 35, 2002.

[8] R. M. Haralick, “Statistical and structural approaches to texture”, Proceedings of IEEE 67:786-804, 1979.

[9] P. S. Hiremath and Jagadeesh Pujari, “Content Based Image Retrieval Based on Color, Texture and Shape Features Using Image and its Complement”, International Journal of Computer Science and Security, 2007.

[10] K. P. Ajitha Gladis and K. Ramar, “Content-Based Image Retrieval using Patterns for Medical Application”, Graphics, Vision and Image Processing Journal, Volume 10, Issue 4, 2010.

[11] Son Lam Phung and A. Bouzerdoum, “Detecting People in Images: An Edge Density Approach”, IEEE International Conference on Acoustics, Speech and Signal Processing, 2017.

[12] Linjun Yang, Yang Cai, Alan Hanjalic, Xian-Sheng Hua and Shipeng Li, “Searching for images by video”, International Journal of Multimedia Information Retrieval, Volume 2, Issue 3, 2013.


Essay Sauce, . Available from:< > [Accessed 18.10.19].