Essay details:

  • Subject area(s): Engineering
  • Published on: 7th September 2019
  • Number of pages: 2


A Study on Organizing Multimedia Big Data


Research Scholar, Department of CSE

Saveetha University

Chennai, India



Professor, Department of CSE

Saveetha Engineering College

Chennai, India


Abstract— Big data is a term used to describe data that exceeds the processing capacity of conventional database systems. As applications multiply, handling huge volumes of multimedia resources has become very difficult. The spread of personal digital cameras and online photo and video sharing communities has led to a sudden increase in multimedia resources. Many new multimedia datasets are organized in a structured way, but searching for a particular video is still a tedious job. Multimedia data is closely related to social networks, which connect people through their own photos and videos. This paper studies the semantics of multimedia data and the various ways to organize it, particularly in video surveillance systems. It also reviews and summarizes the methods used to organize multimedia resources in recent work, along with their applications.

Keywords— Big data, Multimedia Resources, Semantic, Video Surveillance.


Big data is a term applied to datasets whose size is beyond the ability of commonly used software tools to process. Understanding the semantics of multimedia is an important component of many multimedia-based applications. The current web has several limitations: 1) if the search keyword is polysemous, the search returns a variety of results, many of which are not relevant; 2) search engines fail to discriminate between two categories of images; 3) music retrieval is just as problematic as image search; 4) the current web does not understand natural language; 5) associating photos and videos with keywords is a much more difficult task than simply looking for keywords in the text of documents.

Given these limitations, a method for organizing multimedia resources is needed. Manual annotation and tagging of multimedia resources is considered a reliable source of multimedia semantics. However, manual annotation is time-consuming and expensive when dealing with multimedia data at a huge scale. Advances in the semantic web have made ontologies another useful means of describing the semantics of multimedia resources. The semantic web provides a common framework that allows data to be shared and reused across application, enterprise and community boundaries. However, the semantic gap between semantics and the visual appearance of video remains a challenge for automated ontology-driven video annotation.

A large number of videos with no metadata have emerged. Automatically understanding unprocessed multimedia based on its visual appearance has become an important yet demanding problem. The rapid increase in the number of multimedia resources has created an urgent need for intelligent methods to represent and annotate them. Typical applications that represent and annotate video events include intrusion detection systems, video surveillance, video browsing and indexing systems, criminal investigation systems and sports event detection.

With the explosion of community-contributed multimedia content available online, many social media sites allow users to upload media and annotate it with descriptive keywords called social tags. The purpose of tagging is to make images and videos more accessible to the public. The goal is therefore to find the relevant video resource among a huge number of multimedia resources.


A. Semantic Link Network Model

In the Semantic Link Network model[1], the tags and surrounding text of multimedia resources are used to measure semantic association. The hierarchical semantics of multimedia resources are defined by their annotated tags and surrounding text. A real dataset of 100,000 images with social tags from Flickr is used in the experiment. Clustering and retrieval are the two evaluation methods used to check that the semantic relations between images are measured accurately and robustly. The relatedness measures between concepts are extended to the level of multimedia resources.

The Semantic Link Network can be used for web intelligence activities, web knowledge discovery and publishing, and similar tasks. When a user browses a multimedia resource, other resources with semantic links to it can be recommended. The method can measure the semantic relatedness of two images robustly and correctly. The tags of the search results need not contain the search query, which differs from the traditional co-occurrence based search mechanism. Faceted exploration of search results is widely used in search interfaces for structured databases. A significant limitation is that the evaluation methods used to measure the semantic relations work efficiently only for images.
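As a rough illustration of tag-based relatedness and the recommendation step described above, the following sketch scores two resources by the overlap of their tag sets (Jaccard similarity) and recommends the closest neighbours. Jaccard overlap is a stand-in chosen here for brevity and is not the relatedness measure defined in [1]; the function names and the sample tags are hypothetical.

```python
def tag_relatedness(tags_a, tags_b):
    """Jaccard overlap of two tag collections, in the range [0, 1]."""
    a, b = set(tags_a), set(tags_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(resource_id, tagged, k=2):
    """Return up to k other resources with non-zero relatedness, best first."""
    scores = [
        (other, tag_relatedness(tagged[resource_id], tags))
        for other, tags in tagged.items()
        if other != resource_id
    ]
    # Sort by descending score, then by id so ties are deterministic.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return [(other, score) for other, score in scores[:k] if score > 0]


# Hypothetical tagged images.
tagged = {
    "img1": ["beach", "sea", "sunset"],
    "img2": ["sea", "sunset", "boat"],
    "img3": ["city", "night"],
}
recommendations = recommend("img1", tagged)
```

Unlike the model in [1], this stand-in only links resources that share at least one tag, so it cannot relate resources whose tags never co-occur.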

B. Association Link Network

The Association Link Network[7] organizes multimedia resources with social tags by establishing association relations among various web resources. A discovery algorithm for associated resources is first developed to build the original Association Link Network (ALN) for organizing loosely connected web resources. Three schemas for constructing the kernel ALN and the connection-rich ALN (C-ALN) are then developed step by step to optimize the organization of web resources. C-ALN performs well in supporting web intelligence activities. An evaluation method is used to verify the correctness of the semantic links that C-ALN builds on documents.

The discovery algorithm for associated resources is based on association rules between keywords. It extends the association relation from the keyword level to the resource level to build the original ALN. Complex network methods are used to analyze the characteristics of the three states of the ALN, and the connection-rich ALN is developed to organize the associated resources in a way that better suits users' web intelligence activities. The main drawback of this method is that it supports only web intelligence activities such as discovery and learning, browsing and publishing.
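The step from keyword-level rules to resource-level links can be sketched as follows. This is a minimal illustration that assumes the standard confidence measure for association rules, conf(a -> b) = count(a and b) / count(a), and a simple sum to lift rules to the resource level. The actual weighting used in [7] may differ, and all names and sample keywords are hypothetical.

```python
from collections import Counter
from itertools import permutations


def keyword_rules(resources, min_conf=0.6):
    """Mine rules a -> b between keywords that co-occur in the same resource."""
    single = Counter()
    pair = Counter()
    for keywords in resources.values():
        unique = sorted(set(keywords))
        single.update(unique)
        pair.update(permutations(unique, 2))  # ordered pairs (a, b), a != b
    rules = {}
    for (a, b), count in pair.items():
        confidence = count / single[a]
        if confidence >= min_conf:
            rules[(a, b)] = confidence
    return rules


def resource_association(keywords_a, keywords_b, rules):
    """Lift keyword rules to the resource level by summing matching confidences."""
    return sum(
        rules.get((a, b), 0.0)
        for a in set(keywords_a)
        for b in set(keywords_b)
        if a != b
    )


# Hypothetical resources described by keywords.
resources = {
    "r1": ["camera", "lens"],
    "r2": ["camera", "lens", "tripod"],
    "r3": ["camera", "photo"],
}
rules = keyword_rules(resources)
weight = resource_association(["lens"], ["camera"], rules)
```

The association is directional: "lens" always appears with "camera" in this sample, but "camera" appears with "lens" in only two of three resources.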

C. Semantic Web

The Semantic Web[8] focuses on defining domain-specific ontologies and reasoning technologies. It aims to bring structure to the meaningful content of web pages. The semantic web is not a separate web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation. It provides a common framework that allows data to be shared and reused across application, enterprise and community boundaries.

Two important technologies for developing the semantic web are the eXtensible Markup Language (XML) and the Resource Description Framework (RDF). A problem arises when two databases use different identifiers for the same concept: a program that wants to compare or combine information across the two databases has to know that the two terms mean the same thing.

A solution to this problem is provided by the third basic component of the semantic web: collections of information called ontologies. An ontology is a document or file that formally defines the relations among terms. The real power of the semantic web will be realized when people create many programs that collect web content from diverse sources, process the information and exchange the results with other programs.
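The role an ontology plays in reconciling two databases can be sketched with plain tuples. RDF describes data as (subject, predicate, object) triples; here a small dictionary of equivalent terms stands in for an ontology. The predicate names, identifiers and values are hypothetical, and a real system would use an RDF library and an ontology language rather than a dictionary.

```python
def canonical(term, equivalents):
    """Map a term to its canonical name if the toy ontology declares one."""
    return equivalents.get(term, term)


def merge(triples_a, triples_b, equivalents):
    """Combine two triple collections after normalizing predicate names."""
    merged = set()
    for subject, predicate, obj in list(triples_a) + list(triples_b):
        merged.add((subject, canonical(predicate, equivalents), obj))
    return merged


# Two hypothetical databases that name the same property differently.
db_a = [("video42", "recordedAt", "camera7")]
db_b = [("video42", "capturedBy", "camera7"), ("video42", "duration", "120")]

# Toy ontology: "capturedBy" means the same thing as "recordedAt".
equivalents = {"capturedBy": "recordedAt"}

combined = merge(db_a, db_b, equivalents)
```

Once the equivalence is declared, the two statements about `video42` collapse into one triple instead of being treated as unrelated facts.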

However, data are meaningful only within certain domains and, from the point of view of the World Wide Web, are not connected to each other. This limits the contribution of the semantic web to sharing and retrieving content within a distributed environment.

D. Bridging the Semantic Gap between Image Contents and Tags

Tags are used to describe image contents on the web, but bridging the semantic gap between image contents and tags is challenging. This method uses a unified framework built on a two-level data fusion between image contents and tags. A unified graph is developed to fuse the image similarity graph, which is based on visual features, with the image-tag bipartite graph. A random walk model is then developed that uses a fusion parameter to balance the influence of image contents and tags. The method can be applied directly to applications such as image annotation, content-based image retrieval and text-based image retrieval.

Using tags alone in an image retrieval task is not a reliable or reasonable solution. The visual information of images should also be taken into consideration to improve image search engines, because it gives the most direct correlations between images. The framework bridges the semantic gap between visual contents and textual tags in a simple but efficient way, using a global feature extraction method, a hybrid graph method and a random walk method.
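To make the idea of a fusion parameter concrete, the following sketch runs a random walk with restart over a small hybrid graph whose nodes are images and tags. From an image, the walker follows a visual-similarity edge with probability `alpha` and a tag edge with probability `1 - alpha`. This is an assumption-laden illustration, not the model from the work described above: the transition rules, the restart probability, the `tag:` node prefix and the sample data are all hypothetical.

```python
def build_transitions(visual_sim, image_tags, alpha=0.5):
    """Build row-stochastic transitions over image nodes and 'tag:<name>' nodes.

    Assumes every image has at least one tag or one visual neighbour, and that
    every neighbour in visual_sim also appears as a key of image_tags.
    """
    tag_images = {}
    for image, tags in image_tags.items():
        for tag in set(tags):
            tag_images.setdefault(tag, []).append(image)

    transitions = {}
    for image, tags in image_tags.items():
        tags = sorted(set(tags))
        sims = {j: s for j, s in visual_sim.get(image, {}).items() if s > 0}
        total = sum(sims.values())
        # If one kind of edge is missing, all probability goes to the other kind.
        a = alpha if (sims and tags) else (1.0 if sims else 0.0)
        row = {j: a * s / total for j, s in sims.items()}
        for tag in tags:
            row["tag:" + tag] = (1.0 - a) / len(tags)
        transitions[image] = row
    for tag, images in tag_images.items():
        transitions["tag:" + tag] = {i: 1.0 / len(images) for i in images}
    return transitions


def random_walk(transitions, start, restart=0.15, steps=50):
    """Power iteration for a random walk that restarts at the query node."""
    rank = {node: 0.0 for node in transitions}
    rank[start] = 1.0
    for _ in range(steps):
        nxt = {node: 0.0 for node in transitions}
        for node, mass in rank.items():
            for target, prob in transitions[node].items():
                nxt[target] += (1.0 - restart) * mass * prob
        nxt[start] += restart
        rank = nxt
    return rank


# Hypothetical data: images a and b look alike; a and c share the tag "beach".
visual_sim = {"a": {"b": 0.9}, "b": {"a": 0.9}, "c": {}}
image_tags = {"a": ["beach"], "b": ["sea"], "c": ["beach"]}

transitions = build_transitions(visual_sim, image_tags, alpha=0.5)
rank = random_walk(transitions, start="tag:sea")
```

Querying from the tag node `tag:sea` ranks image `b` highest, but image `a` also receives a score through its visual similarity to `b`, even though it does not carry that tag.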

The framework uses only image contents and image tags. Flickr also carries a great deal of other metadata, such as social network information about users and image notes, which could be employed to improve retrieval performance. One option for doing so is to try a backward random walk model instead of the forward model.

E. Efficient and Low-Complexity Surveillance Video Compression Model

Video surveillance has been widely used in recent years to enhance public safety and safeguard privacy. A video surveillance system that performs content analysis and activity monitoring needs efficient transmission and storage of the surveillance video data. Video processing techniques can achieve this by reducing the size of the video with little or no loss of quality. The efficient and low-complexity surveillance video compression model[3] is used in banks, automated teller machines, streets, supermarkets and parking areas to prevent and track criminal activities. The captured video is sent to a closed-circuit television server room, where security staff monitor several input feeds. This procedure is subject to human error.

F. Using Linked Data to Annotate Video Resources

In distance learning[4], being able to explore, share, reuse and link video resources is crucial for a better e-learning experience. Most video resources are currently annotated in isolation, which means that they lack semantic connections, so annotation of video resources is in high demand. Videos are traditionally searched by syntactic matching mechanisms. As more videos are annotated or tagged in the Linked Data manner, users have begun to search for videos in a more semantic-web-oriented fashion. The two major approaches used are the semantic indexing process and the natural language analysis process. Semantic event detection[5] in broadcast sports video is an approach that combines the analysis and alignment of webcast text and broadcast video. It is an unsupervised approach based on Probabilistic Latent Semantic Analysis (pLSA), which automatically clusters text events and extracts event keywords from webcast text in both professional and free styles. The disadvantage of this approach is that it processes only small video files.

G. Automatic Semantic Content Extraction

Automatic Semantic Content Extraction[9] in videos uses a fuzzy ontology and a rule-based model that rely on spatial and temporal relations in event and concept definitions. The meta-ontology definition provides a rule construction standard that is applicable to a wide range of domains and allows the user to construct an ontology for a given domain. The model still needs to be improved by taking into account the viewing angle of the camera and motion in the depth dimension when extracting spatial relations.
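A fuzzy spatial relation assigns a degree between 0 and 1 rather than a yes/no answer. The sketch below illustrates this with a "near" relation between two bounding boxes, using a linear falloff between two distance thresholds. The membership function, the thresholds (in pixels) and the box format are assumptions made for illustration and are not taken from [9].

```python
import math


def center(box):
    """Center point of a bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def near_degree(box_a, box_b, full=50.0, zero=200.0):
    """Fuzzy membership of 'a is near b'.

    Returns 1.0 up to `full` pixels between centers, 0.0 beyond `zero` pixels,
    and a linearly decreasing value in between.
    """
    (ax, ay), (bx, by) = center(box_a), center(box_b)
    distance = math.hypot(ax - bx, ay - by)
    if distance <= full:
        return 1.0
    if distance >= zero:
        return 0.0
    return (zero - distance) / (zero - full)


# Hypothetical detections in one frame.
person = (0, 0, 10, 10)
bag = (125, 0, 135, 10)
degree = near_degree(person, bag)
```

A rule such as "a person is near an unattended bag" can then fire with a confidence equal to the membership degree instead of depending on a single hard cutoff.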


Organizations manage their own video resources separately, because the resources are produced by different partners under heterogeneous licenses and constraints at different times. With the rapid growth of the multimedia web, a large number of video resources are available online. It is therefore crucial to be able to search efficiently across all related distributed resources together so that they can be used to enhance search activities. Based on this study, the paper identifies the following primary challenges.

• Video resources should be described accurately. It is difficult to tell the whole story of a video with a single general description, because one section of the video stream may hold plenty of information, some of which might not be related to the main points the video was created to convey. The usual title-based description is therefore not good enough for annotating videos precisely. A more accurate description mechanism, based on the timeline of the video stream, is required.

• The description of video resources should be accurate and machine-understandable to support related search functionality. Although a unified and controlled terminology can provide accurate and machine-understandable vocabularies, in practice it is infeasible to build a single terminology that satisfies the different description requirements of different domains.

• Video resources should be linked to useful knowledge data from the web. A growing amount of knowledge and scientific data is published on the web by research and educational organizations.

• Crimes should be identified in video surveillance. The captured video is sent to a closed-circuit television server room, where security staff monitor several input feeds. This procedure is subject to human error.
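The timeline-based description called for in the first challenge can be pictured as a list of annotated segments rather than a single title. The following sketch is one possible data structure under that assumption; the class name, field names and sample annotations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float      # seconds from the beginning of the video
    end: float        # exclusive end time in seconds
    description: str  # free-text or tag-based annotation for this span


def segments_at(segments, t):
    """Return the annotations that cover playback time t."""
    return [s for s in segments if s.start <= t < s.end]


def search(segments, keyword):
    """Return (start, end) spans whose description mentions the keyword."""
    needle = keyword.lower()
    return [(s.start, s.end) for s in segments if needle in s.description.lower()]


# Hypothetical annotations for one lecture video.
lecture = [
    Segment(0, 60, "Course introduction"),
    Segment(60, 300, "Lecture on RDF triples"),
    Segment(300, 420, "Questions about RDF"),
]
hits = search(lecture, "rdf")
```

A keyword search now returns the time spans that discuss a topic, so a viewer can jump to the relevant part of the video instead of receiving the whole file.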


Video surveillance is an important tool for enhancing public safety and privacy protection. It is deployed in high-security places such as airports, train stations and city centers, and in commercial locations such as banks, ATMs and supermarkets, to prevent and track criminal activities. The captured video is sent to a closed-circuit television server room, where security staff monitor several input feeds. This procedure is subject to human error. Video surveillance systems are also often used as an after-the-fact monitoring tool to identify suspects. These applications require video data to be stored over a period of time for automatic analysis and future use. Storing raw video data captured directly from the cameras can be very expensive: nearly 1 terabyte of storage is needed to store one day of video input. Managing this huge amount of information is a tedious job.
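The storage figure can be checked with simple arithmetic: raw size is width × height × bytes per pixel × frames per second × seconds. The text does not state the camera parameters behind the 1 terabyte estimate, so the values below (640×480 resolution, 1.5 bytes per pixel as in YUV 4:2:0 sampling, 25 frames per second, a single camera) are assumptions chosen only to show that the order of magnitude is plausible.

```python
def raw_bytes_per_day(width, height, bytes_per_pixel, fps, hours=24):
    """Uncompressed video size in bytes for one camera over the given hours."""
    return int(width * height * bytes_per_pixel * fps * hours * 3600)


# Assumed parameters; none of these values come from the text above.
size = raw_bytes_per_day(width=640, height=480, bytes_per_pixel=1.5, fps=25)
terabytes = size / 1e12  # decimal terabytes
```

Under these assumptions one camera produces 995,328,000,000 bytes per day, just under 1 TB. Higher resolutions or several cameras scale the figure proportionally.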

A new generation of video surveillance systems needs to analyze incoming data and identify suspicious activities more intelligently and automatically through activity monitoring and event analysis. This requires fast object detection and fast identification of unusual activities and events.

High-level features can be modelled and extracted from video content using an automatic content extraction process in which objects are extracted from frames. Specific matching characteristics are defined for each event and object so that these objects and events can be labelled automatically. When the user searches with a specific search key, the event is identified if the input video matches the defined characteristics. The extracted semantic features determine whether the specific event has occurred. A classification algorithm defines the characteristics of the specific event, and if those characteristics are found in an extracted frame, the event is identified. Examples include identifying whether a crime happened, whether a player committed a foul, or whether a player is out. By processing the extracted semantic content, the proposed model assists in deriving a decision from the event that occurred and in interpreting video without domain knowledge.
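The matching step described above can be sketched as a simple rule check over per-frame object labels. The sketch assumes that object extraction has already produced a list of labels for each frame and that an event is defined by a set of required labels that must appear together for several consecutive frames. The rule format, the threshold and the labels are hypothetical, and a real system would use a trained classifier rather than exact label matching.

```python
def detect_event(frames, required, min_consecutive=3):
    """Return the index of the first frame of a matching run, or None.

    frames: list of label lists, one per extracted frame.
    required: set of labels that define the event.
    """
    run_length = 0
    run_start = None
    for index, labels in enumerate(frames):
        if required <= set(labels):
            if run_length == 0:
                run_start = index
            run_length += 1
            if run_length >= min_consecutive:
                return run_start
        else:
            run_length = 0  # the run is broken; start counting again
    return None


# Hypothetical labels produced by an object extraction stage.
frames = [
    ["person"],
    ["person", "bag"],
    ["person", "bag"],
    ["person", "bag"],
    ["bag"],
]
first_frame = detect_event(frames, required={"person", "bag"})
```

Requiring several consecutive frames is a simple guard against a single spurious detection triggering the event.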


The rapid increase in the amount of video content on the internet has created a need for methods to organize it. This paper studied the various methods adopted to organize multimedia video resources and analyzed the challenges behind them. Semantic-based video searching can be proposed for organizing multimedia video content. It organizes the associated resources on the web to effectively support web intelligence activities such as browsing, knowledge discovery and publishing. The tags and surrounding text of video resources are used to represent the semantic content. Video content with long playing times is used to evaluate the proposed method.


[1] Chuanping Hu, Zheng Xu, Yunhuai Liu, Lin Mei, Lan Chen, and Xiangfeng Luo, “Semantic Link Network based Model for Organizing Multimedia Big Data”, IEEE Transactions on Emerging Topics in Computing, 2013.

[2] L. Wu and Y. Wang. The process of criminal investigation based on grey hazy set. 2010 IEEE International Conference on System Man and Cybernetics, pp.26-28, 2010.

[3] L. Liu, Z. Li, and E. Delp. Efficient and low-complexity surveillance video compression using backward-channel aware Wyner-Ziv video coding. IEEE Transactions on Circuits and Systems for Video Technology, 19(4):452-465, 2009.

[4] H. Yu, C. Pedrinaci, S. Dietze, and J. Domingue. Using linked data to annotate and search educational video resources for supporting distance learning. IEEE Transactions on Learning Technologies, 5(2):130-142, 2012.

[5] C. Xu, Y. Zhang, G. Zhu, Y. Rui, H. Lu, and Q. Huang. Using webcast text for semantic event detection in broadcast sports video. IEEE Transactions on Multimedia, 10(7):1342-1355, 2008.

[6] H. Zhuge. Communities and Emerging Semantics in Semantic Link Network: Discovery and Learning. IEEE Transactions on Knowledge and Data Engineering, 21(6):785-799, 2009.

[7] X. Luo, Z. Xu, J. Yu, and X. Chen. Building Association Link Network for Semantic Link on Web Resources. IEEE Transactions on Automation Science and Engineering, 8(3):482-494, 2011.

[8] T. Berners-Lee, J. Hendler, and O. Lassila. “The Semantic Web”. Scientific American, 284(5):34-43, 2001.

[9] Yakup Yildirim, Adnan Yazici, and Turgay Yilmaz, “Automatic Semantic Content Extraction in Videos Using a Fuzzy Ontology and Rule-Based Model”, IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 1, January 2013.

[10] M. Petkovic and W. Jonker, “An Overview of Data Models and Query Languages for Content-Based Video Retrieval,” Proc. Int’l Conf. Advances in Infrastructure for E-Business, Science, and Education on the Internet, Aug. 2000.

[11] M. Petkovic and W. Jonker, “Content-Based Video Retrieval by Integrating Spatio-Temporal and Stochastic Recognition of Events,” Proc. IEEE Int’l Workshop Detection and Recognition of Events in Video, pp. 75-82, 2001.

[12] G.G. Medioni, I. Cohen, F. Brémond, S. Hongeng, and R. Nevatia, “Event Detection and Analysis from Video Streams,” IEEE Trans. Pattern Analysis Machine Intelligence, vol. 23, no. 8, pp. 873-889, Aug. 2001.

[13] S. Hongeng, R. Nevatia, and F. Brémond, “Video-Based Event Recognition: Activity Representation and Probabilistic Recognition Methods,” Computer Vision and Image Understanding, vol. 96, no. 2, pp. 129-162, 2004.

[14] A. Hakeem and M. Shah, “Multiple Agent Event Detection and Representation in Videos,” Proc. 20th Nat’l Conf. Artificial Intelligence (AAAI), pp. 89-94, 2005.

[15] T. Yilmaz, “Object Extraction from Images/Videos Using a Genetic Algorithm Based Approach,” master’s thesis, Computer Eng. Dept., METU, Turkey, 2008.

[16] G. Salton, A. Wong, and C. Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11):613-620, 1975.

[17] Z. Xu, X. Luo, J. Yu, and W. Xu. Measuring semantic similarity between words by removing noise. Concurrency and Computation: Practice and Experience, 23(18):2496-2510, 2011.

[18] R. Firth. A synopsis of linguistic theory 1930-1955. In Studies in Linguistic Analysis. Philological Society: Oxford, 1957.

[19] M. Vojnovic, J. Cruise, D. Gunawardena, and P. Marbach. Ranking and suggesting popular items. IEEE Transactions on Knowledge and Data Engineering, 21(8):1133-1146, 2009.

[20] H. Rubenstein and B. Goodenough. Contextual correlates of synonymy. Communications of the ACM, 8(10):627-633, 1965.

[21] M. Steinbach, G. Karypis and V. Kumar. A Comparison of Document Clustering Techniques. KDD Workshop on Text Mining, 2000.

[22] L. Wang and S. Khan. Review of performance metrics for green data centers: a taxonomy study. The journal of supercomputing, 63(3):639-656, 2013.

[23] L. Wang, D. Chen, et al. Towards enabling cyber infrastructure as a service in clouds. Computer & Electrical Engineering, 39(1):3-14, 2013.

[24] L. Wang, J. Tao, et al. G-Hadoop: MapReduce across distributed data centers for data-intensive computing. Future Generation Computer Systems, 29(3):739-750, 2013.

[25] H. Zhuge. Interactive Semantics. Artificial Intelligence, 174:190-204, 2010.

[26] H. Zhuge. The Knowledge Grid -- Toward Cyber-Physical Society, World Scientific Publishing Co., Singapore, 2012. 2nd Edition.

[27] H. Zhuge, X. Chen, X. Sun, and E. Yao. HRing: A structured P2P overlay based on harmonic series. IEEE Transactions on Parallel and Distributed Systems, 19(2):145-158, 2008.

[28] A.P. Pons. Object Prefetching Using Semantic Links. ACM SIGMIS Database, 37(1):97-109, 2006.  
