The ever-increasing volume of medical images, the economic impracticality of manually indexing these images, and the inadequacy of human language alone to describe image contents that are visually recognizable and medically significant, such as the shape and geometry, color and texture of objects within images, all provide impetus for research and development toward practical Content Based Image Retrieval (CBIR) systems that could become a standard offering of the medical library of the future (National Library of Medicine 2002). Images play a vital role in the exchange of information. The well-known saying ‘a picture is worth a thousand words’ highlights the importance of images compared to text in conveying information. For example, it may take much effort to explain the meaning of ‘optical illusion’ in words, whereas Figure 1.1 conveys it at once: the same picture can be seen either as a beautiful young lady or as an old hag.
Figure 1.1 Beautiful lady or Old hag
When it comes to data management, however, and especially to information retrieval, text-based information is easy to retrieve accurately, whereas retrieving the desired images from a large image database with 100% accuracy is not. In an age when computers perform tasks millions of times faster than any human, it may come as a surprise that even for a computer it is often not easy to find a particular digital object, especially an image.
One reason is that a concrete description of images, which replicate what we have seen since birth, is elusive: it is hard to characterize, and even harder to teach to a machine. A computer performs well at tasks for which there is a deterministic algorithm. In addition, the number of digital images produced by different sources is increasing at an alarming rate, driven by exponential increases in computer processing power and digital storage capacity. Muller et al (2004) reported that the University Hospital of Geneva produced approximately 12,000 medical images per day. Besser & Trant (1995) pointed out that image capture and the subsequent manual indexing may account for 90 percent of the total cost of building an image database. It is therefore necessary to develop appropriate information systems to manage these images efficiently, and especially to retrieve relevant images from a large collection.
In general, two approaches have been applied to retrieve similar images: one based on textual keywords and the other based on image content. The first approach annotates each image with keywords and then uses a text-based Database Management System (DBMS) to retrieve it. Several methods have been proposed that use keyword annotations to index and retrieve images (Chang & Fu 1980, Chang & Kunii 1980, Dimitroff et al 1988, Lancaster 1998, and Rasmussen 1997). Comprehensive surveys of image retrieval can be found in Chang & Hsu (1992), Tamura & Yokoya (1984) and Riloff & Hollaar (1996). However, these systems require the database images to be annotated beforehand, which is a time-consuming and laborious task. Moreover, the annotation process is usually inefficient because it is not systematic: different users may use different keywords to describe the same image characteristic. A search engine thus has to accommodate all kinds of image interpretations.
The lack of systematization in the annotation process degrades the retrieval performance of keyword-based search. Moreover, keywords may not be sufficient at all to find the desired images. Consider, for example, a journalist who needs to retrieve an old image of a familiar scene but remembers none of its specifics, such as where it was taken or how it was annotated, except that it is similar to a recently seen image. A possible solution is to use the recent image as the query and search for similar images, without knowing whether any exist in the database. Searching every image manually for the desired one is not advisable, as it is too time-consuming. In this context, CBIR plays a significant role in solving many of the problems associated with keyword search. CBIR is a technology that compares source and target images by their visual content so that it can automatically retrieve all images similar to the query image with minimal human intervention (Feng et al 2003, Eakins 2002). The term ‘content’ in this context refers to colors, shapes, textures, or any other information that can be extracted from the image itself. Owing to its very large range of potential applications, CBIR has received a great deal of attention in the last decade, and a review of early systems is presented in Smeulders et al (2000). Many research and commercial CBIR systems have been developed, including QBIC (Flickner et al 1995), MARS (Rui et al 1997), Virage (Amarnath & Ramesh 1997), Photobook (Pentland et al 1996), VisualSEEk (Smith & Chang 1996), PicToSeek (Gevers & Smeulders 2000), PicHunter (Cox et al 2000), Netra (Ma & Manjunath 1997) and SIMPLIcity (Wang et al 2001).
1.2 THREE STAGES OF CBIR
Retrieving desired images from a large image database might involve a search for images with a specific pattern or specific types of object or scene. Accordingly, the query types are classified into three stages of increasing complexity (Eakins 1998).
Stage 1: Primitive features such as texture, shape, color and the spatial location of image elements are used to locate similar images.
Retrieve all pictures containing green and blue regions (color feature)
Retrieve images with similar texture regions as that of a given tile (retrieval by texture)
Find drawings similar to a given shape (retrieval by shape)
Find images containing blue stars arranged in a ring (combination of the primitive features)
Stage 2: Derived attributes, which involve logical inference about the identity of the objects depicted in the image, play a key role at this level.
Retrieve pictures of a double-decker bus crossing a bridge
Stage 3: Retrieval is based on attributes that involve a high degree of abstraction.
Retrieval of types of activity, named events or pictures with emotional or symbolic significance (pictures of Mohiniyattom)
1.3 CHALLENGES OF CBIR
Although CBIR techniques have many advantages over the text-based approach, they are challenged by various factors such as image resolution, intra-image illumination variations and the non-homogeneity of textures within and between regions. The other major challenges are described in the literature as the semantic gap and the sensory gap.
Sensory Gap (Deserno et al 2008)
The sensory gap is the gap between an object in the real world and the information in a description derived from a recording of that scene. Explicit representation of domain knowledge is essential to lessen the sensory gap.
Semantic Gap (Deserno et al 2008)
The semantic gap is the gap between the extracted features and the semantics perceived by humans. It arises from the mismatch between the information that can be extracted from the visual data and the user's interpretation of the same data.
1.4 BASIC CONCEPTS
1.4.1 Architecture of CBIR systems
A typical CBIR system has two parts: (1) offline feature extraction and (2) online image retrieval. A conceptual framework for CBIR is shown in Figure 1.2. In offline feature extraction, the contents of the images in the database are extracted and represented as multi-dimensional feature vectors, also called descriptors. In online image retrieval, the user submits a query image to the retrieval system in search of desired images. The similarity between the query image and each database image is calculated from the distance between their feature vectors, and a suitable indexing scheme makes the search through the image database efficient. After ranking the search results, the system returns the images that are most similar to the query image.
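The offline and online stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the feature extractor `extract_features` (a normalized 16-bin gray-level histogram) and the helpers `build_index` and `retrieve` are hypothetical, the "index" is simply a NumPy array scanned linearly rather than a real indexing structure, and similarity is measured by Euclidean distance.

```python
import numpy as np


def extract_features(image, bins=16):
    """Hypothetical feature extractor: normalized gray-level histogram of an 8-bit image."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)


def build_index(images):
    """Offline stage: one feature vector (row) per database image."""
    return np.stack([extract_features(img) for img in images])


def retrieve(query_image, index, top_k=3):
    """Online stage: rank database images by Euclidean distance to the query."""
    q = extract_features(query_image)
    distances = np.linalg.norm(index - q, axis=1)
    order = np.argsort(distances, kind="stable")[:top_k]
    return order, distances[order]


# Toy database of uniform 8x8 images with different gray levels.
database = [np.full((8, 8), level, dtype=np.uint8) for level in (10, 60, 120, 200)]
index = build_index(database)
ids, dists = retrieve(np.full((8, 8), 125, dtype=np.uint8), index)
print(ids[0], dists[0])  # image 2 (gray level 120) falls in the same histogram bin as the query
```

In a real system the linear scan in `retrieve` is what the indexing scheme replaces, so that the query does not have to be compared with every stored vector.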
Figure 1.2 A conceptual framework for CBIR
After examining the search results, the user can provide relevance feedback to the retrieval system, which contains a mechanism to learn the user's requirements. In short, a typical CBIR solution focuses on the construction of an image descriptor, which is characterized by (1) a feature extraction algorithm that extracts relevant features from the image and (2) a proper similarity measure to compare two images. The following sections describe each component of the system.
1.4.2 Feature vectors & Image descriptors
An image $\hat{I}$ is considered as a pair $(D_I, f)$, where $D_I$ is a finite set of pixels and $f: D_I \rightarrow \mathbb{R}^n$ is a function from that set to an $n$-dimensional space. For example, in an RGB system, $f: D_I \rightarrow \mathbb{R}^3$. A feature vector $\vec{v}_{\hat{I}}$ of an image $\hat{I}$ can be represented as a point in an $n$-dimensional space $\mathbb{R}^n$. These feature vectors indicate image properties such as color, shape and texture. An image descriptor $D$ is defined as an ordered pair $(\epsilon_D, \delta_D)$, where the function $\epsilon_D$ extracts a feature vector $\vec{v}_{\hat{I}}$ from an image $\hat{I}$, and the similarity function $\delta_D: \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}$ computes the similarity between two images, which is inversely related to the distance between their feature vectors. Color descriptors use a three-dimensional representation of image features, which has greater discriminating power than the one-dimensional gray-level domain. Shape descriptors often carry semantic information derived from the shape of the objects. Texture descriptors quantitatively represent homogeneous texture regions.
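The definition of a descriptor as an ordered pair of an extraction function and a similarity function maps naturally onto a small data structure. The sketch below assumes a quantized joint RGB histogram as the extraction function and $1/(1 + d_{L1})$ as the similarity function; both choices, and the names `Descriptor`, `rgb_histogram` and `inverse_l1_similarity`, are illustrative assumptions rather than a specific published descriptor.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass(frozen=True)
class Descriptor:
    """An image descriptor: (extraction function, similarity function)."""
    extract: Callable[[np.ndarray], np.ndarray]            # image -> feature vector
    similarity: Callable[[np.ndarray, np.ndarray], float]  # (v1, v2) -> similarity


def rgb_histogram(image, levels=4):
    """Quantize each 8-bit RGB channel to `levels` values and build a normalized joint histogram."""
    q = (image.astype(np.int64) * levels) // 256                   # channel values in 0..levels-1
    codes = (q[..., 0] * levels + q[..., 1]) * levels + q[..., 2]  # one bin index per pixel
    hist = np.bincount(codes.ravel(), minlength=levels ** 3).astype(float)
    return hist / hist.sum()


def inverse_l1_similarity(u, v):
    """Similarity that decreases as the L1 distance grows; equals 1 for identical vectors."""
    return 1.0 / (1.0 + float(np.abs(u - v).sum()))


color_descriptor = Descriptor(rgb_histogram, inverse_l1_similarity)

red = np.zeros((4, 4, 3), dtype=np.uint8)
red[..., 0] = 255
blue = np.zeros((4, 4, 3), dtype=np.uint8)
blue[..., 2] = 255

v_red, v_blue = color_descriptor.extract(red), color_descriptor.extract(blue)
same = color_descriptor.similarity(v_red, v_red)        # identical histograms
different = color_descriptor.similarity(v_red, v_blue)  # disjoint histograms, L1 distance 2
```

Keeping the two functions together in one object makes it easy to swap in a different feature (for example a texture measure) without touching the retrieval code.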
1.4.3 Dimensionality Reduction
To capture the useful content that discriminates similar from dissimilar images, a CBIR system may extract a large number of features from each image. This can cause the "curse of dimensionality" problem, in which the computational cost of a query increases exponentially with the number of dimensions. Principal Component Analysis (PCA) is used to reduce the dimensionality of a large feature set obtained from an image. PCA aims to explain as much of the variance as possible with the smallest number of variables (Egecioglu et al 2004, Partridge & Calvo 1998). Another method for dimensionality reduction is Random Projection (RP), in which the high-dimensional data are projected onto a lower-dimensional subspace using a random matrix whose columns have unit length (Bingham & Mannila 2001).
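Both reduction techniques can be sketched with NumPy alone. In this sketch PCA is computed from the singular value decomposition of the mean-centred feature matrix (rows are images, columns are features), and the random projection uses a Gaussian random matrix $R$ of shape $k \times d$ whose columns are normalized to unit length. The function names, the Gaussian choice of random entries and the synthetic data are assumptions for illustration.

```python
import numpy as np


def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components.

    Returns the reduced data and the fraction of variance explained by each kept component.
    """
    Xc = X - X.mean(axis=0)                     # centre each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)         # variance fraction per component, descending
    return Xc @ Vt[:k].T, explained[:k]


def random_projection(X, k, seed=0):
    """Project the rows of X (n x d) to k dimensions with a random k x d matrix R."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((k, X.shape[1]))
    R /= np.linalg.norm(R, axis=0, keepdims=True)  # each column of R has unit length
    return X @ R.T                                  # row-wise version of R @ x


rng = np.random.default_rng(1)
# 200 synthetic 20-dimensional feature vectors that actually lie on a 2-D subspace.
X = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 20))
Z, explained = pca_reduce(X, 2)
Y = random_projection(X, 5)
```

Because the synthetic data have rank 2, the two kept principal components explain essentially all of the variance; random projection is cheaper (no decomposition) but does not order its output dimensions by importance.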
1.4.4 Similarity Measure
The similarity measure is a matching function that is inversely related to distance: the larger the distance value, the less similar the images are. For a given pair of images, it gives the degree of similarity based on their feature vectors. The choice of similarity metric has a direct impact on the performance of CBIR, and the appropriate metric depends on the kind of feature vectors selected. The similarity between descriptors is determined by calculating the distance between their points in a multi-dimensional metric space. The distances between the points $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ under various metrics are given below.
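Manhattan (city block): $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
Euclidean: $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
Minkowski: $d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$, $p \geq 1$
Canberra: $d(x, y) = \sum_{i=1}^{n} \frac{|x_i - y_i|}{|x_i| + |y_i|}$
Chebyshev: $d(x, y) = \max_{i} |x_i - y_i|$
For discrete probability distributions $p$ and $q$ over the same domain $X$, the Bhattacharyya distance is defined as $D_B(p, q) = -\ln BC(p, q)$, where $BC(p, q) = \sum_{x \in X} \sqrt{p(x)\, q(x)}$ is the Bhattacharyya coefficient.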
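The standard definitions of these metrics translate directly into NumPy. The sketch assumes that x and y are 1-D arrays of equal length and that p and q are already normalized histograms; for the Canberra distance it follows the common convention of skipping terms whose denominator is zero, a case the formula itself leaves undefined.

```python
import numpy as np


def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return float(np.sum(np.abs(x - y)))


def euclidean(x, y):
    """Square root of the sum of squared differences."""
    return float(np.sqrt(np.sum((x - y) ** 2)))


def minkowski(x, y, p):
    """Reduces to Manhattan for p = 1 and Euclidean for p = 2."""
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))


def canberra(x, y):
    """Sum of |x_i - y_i| / (|x_i| + |y_i|), skipping 0/0 terms by convention."""
    num = np.abs(x - y)
    den = np.abs(x) + np.abs(y)
    mask = den > 0
    return float(np.sum(num[mask] / den[mask]))


def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return float(np.max(np.abs(x - y)))


def bhattacharyya(p, q):
    """Bhattacharyya distance -ln(BC) between two discrete distributions."""
    bc = np.sum(np.sqrt(p * q))  # Bhattacharyya coefficient
    return float(-np.log(bc))


x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])
# Absolute differences are (3, 2, 0): Manhattan 5, Euclidean sqrt(13), Chebyshev 3,
# Canberra 3/5 + 2/2 + 0/6 = 1.6.
```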
1.4.5 Relevance Feedback (Liu et al 2007)
The main idea of relevance feedback is to improve the performance of the retrieval system by incorporating the user's feedback. For a given query, the retrieval system first identifies similar images based on predefined similarity metrics. The user then gives feedback by marking positive and negative examples among the results. The system analyses this feedback using a learning algorithm and returns refined results.
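The learning algorithm is not specified here. One classical way to use positive and negative examples, shown purely as an illustration, is Rocchio-style query-point movement: the query feature vector is moved toward the mean of the positive examples and away from the mean of the negative ones. The function name and the weights alpha, beta and gamma below are assumptions (commonly used defaults), not values taken from Liu et al (2007).

```python
import numpy as np


def rocchio_update(query, positives, negatives, alpha=1.0, beta=0.75, gamma=0.15):
    """Move a query feature vector toward relevant examples and away from non-relevant ones."""
    q = alpha * np.asarray(query, dtype=float)
    positives = np.asarray(positives, dtype=float)
    negatives = np.asarray(negatives, dtype=float)
    if positives.size:                        # skip if the user marked no relevant images
        q = q + beta * positives.mean(axis=0)
    if negatives.size:                        # skip if the user marked no non-relevant images
        q = q - gamma * negatives.mean(axis=0)
    return q


# One feedback round: the user marks two images as relevant and one as not relevant.
new_query = rocchio_update(query=[0.0, 0.0],
                           positives=[[1.0, 1.0], [3.0, 3.0]],
                           negatives=[[0.0, 4.0]])
```

The updated vector replaces the original query for the next retrieval round, and the loop can be repeated until the user is satisfied with the results.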
1.4.6 Performance Evaluation
The performance of the retrieval system is evaluated using the following measures (Muller et al 2001, Cho et al 2011).
Precision and Recall (P-R) graph.
Precision = (No. of relevant images retrieved)/(Total no. of images retrieved)
Recall = (No. of relevant images retrieved)/(Total no. of relevant images)
P(20), P(50) and P(NR): precision after 20, 50 and NR images have been retrieved, where NR is the number of relevant images.
Rank and average rank: rank is the position at which the first relevant image is retrieved; average rank is the mean of the ranks at which all the relevant images are retrieved.
Normalized average rank = $\frac{1}{N N_R}\left(\sum_{i=1}^{N_R} R_i - \frac{N_R (N_R - 1)}{2}\right)$, where N is the number of images in the collection, NR represents the number of relevant images and Ri represents the rank at which the ith relevant image is retrieved.
Sensitivity = TP/(TP+FN)
Specificity = TN/(TN+FP)
Accuracy = (TP+TN)/(TP+TN+FP+FN), where
TP (True Positives): correctly classified positive cases,
TN (True Negatives): correctly classified negative cases,
FP (False Positives): incorrectly classified negative cases, and
FN (False Negatives): incorrectly classified positive cases.
A simple average rank is difficult to interpret, since it depends on both the collection size N and the number of relevant images NR for a given query; hence the normalized average rank is used (Muller et al 2001).
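The measures listed above can be computed from a ranked result list with a few helper functions. This sketch assumes that `ranking` is the complete ordering of all N database images for one query, that ranks are 1-based (rank 1 is the top result) and that every relevant image appears somewhere in the ranking. Under the 1-based convention a perfect ranking gives a normalized average rank of 1/N, which is close to 0 for large collections. The function names and the toy data are illustrative.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a list of retrieved image ids against a set of relevant ids."""
    relevant = set(relevant)
    hits = sum(1 for img in retrieved if img in relevant)
    return hits / len(retrieved), hits / len(relevant)


def precision_at(ranking, relevant, n):
    """P(n): precision after the first n images have been retrieved."""
    return precision_recall(ranking[:n], relevant)[0]


def rank_measures(ranking, relevant):
    """Rank of the first relevant image, average rank, and normalized average rank."""
    relevant = set(relevant)
    ranks = [pos for pos, img in enumerate(ranking, start=1) if img in relevant]
    n, n_r = len(ranking), len(relevant)
    first_rank = ranks[0]
    average_rank = sum(ranks) / n_r
    normalized = (sum(ranks) - n_r * (n_r - 1) / 2) / (n * n_r)
    return first_rank, average_rank, normalized


def confusion_measures(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


ranking = [3, 1, 7, 2, 5, 4, 6, 0, 8, 9]  # toy ranking of N = 10 images
relevant = {3, 7, 9}                       # NR = 3 relevant images, found at ranks 1, 3 and 10
```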
1.5 CONTENT BASED MEDICAL IMAGE RETRIEVAL
As a result of advances in the internet and in imaging technologies, the volume of images produced in the medical domain has also increased drastically. Even though DICOM (Digital Imaging and Communications in Medicine), a standard for image communication, has been established and patient information can be stored with the actual images, most access to these systems is based on patient identification or study characteristics. The aim of medical management systems is to deliver the needed information to the right persons at the right time in order to improve the quality and efficiency of care processes (Winter & Haux 1995). To achieve this goal, querying images by patient name, series ID or study ID is not sufficient. Integrating content-based methods into PACS (Picture Archiving and Communication Systems) would make it easier to manage large image repositories efficiently (Muller et al 2004, Lehmann et al 2003). Patient-to-patient search, which compares multiple patients and retrieves relevant cases among them, should especially help experts in the diagnosis of diseases. Images of similar cases from other patients would help the doctor make an accurate decision in doubtful cases. The retrieval of desired images may also serve as a training tool for medical students and residents, and support follow-up studies and research.
1.6 MAGNETIC RESONANCE IMAGING
A brief introduction to Magnetic Resonance Imaging (MRI) is given in this section. The reader is referred to the works of Omer et al (2008), Brown & Semelka (2003) and Nishimura (2010) for a deeper study of the physical principles of MRI. Part of the material in this section has been drawn from these texts.
MRI is based on the phenomenon of Nuclear Magnetic Resonance (NMR or MR) and uses the signal produced by the protons of tissue water to obtain vivid depictions of the internal macroscopic anatomy of soft tissues. NMR was first described and measured by Rabi et al (1938), and the technique was later extended by Bloch et al (1946) and Purcell et al (1946). Rabi was awarded the Nobel Prize in Physics in 1944, and Bloch and Purcell shared the Nobel Prize in Physics in 1952, for their work on NMR. Although the physical phenomenon of NMR has been known since the early 1940s, its practical application to medical imaging was realized only in 1973, when Lauterbur (1973) made the first NMR image by introducing gradients in the magnetic field. Peter Mansfield developed the mathematical theory for fast scanning and image reconstruction, showing how extremely rapid imaging could be obtained from very fast gradient variations. Lauterbur and Mansfield shared the Nobel Prize in Physiology or Medicine in 2003.
Unlike other imaging modalities such as Computed Tomography (CT), Single Photon Emission Computed Tomography (SPECT) and Positron Emission Tomography (PET), MRI involves no ionizing radiation, as it operates in the radio-frequency (RF) range. MRI can also produce three-dimensional volumetric images and images at any orientation, whereas CT is limited to axial slices, with other orientations possible only through post-processing interpolation. Moreover, MR images are extremely rich in information compared with other imaging modalities. Image pixel intensities generally depend on various intrinsic properties of the tissue, so superb images can be obtained by suppressing or enhancing the effects of the desired parameters, for anatomical, functional and molecular imaging (Omer et al 2008). The MRI process can be divided into three steps: 1) signal generation, 2) detection and 3) reconstruction.
1.6.1 Signal generation
Subatomic particles such as electrons, protons and neutrons possess ‘spin’, a fundamental property like charge or mass. In nuclei with even numbers of both protons and neutrons, the individual spins are paired and the overall spin is zero, so only nuclei with an odd number of protons or neutrons have a net spin. The spinning mass of the proton generates an angular momentum J. The electric charge on the surface of the proton creates a small current loop, which generates a magnetic moment μ. Both μ and J are vectors that point along the spin axis, with directions given by the right-hand rule (Figure 1.3). The hydrogen nucleus is the most widely used signal source in MRI because about 70% of the human body is water. The 1H nucleus, consisting of a single proton with spin 1/2, is the most commonly used MR-active nucleus for probing the human body because of its abundance and its response to an applied magnetic field (Brown & Semelka 2003, Nishimura 2010). A nucleus with a nonzero spin can be pictured as rotating about its own axis, and the moving electrical charge associated with it creates a magnetic field around the nucleus. Under normal conditions, however, the directions of these fields are random because of thermal motion, resulting in zero net magnetization (Figure 1.4).
Figure 1.3 Direction of angular momentum and magnetic moment with respect to the spin axis
To eliminate the effects of thermal random motion, a strong external magnetic field B0 is applied, which creates coherence, or bulk magnetization M. The nuclei then precess at a frequency ω0, called the Larmor frequency, which is proportional to the strength of the magnetic field. The relation between ω0 and B0 is given by the Larmor equation:
$\omega_0 = \gamma B_0$ (1.1)
where γ is the gyromagnetic ratio.
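As a worked example of the Larmor equation (1.1), ω0 = γB0, the sketch below computes the precession frequency of 1H at two common clinical field strengths. It uses the value γ/2π ≈ 42.577 MHz/T for the proton; the constant name and the helper functions are illustrative.

```python
import math

GAMMA_BAR_1H = 42.577e6  # gamma / (2*pi) for the 1H nucleus, in Hz per tesla (approx.)


def larmor_angular(b0_tesla, gamma_bar=GAMMA_BAR_1H):
    """omega_0 = gamma * B0, in rad/s."""
    return 2 * math.pi * gamma_bar * b0_tesla


def larmor_hz(b0_tesla, gamma_bar=GAMMA_BAR_1H):
    """f_0 = omega_0 / (2*pi), in Hz."""
    return larmor_angular(b0_tesla, gamma_bar) / (2 * math.pi)


for b0 in (1.5, 3.0):
    print(f"B0 = {b0} T -> f0 = {larmor_hz(b0) / 1e6:.2f} MHz")  # 63.87 MHz and 127.73 MHz
```

Both frequencies lie in the RF range, which is why the excitation described next is delivered as an RF pulse.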
Figure 1.4 a) Random alignment of hydrogen nuclei b) Alignment in the presence of an external magnetic field
This coherence alone is not sufficient to generate a detectable signal, as static magnetic fields do not generate any signal. The stable configuration of the magnetization can be disturbed by adding a second magnetic field, B1, that oscillates in time in coherence with the nuclei. The oscillation frequency of B1 (RF excitation) must be very close to the Larmor frequency. The purpose of RF excitation is to 1) disturb the equilibrium (static) condition created by the magnetic field B0 and 2) create phase coherence in the transverse plane, generating an output signal that can be measured. The magnetic field B1(t) is applied perpendicular to B0 for a short time; because its frequency falls in the RF range, it is generally called an RF pulse. A typical RF pulse takes the form $B_1(t) = B_1^e(t)\, e^{-i(\omega_0 t + \varphi)}$, where $B_1^e(t)$ is the envelope of the pulse, $\omega_0$ is the excitation carrier frequency and $\varphi$ is the initial phase, generally assumed to be zero (Omer et al 2008).
1.6.2 Signal Detection: Relaxation
To obtain an MR signal, an RF pulse tuned to the Larmor frequency of the spins is applied to perturb the magnetized spin system from its equilibrium condition. After excitation, the net magnetization relaxes back to its equilibrium state. This process is known as relaxation, and the signal observed during it is the free induction decay (FID); relaxation was first described by Bloch (1946). The time constant characterizing the decay of the transverse magnetization Mxy is called the transverse or T2 relaxation time (spin-spin relaxation), whereas the time constant characterizing the recovery of the longitudinal component Mz is called the longitudinal or T1 relaxation time (spin-lattice relaxation). The Bloch equation (1.2) describes the magnetization process, accounting for T1 and T2 relaxation, where M and B are the magnetization and magnetic field vectors, respectively, and i, j and k are unit vectors along x, y and z.
$\frac{d\mathbf{M}}{dt} = \gamma\, \mathbf{M} \times \mathbf{B} - \frac{M_x \mathbf{i} + M_y \mathbf{j}}{T_2} - \frac{(M_z - M_0)\,\mathbf{k}}{T_1}$ (1.2)
In the Bloch equation, the cross-product term describes the precessional behavior, whereas the relaxation terms describe the exponential behavior of the transverse and longitudinal magnetization components. Although precession does not alter the magnitude of the magnetization vector, the relaxation processes do (Omer et al 2008).
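The statement that precession preserves the magnitude of M while relaxation changes it can be checked numerically by integrating the Bloch equation $\frac{d\mathbf{M}}{dt} = \gamma\, \mathbf{M} \times \mathbf{B} - \frac{M_x \mathbf{i} + M_y \mathbf{j}}{T_2} - \frac{(M_z - M_0)\,\mathbf{k}}{T_1}$ with a classical fourth-order Runge-Kutta scheme. This is an illustrative sketch under stated assumptions: it works in a frame rotating at the Larmor frequency, so B reduces to a small, hypothetical off-resonance field along z; the T1, T2 and field values are arbitrary; and the initial state M = (M0, 0, 0) represents the magnetization just after a 90° pulse.

```python
import numpy as np

GAMMA = 2 * np.pi * 42.577e6  # gyromagnetic ratio of 1H, rad/s/T (approx.)


def bloch_rhs(M, B, T1, T2, M0):
    """Right-hand side of the Bloch equation: precession plus T2 and T1 relaxation."""
    precession = GAMMA * np.cross(M, B)
    relaxation = np.array([-M[0] / T2, -M[1] / T2, -(M[2] - M0) / T1])
    return precession + relaxation


def simulate(M_init, B, T1, T2, M0=1.0, dt=1e-3, n_steps=1000):
    """Integrate dM/dt with classical RK4 and return the final magnetization vector."""
    M = np.array(M_init, dtype=float)
    for _ in range(n_steps):
        k1 = bloch_rhs(M, B, T1, T2, M0)
        k2 = bloch_rhs(M + 0.5 * dt * k1, B, T1, T2, M0)
        k3 = bloch_rhs(M + 0.5 * dt * k2, B, T1, T2, M0)
        k4 = bloch_rhs(M + dt * k3, B, T1, T2, M0)
        M = M + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return M


B_off = np.array([0.0, 0.0, 2e-7])  # hypothetical off-resonance field in the rotating frame (T)
M_start = [1.0, 0.0, 0.0]           # just after a 90-degree pulse, with M0 = 1

# Precession only: infinite T1 and T2 switch the relaxation terms off (1 s simulated).
M_precess = simulate(M_start, B_off, T1=np.inf, T2=np.inf)
# Precession plus relaxation with T1 = 1 s and T2 = 0.1 s (1 s simulated).
M_relax = simulate(M_start, B_off, T1=1.0, T2=0.1)
```

With relaxation switched off, |M| stays at 1 (up to the small integration error of RK4); with relaxation on, Mz has recovered to about 1 - e^{-1} ≈ 0.63 after one T1, and the transverse component has almost vanished after ten T2.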
1.6.3 Longitudinal Relaxation
The longitudinal relaxation process is governed by
$\frac{dM_z}{dt} = -\frac{M_z - M_0}{T_1}$ (1.3)
where T1 is the spin-lattice time constant, which characterizes the return to equilibrium along the direction of the B0 field. The solution of Equation (1.3) is given by
$M_z(t) = M_0 + (M_z(0) - M_0)\, e^{-t/T_1}$ (1.4)
Following a 90° excitation, $M_z(0) = 0$, and Equation (1.4) becomes $M_z(t) = M_0 \left(1 - e^{-t/T_1}\right)$. T1 is a field-strength-dependent parameter and an indicator of the amount of energy exchanged between the nuclei and the surrounding lattice. Randomly fluctuating magnetic dipoles that induce transitions between the different energy states shorten T1 and aid longitudinal relaxation. T1 increases with increasing B0, as greater energy exchange is required at higher frequencies to switch between the states. Figure 1.5 shows the time taken for 63% of the magnetization to recover.
Figure 1.5 T1 is the time (usually expressed in milliseconds) needed for 63% of the recovery of magnetization along B0 to be completed.
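The 63% figure follows directly from the recovery solution. Setting $M_z(0) = 0$ (a 90° excitation) in $M_z(t) = M_0 + (M_z(0) - M_0)\, e^{-t/T_1}$ and evaluating at a few multiples of $T_1$ gives:

```latex
M_z(t) = M_0\left(1 - e^{-t/T_1}\right)
M_z(T_1)  = M_0\left(1 - e^{-1}\right) \approx 0.632\,M_0
M_z(3T_1) = M_0\left(1 - e^{-3}\right) \approx 0.950\,M_0
M_z(5T_1) = M_0\left(1 - e^{-5}\right) \approx 0.993\,M_0
```

So one time constant corresponds to about 63% recovery, and the longitudinal magnetization is essentially fully recovered after about five T1.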
1.6.4 Transverse Relaxation
The decay of the transverse component of magnetization is governed by
$\frac{dM_{xy}}{dt} = -\frac{M_{xy}}{T_2}$ (1.5)
where T2 is the spin-spin time constant, which describes the decay of the transverse magnetization. In longitudinal relaxation, fluctuating magnetic dipoles with an xy component at the spin resonant frequency are responsible for T1 relaxation. In transverse relaxation, z-component fluctuations contribute in addition to the xy-component fluctuations, so T2 is always shorter than or equal to T1. Furthermore, z-component fluctuations often dominate T2 relaxation; hence T2 is largely independent of field strength (Omer et al 2008). Figure 1.6 shows the time taken for 63% of the transverse magnetization to decay.
Figure 1.6 T2 is the time (usually expressed in milliseconds) needed for 63% of the decay of magnetization in the plane perpendicular to B0 to be completed
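The longitudinal recovery after a 90° pulse, Mz(t) = M0(1 - e^{-t/T1}), and the solution of Equation (1.5), Mxy(t) = Mxy(0) e^{-t/T2}, can be evaluated side by side. In the sketch below, T1 = 800 ms and T2 = 80 ms are hypothetical, illustrative values, not measured constants for any particular tissue. At t = T2 the transverse magnetization has fallen to about 37% of its initial value, i.e. 63% of it has decayed, mirroring the 63% recovery of Mz at t = T1.

```python
import math


def mz_recovery(t, t1, m0=1.0):
    """Longitudinal magnetization after a 90-degree pulse: M0 * (1 - exp(-t / T1))."""
    return m0 * (1.0 - math.exp(-t / t1))


def mxy_decay(t, t2, mxy0=1.0):
    """Transverse magnetization: Mxy(0) * exp(-t / T2)."""
    return mxy0 * math.exp(-t / t2)


T1_MS, T2_MS = 800.0, 80.0  # hypothetical values, chosen only for illustration

for t in (40.0, 80.0, 400.0, 800.0):
    print(f"t = {t:5.0f} ms: Mz/M0 = {mz_recovery(t, T1_MS):.3f}, "
          f"Mxy/Mxy(0) = {mxy_decay(t, T2_MS):.3f}")
```

Because T2 is shorter than T1, the transverse signal has largely disappeared long before the longitudinal magnetization has recovered.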
1.6.5 Image Reconstruction
The excited spins, acting as RF sources, have a distribution m(x, y), which we wish to image. As the precessing spins induce an EMF in the receiver coil, gradients $G_x$ and $G_y$ are applied to encode the spatial information in the FID signal. As the receiver coil encompasses the entire region of interest, the received signal will be
$s(t) = \iint m(x, y)\, e^{-i\omega_0 t}\, e^{-i\gamma (G_x x t + G_y y \tau)}\, dx\, dy$ (1.6)
where τ is the duration for which $G_y$ is turned on. The baseband signal S(t), extracted by ignoring the high-frequency factor $e^{-i\omega_0 t}$, is given by
$S(t) = \iint m(x, y)\, e^{-i 2\pi (k_x x + k_y y)}\, dx\, dy$ (1.7)
where $k_x = \frac{\gamma}{2\pi} G_x t$ and $k_y = \frac{\gamma}{2\pi} G_y \tau$. Once the signal is recorded, the inverse Fourier transform gives the image, or spin distribution m(x, y) (Omer et al 2008).
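Assuming the baseband signal takes the standard Fourier form $S(k_x, k_y) = \iint m(x, y)\, e^{-i 2\pi (k_x x + k_y y)}\, dx\, dy$, its discrete analogue on a fully sampled Cartesian grid is a 2-D discrete Fourier transform, so reconstruction reduces to an inverse FFT. The sketch below assumes noiseless, fully sampled k-space and uses a hypothetical rectangular phantom for m(x, y). NumPy's `fft2` uses the same negative-exponent sign convention, with k measured in cycles per field of view.

```python
import numpy as np

# Hypothetical spin distribution m(x, y): a bright rectangle with a dimmer square inside.
phantom = np.zeros((64, 64))
phantom[20:44, 24:40] = 1.0
phantom[28:36, 28:36] = 0.5

# "Acquisition": kspace[ky, kx] = sum over (y, x) of m[y, x] * exp(-2j*pi*(ky*y + kx*x) / 64).
kspace = np.fft.fft2(phantom)

# Reconstruction: the inverse Fourier transform recovers the spin distribution.
image = np.fft.ifft2(kspace)
```

In practice the recovered image is complex-valued because of noise and phase errors, and the magnitude `np.abs(image)` is usually displayed; in this idealized case the imaginary part is only rounding error.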
Brain Axial Anatomy (Jeffrey et al 2008)
The anatomy of axial brain slices is shown in Figure 1.7.
Fourth Ventricle is a flattened, diamond-shaped cavity of the hindbrain that contains cerebrospinal fluid. It is situated ventral to the cerebellum and dorsal to the pons and the upper half of the medulla. Below, its cavity opens into the central canal, which is continuous with that of the spinal cord; above, it is continuous with the cerebral aqueduct, which connects it to the third ventricle.
Maxillary Sinus is the largest of the accessory sinuses of the nose. It is a pyramidal cavity located in the body of the maxilla. The superior part of its medial wall has an ostium that communicates with the lower part of the infundibulum; an accessory (second) orifice is often present in the middle meatus, posterior to the first. The sinus appears as a shallow groove on the medial surface of the bone at about the fourth month of fetal life and does not reach its full size until after the second dentition.
Cerebellum is located dorsal to the pons and medulla and occupies the space between the brainstem and the occipital lobes of the cerebral cortex. It is connected to the brainstem by three pairs of peduncles. The cerebellum does not initiate voluntary movement; it serves as a suprasegmental coordinator of muscular activity, in particular of activity that requires sequential, repetitive or coordinated movements.
Medulla, also called the bulb, is the direct upward continuation of the spinal cord through the foramen magnum and is continuous rostrally with the pons. Rostral to the level of the decussation of the pyramids, the gray and white matter tracts within the medulla are reorganized, so a transverse section through the brainstem is entirely different from one through the spinal cord.
Vermis is the unpaired median region of the cerebellum, continuous across the midline, lying between the cerebellar hemispheres. Its name derives from the Latin word for worm, because of its shape.
Internal Carotid Artery ascends in the neck from its origin, posterolateral to the wall of the pharynx, and enters the carotid canal on the lower surface of the petrous part of the temporal bone.
Mamillary Bodies project prominently on either side of the midline on the ventral surface of the posterior hypothalamus.
Lateral Ventricle: an irregularly shaped cavity located within the lower and medial parts of each cerebral hemisphere, on either side of the midline. The two lateral ventricles are separated from each other by a thin median vertical partition, the septum pellucidum, and communicate with the third ventricle via the foramen of Monro.
Thalamus, the largest part of the diencephalon, is buried in the cerebral hemispheres. The thick right and left thalami are separated by the third ventricle.
Third Ventricle is a narrow, vertical median cleft between the thalami of the two hemispheres. It communicates with the two lateral ventricles through the interventricular foramina and with the fourth ventricle through the cerebral aqueduct.
Putamen, the largest and most rostral part of the basal ganglia, is located lateral to the head of the caudate nucleus, from which it is separated by the anterior limb of the internal capsule. It is the larger part of the lentiform nucleus, which comprises the putamen and the globus pallidus.
Caudate Nucleus is an elongated mass of gray matter closely related to the lateral ventricle. Rostrally, below the internal capsule, its head is fused with the putamen of the lentiform nucleus, and its tail terminates close to the amygdala.
Genu/Splenium of Corpus Callosum: the corpus callosum consists of fascicles of myelinated fibers. Its main function is the transmission of information between the neocortical regions of the two hemispheres during learning. The genu is its curved anterior portion, while the splenium is its thickened posterior end, lying dorsal to the pineal body.
Middle Frontal Gyrus is an extensive convolution that extends antero-inferiorly from the precentral gyrus and is bounded by the superior frontal sulcus above and the inferior frontal sulcus below.
Falx Cerebri is named for its sickle-like shape. It is a strong, arched membrane that extends vertically downward in the longitudinal fissure between the two cerebral hemispheres.
Frontal Lobe extends from the frontal pole of the brain to the central sulcus and lies mostly in the anterior cranial fossa. Its lower surface is shallowly concave and fits the orbital roof. At some distance behind the frontal pole, it is separated from the temporal lobe by the lateral sulcus, a prominent fissure.