Kumar et al (2003) have proposed a multimodal system using palmprint and hand geometry. Feature level and score level fusion methods were investigated on the samples, indicating a good performance rate with an FRR of 1.4% and an FAR of 0%, which are lower than the unimodal performances.
Snelick et al (2003) have proposed a multimodal system using face and fingerprint. Experiments conducted with score level fusion resulted in a low FRR of 5.1% using the Min-Max method, whereas Jain et al (2005) have tested match score fusion on hand, face and fingerprint with all types of score normalization approaches, such as Min-Max, tanh, z-score and MAD. These experiments proved that normalization has a major impact on score fusion, and all the methods reduced the FRR to a great extent compared to unimodal and multibiometric performances without normalization.
Rodrigues et al (2009) have proposed two novel fusion schemes, namely likelihood ratio based fusion and fuzzy logic, that can increase the security of multimodal biometric systems. These methods proved to be more robust against spoof attacks.
Mohamed et al (2011) have presented a multimodal biometric system based on the fusion of whole dorsal hand geometry and fingerprints. The right and left NIR dorsal hand geometry shapes and the right and left index and ring fingerprints are utilized. Feature level fusion was performed on the feature vectors belonging to similar biometric traits, and the scores were normalized using the Min-Max method. The experimental results obtained with this method proved to be suitable for high security tasks.
Wang et al (2012) have proposed a multimodal biometric system where face and iris features are fused at the feature level to improve the performance of authentication. Fisher discriminant analysis was used to select the distinguishing characteristics of the fused feature to further enhance the performance. Physically uncorrelated traits are expected to enhance the performance of a recognition system. When heterogeneous biometric sources are chosen for multimodal biometric recognition experiments, it is important that the database contain samples from those heterogeneous sources. The cost of deploying a multimodal system is substantially higher due to the requirement of new sensors and the development of appropriate user interfaces; if databases with the required sources already exist, time and cost can be saved.
A vast list of biometric traits can be found in the literature for building a multimodal database. A few of the existing multimodal databases are BIOSEC, BIOSECURE-ID, MCYT, M2VTS, BIOMET, SDUMLA-HMT and MMU GSPFA, along with a few joint efforts. These were developed with certain combinations of traits, and there still exists a shortage of multimodal databases developed under real, uncontrolled environments. In their absence, many researchers have opted for publicly available unimodal datasets, and most multimodal databases have been built virtually by combining samples in random fashion, assuming their independence as scores or signals. The fusion scheme is then carried out for identification or verification assuming that the combined biometric sources belong to the same person. These issues have motivated us to create a simple, contactless and economical multimodal database (SSNDS) containing samples of external and internal biometric traits belonging to the same person. External biometric traits, namely face, iris and ear, and an internal biometric trait, namely the vein pattern, have been chosen for study in this thesis. The purpose of this investigation is to determine the feasibility of identifying an individual based on single and multiple biometric traits. This research examines the various biometric traits in the SSNDS multimodal dataset; iris, face, ear and palm (dorsa) vein samples were thus acquired for all the experiments in this chapter.
A real-time noisy iris dataset was collected and acquired using a digital/mobile camera with a resolution of not less than 5 megapixels. The dataset contains samples under normal conditions and with occlusions such as drooped eyelids, contact lenses, reflections and half-closed eyes. The subject's cooperation is essential during sample collection. The success rate of the experiments extended our objective to include a few other external biometric traits, such as face and ear, which, unlike the iris, can be easily acquired using a digital camera. The database details and sample images are given in the Appendix. A Canon IXUS 160 and a Honor mobile camera were used for collecting the samples.
Another biometric trait involved in this thesis is the vein pattern. The vein pattern of the palm dorsal surface was acquired from the same persons who cooperated in acquiring the external biometric traits. A simple and economical setup was designed to acquire the vein samples of the dorsa palm (Figure 4.2). An INTEX WEBCAM IT-LITE-VU was used in the vein image acquisition process. It has a 1/7" CMOS sensor with a frame rate of 30 fps, and its focus distance ranges from 4 cm to infinity. The lens view angle is around 54 degrees, and the camera produces images of around 15 megapixels. This camera is designed to take images in the visible spectrum by blocking out infrared light using an IR filter. The camera is converted into an IR camera by removing the IR filter and placing a filter for visible light in its place. An effective visible-light filter is a new photographic film negative, which blocks out visible light and allows infrared light to pass through to the camera.
To view the vein patterns under a near-infrared camera, an infrared source emitting rays in the near-infrared region is needed. This illuminates the underlying vein patterns so that they can be viewed under the near-infrared camera. About 30 light emitting diodes that emit light in the near-infrared range of 700–900 nm are used; the vein patterns can be viewed with precision if the light source is around 780 nm. Hence, 30 infrared LEDs are connected serially on a breadboard powered by an 18 V battery source. A black background is chosen to improve the perspective of the acquired hand images. The camera is mounted horizontally, parallel to the base on which the hand is placed, at a height of 34 cm. The array of infrared LEDs emits infrared rays in all directions.
In order to regulate the amount of light falling on the hand, the breadboard is placed at an angle of 60 degrees to the platform on which the camera is mounted. The hand is placed on a slope at an angle of 50 degrees to the base to provide acute focus on our region of interest, which includes the knuckle tips and the surface of the dorsal palm. With the setup arranged as described above, the hand is placed on the slope at the base for capturing the image. The hand is held steady with minimum motion for better experimental results. The thumb is placed inside and the hand is folded into a fist with optimal pressure to enhance the visibility of the vein structures in the acquired images. The images acquired from the web camera are of dimensions 640×480 pixels.
There are several reasons behind the selection of the above-mentioned traits as biometric modalities for our experiments.
(a) In most of the daily applications, authentication of an individual is done through one of the modalities.
(b) Acquiring biometric data is simple and easy, since the sensors used for data collection are inexpensive and no special sensors are needed. A few additional electronic accessories, such as a breadboard, wires, LEDs and film negative, are required for vein acquisition alone.
4.5 PREPROCESSING
The acquired input sample of a biometric trait contains noise, which is removed using median and Gaussian filters. Contrast is adjusted using a scattering model and the dark channel prior. Real-time input images might be affected by atmospheric particles such as smoke, mist or fog; this is handled by the dark channel prior and a scattering model (Koschmieder 1925; Fattal 2008), which is represented as,
I(x) = t(x)J(x) + (1 − t(x))A (4.1)
I^dark(x) = min_{y∈Ω(x)} { min_{c∈{R,G,B}} I^c(y) } (4.2)
where x is the pixel index, I is the observed intensity of the hazy image, J is the scene radiance without haze, t is the transmission describing the portion of the scene radiance that reaches the camera, and A is the atmospheric light, which is constant. I(x), A and J(x) are all vectors in RGB colour space; min_{y∈Ω(x)} is the minimum filter over a local patch Ω(x), min_{c∈{R,G,B}} is the minimum operator over the three colour channels, and I^c(y) is the R, G or B colour channel of the input image. The atmospheric light is estimated by picking the top 0.1 percent brightest pixels in the dark channel of the input image, and the transmission is estimated using Equations 4.1 and 4.2 as,
t(x) = 1 − ω min_{y∈Ω(x)} min_{c∈{R,G,B}} { I^c(y) / A^c } (4.3)
where A^c is the colour channel c of the atmospheric light and ω is a constant parameter used to retain a small amount of haze. The transmission is then refined by the soft matting method (Wang & Feng 2014), which eliminates artifacts; finally, the scene radiance J is recovered using
J^c(x) = (I^c(x) − A^c) / max{t'(x), t'_0} + A^c (4.4)
where t'(x) is the refined transmission and t'_0 is a lower bound on the transmission. Later, the Discrete Wavelet Transform (DWT) is used for image compression.
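The dehazing pipeline of Equations 4.1–4.4 can be sketched as follows. This is a minimal NumPy illustration, not the thesis implementation: the patch size, ω and t'_0 defaults are illustrative, and the soft-matting refinement of the transmission is omitted.

```python
import numpy as np

def dark_channel(img, patch=15):
    """Eq. 4.2: per-pixel minimum over RGB, then a minimum filter over a patch."""
    mins = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(mins, pad, mode='edge')
    h, w = mins.shape
    out = np.empty_like(mins)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def dehaze(img, omega=0.95, t0=0.1, patch=15):
    """Recover scene radiance J from a hazy image I in [0, 1] (Eqs. 4.1-4.4).
    omega, t0 and patch are illustrative defaults, not values from the thesis."""
    dark = dark_channel(img, patch)
    # atmospheric light A: mean colour of the top 0.1% brightest dark-channel pixels
    n = max(1, int(dark.size * 0.001))
    idx = np.argsort(dark.ravel())[-n:]
    A = img.reshape(-1, 3)[idx].mean(axis=0)
    # Eq. 4.3: transmission estimate
    t = 1 - omega * dark_channel(img / A, patch)
    # Eq. 4.4: invert the scattering model with a lower bound on t
    t = np.maximum(t, t0)[..., None]
    J = (img - A) / t + A
    return np.clip(J, 0.0, 1.0)
```

A usage example would pass any RGB float image scaled to [0, 1]; the soft-matting step of Wang & Feng (2014) would refine `t` before the final recovery.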
The ROI of each biometric trait is to be extracted before the extraction of significant features. The steps for ROI extraction from each biometric trait are briefly discussed below.
Iris
The iris is an internal organ of the eye, well protected from the environment and stable over time. The first step is to isolate the actual iris region in an eye image by detecting the edges of the iris and pupil. The upper and lower parts of the iris region are normally occluded by eyelashes and eyelids in most of the samples. Specular reflections can also occur within the iris region, corrupting its unique pattern. A technique is therefore required to isolate and exclude these artifacts while extracting the circular iris region. The steps to extract the iris region have been described in sections 2.2.2 and 5.3. The process is summarized as follows:
• The iris is segmented by detecting the edge map of the iris and pupil using the Canny edge detector.
• From the estimated edge map, the Circular Hough transform detects the centre and radius of the iris and pupil boundaries.
• The eyelid region is removed by masking out the region surrounding the iris boundary, and the eyelashes are eliminated by thresholding.
• The extracted circular iris region is unwrapped using Daugman’s rubber sheet model.
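The final unwrapping step can be sketched as follows. This is a minimal nearest-neighbour sampling version of Daugman's rubber-sheet model; the radial and angular resolutions are illustrative, and the circle parameters are assumed to come from the Hough transform above.

```python
import numpy as np

def rubber_sheet(eye, pupil_c, pupil_r, iris_c, iris_r,
                 radial_res=20, angular_res=240):
    """Remap the annular iris region between the pupil and iris circles
    to a fixed-size rectangular block (radial x angular)."""
    out = np.zeros((radial_res, angular_res))
    thetas = np.linspace(0, 2 * np.pi, angular_res, endpoint=False)
    for j, th in enumerate(thetas):
        # boundary points on the pupil and iris circles at angle theta
        xp = pupil_c[0] + pupil_r * np.cos(th)
        yp = pupil_c[1] + pupil_r * np.sin(th)
        xi = iris_c[0] + iris_r * np.cos(th)
        yi = iris_c[1] + iris_r * np.sin(th)
        for i, r in enumerate(np.linspace(0, 1, radial_res)):
            # interpolate between the two boundaries (nearest-neighbour sampling)
            x = int(round((1 - r) * xp + r * xi))
            y = int(round((1 - r) * yp + r * yi))
            if 0 <= y < eye.shape[0] and 0 <= x < eye.shape[1]:
                out[i, j] = eye[y, x]
    return out
```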
Face
The facial region of the given input sample is to be detected for extracting significant features from it. The steps to detect the ROI (face region) were described in section 2.3.2 and are summarized as:
• Rectangle features are computed as
Value = ∑ (pixels in white area) – ∑ (pixels in black area)
• Convert the image into an integral image for fast feature computation.
• Compute several features and use Adaboost for essential feature selection.
• Form the Attentional cascade classifier for efficient computational resource allocation.
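The integral-image idea behind the rectangle features above can be sketched as follows; this is a minimal illustration of a two-rectangle Haar-like feature, not the full Viola-Jones detector.

```python
import numpy as np

def integral_image(img):
    """Cumulative-sum table so that any rectangle sum costs four lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, h, w):
    """Sum of pixels in the h x w rectangle with the given top-left corner."""
    return (ii[top + h, left + w] - ii[top, left + w]
            - ii[top + h, left] + ii[top, left])

def two_rect_feature(ii, top, left, h, w):
    """Haar-like feature: sum of the left (white) half minus the right (black) half."""
    half = w // 2
    return rect_sum(ii, top, left, h, half) - rect_sum(ii, top, left + half, h, half)
```

AdaBoost then selects the most discriminative of many such features, and the selected weak classifiers are chained into the attentional cascade.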
Ear
The steps to extract the ROI of a given input ear sample have been described in section 2.4.2 and are summarized as follows:
• Smooth the input image using an averaging filter, which blurs the image and removes irrelevant information.
• Perform binarization using Otsu’s threshold, which separates the ROI from its background.
• Morphological operations (dilation followed by erosion) with a structuring element are used to extract image components useful for representing and describing the region shape.
• Generate mask to remove the skin region and apply it over the image.
• The resulting mask is combined with the binarized form of original image generated earlier to completely eliminate the areas surrounding the ear.
• The resulting image is further subjected to grayscale morphological operations to remove the noise and to fill the gaps in the boundaries of the ear region.
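The binarization step above relies on Otsu's threshold; a plain NumPy sketch of that method, operating on 8-bit grey levels, follows. It is an illustration of the principle, not the thesis implementation.

```python
import numpy as np

def otsu_threshold(img):
    """Otsu's method: choose the grey level that maximizes the
    between-class variance of foreground vs background pixels."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()     # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0        # class means
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2      # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```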
Vein
The steps to extract the ROI from the given palm dorsal input image have been described in section 2.5.4 and are summarized as follows:
• Adjust the contrast between the veins and the background with the help of the histogram.
• Isolate the background from the foreground vein structure using clustering or adaptive thresholding.
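The adaptive-thresholding step can be sketched as below. This is a minimal local-mean version; the block size and offset `c` are illustrative assumptions, and veins are assumed to appear darker than the surrounding skin.

```python
import numpy as np

def adaptive_threshold(img, block=15, c=5):
    """Mean adaptive threshold: a pixel is marked as vein (foreground) when
    it is darker than its local neighbourhood mean by more than c grey levels.
    block and c are illustrative parameters."""
    pad = block // 2
    padded = np.pad(img.astype(float), pad, mode='edge')
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    for i in range(h):
        for j in range(w):
            local_mean = padded[i:i + block, j:j + block].mean()
            mask[i, j] = img[i, j] < local_mean - c
    return mask
```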
This section briefly discusses the various feature extraction algorithms that have been used to extract features from the biometric traits involved in this thesis. A detailed description is available in Chapter 2.
Log-Gabor Filter
This filter has been used as the feature extraction algorithm for processing the iris samples. It overcomes some of the traditional disadvantages of Gabor filters: Log-Gabor filters consist of a logarithmic transformation of the Gabor domain, which eliminates the undesirable DC component present in medium and high-pass filters. The Log-Gabor response is Gaussian when viewed on a logarithmic frequency scale, which helps capture more information from high-frequency areas, retaining edge information and prominent features. This topic has been dealt with in section 2.2.3.
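The Log-Gabor transfer function can be sketched in one dimension as follows; the centre frequency `f0` and bandwidth ratio are illustrative choices, not the thesis parameters.

```python
import numpy as np

def log_gabor_radial(n, f0=0.1, sigma_ratio=0.55):
    """1-D Log-Gabor frequency response: a Gaussian on a logarithmic
    frequency axis, so the DC response is zero by construction."""
    f = np.fft.fftfreq(n)              # normalized frequencies
    g = np.zeros(n)
    nz = f > 0                         # defined only for positive frequencies
    g[nz] = np.exp(-(np.log(f[nz] / f0) ** 2) /
                   (2 * np.log(sigma_ratio) ** 2))
    return g

def log_gabor_filter(signal, f0=0.1):
    """Filter a 1-D signal (e.g. one row of the unwrapped iris) in the
    frequency domain; the phase of the complex output is what iris-code
    style encoders quantize."""
    G = log_gabor_radial(len(signal), f0)
    return np.fft.ifft(np.fft.fft(signal) * G)
```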
Principal Component Analysis (PCA)
Principal component analysis (PCA) has been used as a feature extraction algorithm for face modality. PCA is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. The detailed explanation and the steps are in section 2.3.3.
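The PCA projection can be sketched via the SVD of the mean-centred data; this is a minimal NumPy illustration of the transform described above.

```python
import numpy as np

def pca_fit(X, k):
    """Fit PCA: the top-k right singular vectors of the centred data
    are the eigenvectors of the covariance matrix (principal components)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                # (k, n_features), orthonormal rows
    return mean, components

def pca_transform(X, mean, components):
    """Project observations onto the principal components."""
    return (X - mean) @ components.T
```

For face recognition, each row of `X` would be a flattened face image, and matching is done on the low-dimensional projections.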
Extraction of Knuckle points and distance computation
The vein pattern information extracted through adaptive thresholding can itself be considered as the feature for matching. In addition, the knuckle tips are extracted one after the other by finding the transitions between white and black pixels with the help of Otsu's thresholding. Next, the Euclidean distances between the tip points are computed; these distance values, along with the pattern, enhance the depth of information. In this section, the performance of the unimodal systems is first presented, and later the consequences of combining multiple biometrics under feature level, score level and decision level fusion are presented. In all experiments, the performance of a modality is measured in terms of the Genuine Acceptance Rate (GAR) at a False Acceptance Rate (FAR) of 0.1%.
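The knuckle-tip distance computation can be sketched as follows; the coordinates in the usage example are illustrative, not measured values.

```python
import numpy as np

def knuckle_distances(tips):
    """Euclidean distances between consecutive knuckle-tip (x, y)
    coordinates; used alongside the vein pattern as an extra feature."""
    tips = np.asarray(tips, dtype=float)
    return np.linalg.norm(np.diff(tips, axis=0), axis=1)
```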
4.7.1 Performance of Single Modality
The performance obtained for each biometric trait in the multimodal database is discussed and presented below. Two samples of each trait were used for training and the rest for testing. Certain parameters that influence the performance of the system have to be adjusted: the radial and angular resolutions used in the normalization phase of the iris feature, and the wavelength and bandwidth of the Gabor filter used in the feature encoding phase of the respective traits, were tuned to give maximum performance. These parameters are selected using the following decidability factor:
d' = |μ_S − μ_D| / √((σ_S² + σ_D²)/2) (4.5)
where d' is a function of the means and standard deviations of the intra-class and inter-class comparisons. The higher the decidability, the greater the separation of the intra-class and inter-class distributions, which helps achieve more accurate recognition. The distance measure obtained from each feature is analyzed against thresholds T1 and T2 (Table 4.2) to arrive at the match decision.
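The decidability factor defined above can be computed as a small NumPy sketch; the score lists in the test are illustrative.

```python
import numpy as np

def decidability(intra_scores, inter_scores):
    """Decidability index d': separation of the intra-class and
    inter-class score distributions in pooled standard-deviation units."""
    mu_s, mu_d = np.mean(intra_scores), np.mean(inter_scores)
    var_s, var_d = np.var(intra_scores), np.var(inter_scores)
    return abs(mu_s - mu_d) / np.sqrt((var_s + var_d) / 2)
```

Parameter settings that maximize d' on the training samples are the ones retained for the recognition experiments.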
Since there is a shortage of publicly available multimodal databases acquired in unconstrained environments, the majority of research works in the literature have experimented on and validated their systems with virtually grouped multimodal databases, combining biometric samples from different unimodal databases while assuming independence between the traits. There are a few multimodal databases, such as BIOSEC, BIOSECURE-ID, MCYT, M2VTS, BIOMET, SDUMLA-HMT and MMU GSPFA, but the combination of biometric traits chosen in our research work does not exist in any of them. This motivated us to build a multimodal database and to develop and analyze a multimodal system using multiple biometric features belonging to the same individual. The performance of a biometric system using multiple modalities has been investigated to analyze the effectiveness of fusing information from multiple features at the feature level, score level and decision level for the SSNDS dataset. At the feature level, the feature sets extracted from the different biometric traits using suitable feature extraction algorithms are combined to form a single feature template or vector. Since the feature sets extracted from different traits may not be compatible, a suitable feature normalization method, such as Min-Max, Z-Score, Median or Tanh estimator, is applied during consolidation of the features; the Min-Max method was chosen after an exhaustive study (Table 4.8). From Table 4.9, it can be observed that fusing two modalities at the feature level with the Min-Max technique improves the score over any single modality, for any pairing of the iris, face, ear and vein features, when measured at FAR = 0.1%. It is also clear that FAR is inversely proportional to GAR.
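Feature-level consolidation with Min-Max normalization can be sketched as below. This is a minimal illustration; in practice each input vector comes from the trait-specific extractor (Log-Gabor, PCA, etc.).

```python
import numpy as np

def fuse_features(feature_vectors):
    """Feature-level fusion: Min-Max normalize each trait's feature
    vector into [0, 1], then concatenate into one template."""
    normed = []
    for v in feature_vectors:
        v = np.asarray(v, dtype=float)
        normed.append((v - v.min()) / (v.max() - v.min()))
    return np.concatenate(normed)
```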
When more than two modalities are fused at the feature level, there is only a slight increase in the performance score (< 2%), at the expense of the curse of dimensionality and compatibility concerns. At the score level, the match scores obtained from the different biometric matchers are fused to generate a new final score for decision making. A match score is a measure of similarity between the input and the biometric feature templates in the database. The scores obtained from the different matchers can be normalized and fused into a single score using various rules, namely the Sum rule, Max rule, Min rule and, further, the weighted sum. The Sum and weighted sum rules have been applied for fusing the scores, and the results obtained by combining different modalities at score level fusion are tabulated in Table 4.10. From Table 4.10, it can be observed that fusing two modalities at the score level with the Sum rule significantly improves the score (by about 5% to 6%) over a single modality when measured at FAR = 0.1%. When two modality scores are combined using the Sum rule, the accuracy is up to approximately 3% greater than that of the same modalities combined at the feature level. Similarly, there is no great difference in performance when two or three modalities are combined (not greater than 2%).
When two modality scores are combined using the weighted Sum rule, the performance improves significantly over the Sum rule without weights. Through exhaustive experiments, when weights of 0.45 for face, 0.6 for ear and 0.4 for iris are assigned to the respective modalities before fusion, the accuracy improves greatly from 96.7% to 99.2%, which is a very good performance rate. Similarly, for fusion among face, iris and palm vein, with appropriate weights of 0.3 for iris, 0.4 for face and 0.4 for palm vein, the performance improves from 95.9% to 98.5%. Therefore, assigning appropriate weights to the modalities raises their level of contribution, improving the recognition rate compared to the other fusion methods.
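Score-level fusion with Min-Max normalization and a weighted sum can be sketched as below; the weights in the test are illustrative placeholders, not the tuned values reported above.

```python
import numpy as np

def min_max(scores):
    """Min-Max normalization: map one matcher's scores into [0, 1]."""
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min())

def weighted_sum_fusion(score_sets, weights):
    """Score-level fusion: normalize each matcher's scores, then
    combine them with a per-modality weighted sum (weights are tuned
    experimentally; equal weights reduce this to the plain Sum rule)."""
    normed = [min_max(s) for s in score_sets]
    return sum(w * s for w, s in zip(weights, normed))
```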
At the decision level, the decisions obtained from the individual classifiers are fused to obtain a final decision. Each decision module classifies the user as genuine or impostor using a rule such as logical AND, logical OR, majority voting or Bayesian fusion. Of these, the logical AND and OR rules have been applied for fusing the decisions into a final decision. Table 4.11 shows that the fusion of different modalities at the decision level improves the performance rate in comparison to unimodal performance; it is equivalent to score level fusion and greater than feature level fusion when a logical rule is applied. A marked deviation in performance is observed between the logical OR and logical AND rules when fusing the decision results of the different modalities. Hence, depending on the uniqueness of the biometric traits selected for fusion, the choice of decision rule has an impact on the performance rate.
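The logical AND/OR decision fusion can be sketched as:

```python
def decision_fusion(decisions, rule="AND"):
    """Decision-level fusion of per-modality accept/reject decisions.
    AND accepts only if every matcher accepts (lower FAR, higher FRR);
    OR accepts if any matcher accepts (lower FRR, higher FAR)."""
    if rule == "AND":
        return all(decisions)
    if rule == "OR":
        return any(decisions)
    raise ValueError("rule must be 'AND' or 'OR'")
```

The trade-off in the comments is exactly why the two rules deviate in Table 4.11: the stricter AND rule favours security, while OR favours convenience.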
Comparing Tables 4.8, 4.9, 4.10 and 4.11, there is definitely an improvement in the performance of multimodal biometric systems over recognition through unimodal biometrics (a single modality). The tables also show that score level fusion gives much better results than the other fusion schemes; the impact of fusion at the score and decision levels is clearly observed. Performance is better with two modalities (bimodal) than with one, but there is not much difference in the improvement rate between bimodal and trimodal fusion when evaluated at 0.1% FAR: on the SSNDS dataset, the trimodal system improves GAR over the bimodal system by 2% on average at feature level and decision level fusion, and by 3% at score level fusion. When the performance obtained for SSNDS is compared with the results obtained for a database built from two public datasets of iris and ear, score level fusion results in almost similar accuracy rates of 96%, since the individual scores from each trait sample are fused and the fused total score is utilized for matching. However, feature level fusion of the virtually grouped public unimodal datasets shows an average difference of 8% from the accuracy rate obtained for the SSNDS dataset. Nor is there significant improvement between the trimodal and tetramodal biometric systems. This shows that increasing the number of biometric traits utilized does not proportionally increase the accuracy rate, but instead increases the computational complexity.