Pedestrian detection in night vision plays a vital role in Advanced Driver Assistance Systems (ADAS), autonomous vehicles, and surveillance. In this paper, a robust approach is proposed that uses wavelet-based saliency for region-of-interest detection and a Local Denary signature for pedestrian representation in thermal infrared imagery. Unlike state-of-the-art approaches that rely on intensity thresholding for pedestrian detection, this framework depends on a wavelet-based saliency map, generated from both the approximation and detailed sub-bands, to select candidate regions that are likely to contain pedestrians. Other regions in an infrared image may share the thermal characteristics of a pedestrian and raise false alarms. Thus, an effective representation with ten intensity levels, termed the Local Denary pattern, together with a descriptor based on the Edge Orientation Histogram, precisely describes pedestrian characteristics. The Local Denary pattern is a robust and discriminative representation, as it achieves better tolerance to gray-scale variation. Finally, a LogitBoost ensemble classifier is used to separate pedestrians from the background. Experimental results on the publicly available OSU thermal pedestrian database demonstrate the superiority of the proposed approach in detecting pedestrians.
Introduction
The modern development of smart vehicles and the demand for road safety have led to the incorporation of advanced driver assistance systems. Existing systems include adaptive cruise control, adaptive lighting control, blind-spot detection, lane-keeping assistance, and night vision. Night vision systems designed for pedestrian safety have to run in real time, remain accurate during day and night, and should alert the driver with an alarm or automatically apply the brakes to avoid a collision. Existing data acquisition systems for pedestrian detection include passive sensors [1-4] with visible or infrared cameras, active sensors [5, 6], and hybrid sensors [7, 8]. Among these, passive sensors are preferred as they mimic human perception and can be used to classify obstacles. Visible cameras fail to detect objects in the dark without an illuminating source, whereas an infrared sensor perceives the heat signature emitted by an object and creates an electronic image in shades of gray. Numerous challenges make pedestrian detection an evergreen area of research: a) the motion pattern of a pedestrian is unpredictable; b) the appearance of a pedestrian varies with viewpoint, direction, clothing, temperature, etc.; c) objects other than pedestrians appearing bright in IR images may cause false positives; and d) the lack of fine discriminative detail in the image makes pedestrian detection difficult.
Thermal cameras are independent of illumination, and different objects are represented at varying brightness levels. Pedestrians in thermal images generally have high visibility at night, as a pedestrian's temperature is higher than the ambient temperature of the background. Pedestrians can therefore be separated from the background by selecting an appropriate threshold, since they are represented by higher intensity values [10, 11]. Selecting this threshold is a challenge, as pedestrian intensities vary with temperature and distance. When binarizing the image with a static threshold, too small a threshold causes unwanted background to be classified as pedestrian, while too high a threshold may fragment the pedestrian. A dynamic threshold is instead derived from the statistical parameters of the image and the area under the image histogram. A region-growing algorithm is used to isolate pedestrians from the background [12]; regions are merged when two regions of interest are in close proximity. State-of-the-art techniques focus on extracting regions of interest by thresholding, assuming the pixel intensities follow a Gaussian distribution [13], or by histogram analysis [14].
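To make the static-versus-dynamic thresholding distinction concrete, the following is a minimal sketch of a statistics-driven threshold. The `mean + k*std` rule and the constant `k` are assumptions for illustration; they are not the exact rules used in the cited works [12-14].

```python
import numpy as np

def dynamic_threshold(img, k=2.0):
    """Binarize a thermal image with a threshold derived from image
    statistics rather than a fixed value.

    mean + k*std is one common statistical choice; both the statistic
    and k are illustrative assumptions, not the cited methods."""
    t = img.mean() + k * img.std()
    return img > t   # binary mask of candidate warm (bright) pixels
```

Because the threshold follows the image's own statistics, the same rule adapts across frames whose ambient temperature (and hence mean brightness) differs, which a static threshold cannot do.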
Pedestrian detection is considered a binary classification problem in which learning methods classify a candidate object as pedestrian or non-pedestrian. The detection framework comprises mining the region of interest, extracting discriminative features, and classifying the features. Classification accuracy depends more on the discriminative features selected to separate pedestrians from background detail than on the choice of classifier. An assorted collection of feature extraction approaches has been proposed for pedestrian detection in thermal images. Based on a prior model built from the distinct features of pedestrians and non-pedestrians, the classifier categorizes each candidate region as pedestrian or non-pedestrian.
Fang et al. [15] selected candidates by horizontal and vertical projection and classified them using multidimensional histograms, inertia, and contrast-based features. Liu et al. [16] identified candidate pedestrians by gradient- and threshold-based image segmentation and represented them with pyramid entropy-weighted histograms of oriented gradients. Gray-level image features [10], Gabor wavelets [17, 18], Haar wavelets [19, 20], and histograms of oriented gradients (HOGs) [21] are popular features for pedestrian detection. Zhang et al. extracted local features such as edgelets and HOG and classified them with an AdaBoost cascade classifier and a cascade of SVM classifiers [22]. Qi et al. represented pedestrians using sparse representation, developed a generic dictionary with basis atoms optimized by K-SVD, and highlighted its importance in driver assistance systems [23]. Riaz et al. [24] used CENsus TRansform hISTogram (CENTRIST) features and Support Vector Machines (SVMs) to capture contour cues. Shape descriptors [25, 26], such as compactness derived from the skeleton of the object, have also served as input to an SVM. Contour saliency maps (CSMs) [27-29], CSM template matching [30], shape- and appearance-based detection [31, 32], spatiotemporal texture vectors [33], and boosting frameworks [34] have all been used to detect pedestrians. An advantage of learning-based methods is that pedestrians can be detected without a background image, so background subtraction is avoided.
Kim et al. utilized a Convolutional Neural Network (CNN) for nighttime pedestrian detection using visible images [35]. Liu et al. [36] and Wagner et al. [37] applied fusion architectures to CNNs that fuse visible-channel and thermal-channel features for multispectral pedestrian detection. Cai et al. [38] generated candidates using a saliency map and used a deep belief network as a classifier for nighttime vehicle detection. John et al. [39] used Fuzzy C-means clustering for candidate generation and a CNN for verification of pedestrian detection in thermal images. Hou et al. [40] experimented with effective strategies for combining pixel-level fusion methods and CNN fusion architectures.
Humans can recognise pedestrians quickly, even at night, owing to their strong visual significance. Meanwhile, in far-infrared images pedestrians are usually brighter than their surroundings, which underlines their saliency characteristics. Based on this rationale, this work is inspired by the human attention mechanism: a saliency model based on wavelet feature maps is used to obtain candidate pedestrian regions. To separate pedestrians from other background objects, a pedestrian classification model with the Local Denary signature and Edge Orientation Histogram is used to verify the ROI.
The remainder of this paper is organised as follows: Section 2 describes the methodology for pedestrian detection in far-infrared images. Section 3 presents the experimental results of the proposed method. Finally, Section 4 gives the concluding remarks.
Proposed Methodology
In this section, the methodology for detecting pedestrians in infrared images using saliency based region of interest extraction and Local Denary description is explained. The block diagram of the proposed method for pedestrian detection in infrared images is shown in Figure 1.
The candidate regions are extracted using a wavelet-based saliency map generated by fusing the feature maps from the approximation and detailed sub-bands, followed by thresholding. The extracted region of interest may contain artifacts due to illumination changes, shadows, etc., as well as objects other than pedestrians. A pre-learned model is therefore built using the Local Denary pattern followed by the Edge Orientation Histogram (EOH), from a database of 5578 pedestrian thermal images and 3375 background thermal images. With this model, each detected region of interest is categorized as pedestrian or non-pedestrian using an ensemble-based classifier.
Wavelet Decomposition
Wavelet transform has the advantage of providing multi-scale spatial and frequency analysis of an image. Processing the input image with multi-scale filter banks yields approximation and detailed sub-bands. As pedestrians in thermal images appear at different scales, the wavelet transform with Daubechies wavelets (db4) is chosen to analyse the thermal image and highlight salient regions.
The multi-scale wavelet decomposition is defined as follows.
$$[A(k), H(k), V(k), D(k)] = \begin{cases} WT(I(x,y)), & k = 1 \\ WT(A(k-1)), & k > 1 \end{cases} \quad (1)$$
where I(x, y) represents the input image, k is the decomposition level, and WT(·) denotes the wavelet decomposition. A(k) is the approximation output representing low-frequency information, and H(k), V(k), and D(k) denote the wavelet coefficients of the horizontal, vertical, and diagonal detail information at the k-th decomposition level. The first three levels of wavelet decomposition are shown in Figure 2.
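The recursion in Eq. (1) can be sketched as follows. The paper uses Daubechies db4 wavelets; the sketch below substitutes the simpler Haar wavelet so it stays dependency-free, which is an assumption made purely for illustration.

```python
import numpy as np

def haar_dwt2(img):
    """One level of 2-D Haar wavelet decomposition (structure of Eq. 1).

    Returns (A, H, V, D): the approximation sub-band plus horizontal,
    vertical, and diagonal detail sub-bands, each at half resolution.
    Assumes even image dimensions."""
    img = img.astype(float)
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    A = (a + b + c + d) / 4.0   # low-low (approximation)
    H = (a + b - c - d) / 4.0   # horizontal detail
    V = (a - b + c - d) / 4.0   # vertical detail
    D = (a - b - c + d) / 4.0   # diagonal detail
    return A, H, V, D

def wavelet_decompose(img, levels=3):
    """k-level decomposition: for k > 1 the transform is applied to
    the previous approximation A(k-1), exactly as in Eq. (1)."""
    bands, A = [], img
    for _ in range(levels):
        A, H, V, D = haar_dwt2(A)
        bands.append((A, H, V, D))
    return bands
```

In practice a wavelet library (e.g. PyWavelets with the `db4` filter) would replace `haar_dwt2`; only the recursive structure is the point here.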
Feature Map Generation using the Detailed Sub-bands: Feature maps are generated by reconstructing the decomposed high-frequency sub-bands while ignoring the approximation sub-band:
$$f_{kd}(x,y) = IWT_k\big(H(k), V(k), D(k)\big) \quad (2)$$
Global Saliency Map Construction: The higher the frequency information of the saliency map, the richer the information of interest. From the generated feature maps, the maximum value across the maps determines the significance of each pixel. Let p(i,j,n) be the feature maps generated from the detailed sub-bands, where n is the number of feature maps and (i,j) indicates the coefficient location.
$$S_{FM}(i,j) = \max\big(p(i,j,1{:}n)\big) \quad (3)$$
S_FM(i,j) captures the statistical relation across all feature maps and can highlight important information that local contrast alone cannot detect. Figure 3 shows the feature maps of the global saliency obtained by multi-level reconstruction.
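The pixel-wise maximum of Eq. (3) reduces to a one-line array operation. The sketch assumes the reconstructed feature maps have already been brought to a common resolution (as the inverse transforms in Eq. (2) produce full-size maps).

```python
import numpy as np

def global_saliency(feature_maps):
    """Pixel-wise maximum across detail feature maps (Eq. 3).

    feature_maps: list of n arrays of identical shape, each one a
    reconstruction f_kd from one decomposition level."""
    stack = np.stack(feature_maps, axis=0)  # shape (n, rows, cols)
    return stack.max(axis=0)                # S_FM(i, j)
```

The max operator keeps a pixel salient if it is strong at *any* scale, which is how coarse and fine structures of the pedestrian both survive into the saliency map.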
Feature Map Generation using the Approximation Sub-band: Here, feature maps are generated by reconstructing the decomposed approximation sub-band while ignoring the detailed sub-bands:
$$f_{ka}(x,y) = IWT_k\big(A(k)\big) \quad (4)$$
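The ROI stage fuses the detail-based and approximation-based maps and binarizes the result by thresholding. The paper does not state the exact threshold rule or fusion operator, so both are loudly labelled assumptions in this sketch: each map is thresholded at its own mean plus one standard deviation, and the two binary masks are intersected.

```python
import numpy as np

def fuse_and_binarize(f_detail, f_approx, t_detail=None, t_approx=None):
    """Fuse detail- and approximation-based feature maps into a binary
    candidate mask. Thresholds default to mean + std of each map, and
    fusion is a logical AND -- both are illustrative assumptions, not
    the paper's stated rule."""
    if t_detail is None:
        t_detail = f_detail.mean() + f_detail.std()
    if t_approx is None:
        t_approx = f_approx.mean() + f_approx.std()
    return (f_detail > t_detail) & (f_approx > t_approx)
```

Requiring agreement between the two maps is one way to realise the behaviour described later in Section 3.2: the approximation map suppresses background clutter while the detail map sharpens the pedestrian boundary.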
Local Denary Pattern
Thermal image representations lack fine-scale textures; therefore, spatial structures can be used as discriminating features. The Local Denary representation, with its directional structure, is efficient in capturing the local structural properties of a pedestrian. Consider the smallest whole unit, i.e., a 3 × 3 pixel region with eight directions surrounding the centre pixel. The nine intensity values of the 3 × 3 region are denoted {B0, B1, B2, B3, B4, B5, B6, B7, B8}. The Local Denary signature is extracted by
$$D\big(\lvert B_{x\pm dx,\, y\pm dy} - B_{x,y}\rvert\big), \quad (5)$$
where $(x\pm dx, y\pm dy)$ is a neighbouring pixel and $D(\cdot)$ is the denary quantization function defined in Table 1.
An 8-bit image has a maximum value of 255, so the maximum absolute difference is also 255. The difference range (0-255) is divided into 10 parts and assigned 10 representation levels, as shown in Table 1.
As the Local Denary signature has 10 levels of representation, the combination of all 8 neighbouring pixels results in 10^8 possible units in total. These are split into 10 representative units, one per level; i.e., the first representation, L0, is assigned the binary value 1 when D(p) = 0 and 0 for other values of D(p), and a similar pattern is assigned to each representation (L0, L1, ..., L9):
$$L_j = \begin{cases} 1, & D(p) = j \\ 0, & D(p) \neq j \end{cases} \quad (6)$$
The eight bits generated for each representation are concatenated clockwise and converted to a base-2 number. The resulting Local Denary pattern is less sensitive to gray-scale variation. An example of the Local Denary pattern computation is shown in Figure 4. Figures 5 and 6 show the Local Denary signature of a pedestrian and of a background image, respectively.
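Eqs. (5)-(6) for a single 3 × 3 patch can be sketched as below. Table 1's exact bin edges are not reproduced here, so a uniform split of 0-255 into ten bins is assumed, as is the convention of reading neighbours clockwise from the top-left corner with the most significant bit first.

```python
import numpy as np

def denary_level(diff):
    """Quantize an absolute gray-level difference (0-255) into one of
    ten levels L0..L9 (Eq. 5 / Table 1). Uniform bins of width 26 are
    assumed; the paper's exact bin edges may differ."""
    return min(int(diff) // 26, 9)

def local_denary_codes(patch):
    """Compute the ten 8-bit Local Denary codes for the centre pixel
    of a 3x3 patch (Eq. 6). Bit i of code L_j is 1 iff neighbour i
    falls into level j."""
    centre = int(patch[1, 1])
    # neighbours read clockwise from the top-left corner (assumed order)
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    levels = [denary_level(abs(int(patch[r, c]) - centre)) for r, c in order]
    codes = []
    for j in range(10):
        code = 0
        for lv in levels:            # pack the 8 bits, MSB first
            code = (code << 1) | (1 if lv == j else 0)
        codes.append(code)
    return codes
```

Each pixel thus contributes one 8-bit value to each of the ten level maps; because every neighbour lands in exactly one level, the ten codes of a pixel always have disjoint set bits.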
The edge orientation histogram describes local orientation and is of interest in pedestrian classification because pedestrians often present strong edges in the leg and trunk areas. The edge orientation histograms of all the signatures are concatenated to serve as the feature vector.
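A minimal edge orientation histogram can be sketched as follows. The gradient operator (central differences), the number of bins, and magnitude-weighted voting over [0, π) are assumptions; the paper does not specify its EOH parameters.

```python
import numpy as np

def edge_orientation_histogram(img, bins=8):
    """Edge Orientation Histogram over a grayscale region.

    Gradients are taken with central differences; each pixel votes its
    gradient magnitude into one of `bins` orientation bins covering
    [0, pi). The result is L1-normalised."""
    img = img.astype(float)
    gx = np.zeros_like(img); gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation
    idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    hist = np.bincount(idx.ravel(), weights=mag.ravel(), minlength=bins)
    s = hist.sum()
    return hist / s if s > 0 else hist
```

Computing this histogram per Local Denary level map and concatenating the results gives the combined LDEP+EOH feature vector described above.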
Overlapping Pedestrian detection
As a result of salient region extraction, in some cases more than one blob merges and generates an erroneous result. The merging may occur in the horizontal or vertical direction, and the merged pedestrian blobs need to be split. Here, vertical and horizontal projection profiles are used to identify overlapping pedestrians for candidate blobs with a width-to-height ratio greater than 0.76 [41].
For an image p(x, y) with m_p rows and n_p columns, the Horizontal Projection Profile (H_p) and Vertical Projection Profile (V_p) [42] are given by
$$H_p = \sum_{x=1}^{m_p} p(x,y) \quad (7)$$
$$V_p = \sum_{y=1}^{n_p} p(x,y) \quad (8)$$
Figure 7 plots the horizontal and vertical projection profiles of a sample thermal image. Analysing the crests and troughs of the profiles shows that overlapping pedestrians can be readily separated: the profile exhibits troughs at the boundaries, and the locations of these minima mark the separation points. The vertical projection profile is likewise used to segment overlapping pedestrians in the horizontal direction, as valleys are created at the points of merging.
The algorithm of the proposed foreground extraction is as follows:
1. Compute the width-to-height ratio of the candidate ROI as Pwh = lw/lh, where lw and lh denote the width and height of the candidate ROI.
2. For Pwh > 0.76, obtain the vertical and horizontal intensity projection curves.
3. Detect the trough locations in the projection curves.
4. Each trough location marks the point of separation between two pedestrians; split the overlapping blob into two pedestrians at the trough.
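The steps above can be sketched for the horizontal-split case (two pedestrians side by side). Using the single deepest interior trough of the vertical projection as the cut point is an assumption; only one split direction is shown.

```python
import numpy as np

def split_merged_blob(mask, ratio_thresh=0.76):
    """Split a merged pedestrian blob at the deepest interior trough
    of its vertical projection profile (Eqs. 7-8 and steps 1-4).

    mask: binary ROI crop. Only the side-by-side case is sketched;
    vertically merged blobs would use H_p the same way."""
    h, w = mask.shape
    if w / h <= ratio_thresh:
        return [mask]                  # aspect ratio consistent with one pedestrian
    vp = mask.sum(axis=0)              # V_p: one value per column
    cut = 1 + int(np.argmin(vp[1:-1])) # deepest trough between the crests
    return [mask[:, :cut], mask[:, cut:]]
```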
Results and Discussion
In this section, the robustness of the proposed pedestrian detection algorithm, evaluated on the benchmark OSU (Ohio State University) Thermal Pedestrian Database [41], is discussed.
3.1 Dataset used:
The proposed algorithm is experimentally validated on the OSU (Ohio State University) Thermal Pedestrian Database, which serves as a benchmark dataset of infrared images. The dataset was acquired on a pathway of the Ohio State University campus at a resolution of 360×240. It contains images captured under varying environmental conditions such as rainy, cloudy, and normal conditions. Some sample images from the dataset are given in Figure 8.
3.2 Region of Interest Extraction:
Sample results showing candidate regions obtained by wavelet-based saliency analysis are shown in Figure 9. Figures 9 a, a1, a2, and a3 show sample input images of class 1, class 2, class 4, and class 7, respectively, taken under different environmental conditions such as fair, cloudy, and light rain. Figures 9 b, b1, b2, and b3 present the feature map generated from the approximation component; it retains the low-frequency components, smooths the image, and reduces background clutter. This feature map is binarized by thresholding and portrayed in Figures 9 c, c1, c2, and c3. Sample saliency maps generated from the detailed sub-bands are shown in Figures 9 d, d1, d2, and d3; they sharpen the distinction between pedestrians and the surrounding background. Thus, targets under various complex backgrounds and environmental conditions can be detected accurately and reliably. This feature map is binarized by thresholding and portrayed in Figures 9 e, e1, e2, and e3. The fused output, shown in Figures 9 f, f1, f2, and f3, indicates that the shape and structure information of the pedestrian are well maintained. Although some scattered interference may still be introduced, pseudo-targets are largely eliminated in the subsequent identification stage. This benefits the discrimination process, and computational speed is greatly improved owing to low repetition and fewer false alarms.
3.3 Person Detection
The extracted salient blobs contain both pedestrian and non-pedestrian objects with high saliency, as shown in Figure 9. Hence, a strong classifier built upon discriminative features is required to distinguish pedestrians from other background objects. Here, an effective human descriptor combining the Local Denary pattern with the edge orientation histogram is used with an ensemble-based classifier. To obtain prior knowledge, 5578 pedestrian images and 3375 images without pedestrians were used. Sample pedestrian and background objects are shown in Figure 10. These images are used to build the pedestrian/background classification model. A classification accuracy of 95.05% is reported using the LogitBoost ensemble classifier with the Local Denary Pattern and Edge Orientation Histogram.
Table 2: Pedestrian vs. background classification
Table 2 shows the classification results for true pedestrians using well-known features such as the Gray Level Co-occurrence Matrix (GLCM), Histogram of Oriented Gradients (HOG), Edge Orientation Histogram (EOH), CENTRIST, Local Binary Pattern (LBP), Local Ternary Pattern (LTP), Local Denary Pattern (LDEP), LBP+EOH, LTP+EOH, LBP+HOG, LTP+HOG, and LDEP+EOH. Among these, LDEP+EOH reports the highest classification rate and is used in further processing.
The prior model obtained with LDEP+EOH is used for online classification of the salient regions detected from the OSU Thermal Pedestrian Database. Pedestrian detection results for some sample IR images are displayed in Figure 11. Figures 11 a, a1, a2, a3, a4, and a5 are the input images; due to heat radiation, a pole, a car, and a tree are also detected in the salient region extraction stage, thereby raising false alarms. Figures 11 b, b1, b2, b3, b4, and b5 display the detection results. In Figure 11 b, three significant regions are detected, classified as pedestrians, and marked in green. Figure 11 b1 has three pedestrians labelled in green, with an overlapping pedestrian clearly separated and labelled correctly. In Figure 11 b2, a pole detected as a salient region is classified as a background object and marked in red. Similarly, a pole, a tree, and a car are tagged in red, whereas the detected pedestrians are highlighted in green in Figure 11 b3. Figure 11 b5 shows three pedestrians highlighted in green and a pavement region that generates a false alarm, with background information classified as pedestrian.
The performance of the pedestrian detection method is evaluated using recall and precision, tabulated in Table 3. The LDEP+EOH feature obtains the highest recall and precision rates in all classes, indicating that the proposed method outperforms the other features.
Table 3: Classification accuracy of different features
Table 4 presents comparisons between the proposed method and other popular pedestrian detection methods. The method of Wei Li et al. [43] combines HOG features with geometric characteristics and attains an 86.00% recall rate and a 95.00% precision rate. The method of Davis et al. [26] secures a 94.60% recall rate and a 99.40% precision rate by performing an initial screening with a generalized person template derived from contour saliency maps, quickly detecting person regions while ignoring most of the background; the hypothesized person regions are then validated with an AdaBoosted ensemble classifier. The method of Zelin et al. [44] earns a 97.80% recall rate and a 99.60% precision rate with a mid-level feature descriptor for pedestrian detection in the thermal domain, combining the pixel-level Steering Kernel Regression Weights Matrix (SKRWM) with the corresponding covariances. Soundrapandiyan et al. [45] used a LoG filter with kurtosis for background suppression and detected pedestrians using L-moment-based local thresholding, obtaining a 99.09% recall rate and a 99.50% precision rate. From Table 4, it can be observed that the proposed method with the LDEP+EOH-based classifier has the best detection performance, as it emulates the human visual system for region-of-interest extraction and uses an effective structural descriptor to describe pedestrians.
Conclusion
In this paper, a robust method for pedestrian detection in IR images is presented. The background suppression module processes the image and extracts salient regions by fusing information from the feature maps generated by the approximation and detailed wavelet sub-bands. Wavelets are used as the sparsity basis; the salient blobs are well represented after thresholding, and the burden on post-processing is reduced. To classify the detected blobs as pedestrian or background objects, a dictionary with LDEP features is used. Pedestrians in IR images lack fine detail and are effectively represented by the LDEP, which is a structural descriptor. The experimental results show that the proposed method with LDEP+EOH exhibits a high detection rate compared with other feature extraction methods and with existing pedestrian detection methods.