
Essay: Motion detection method in IR camera video


Abstract—The research on motion detection methods in IR camera video resulted in a system based on the intensity of regions in two consecutive frames. Tests conducted on simulated black-hot video of heat-emitting objects in various configurations confirmed the initial assumptions. Further tests on real-environment video required the implementation of additional algorithms to cope with various types of object occlusion. In the system, camera motion compensation is followed by background image subtraction, which leads to the extraction of moving objects only.

Keywords—computer vision; movement detection; motion compensation; motion estimation

I. INTRODUCTION

Detection of moving objects has become one of the main problems in the development of many computer vision applications, for example in surveillance and alarm systems, where sensing activity is a primary objective. Detecting a moving object from a UAV is a challenge because the whole process must be carried out from a flying platform. However, UAVs were originally designed for reconnaissance and surveillance purposes, carrying various sensors in a wide range of environmental conditions [1][2][3][4]. A camera mounted onboard provides a wide field of view, and thanks to its mobility the system is able to scan an incomparably larger area at the same time. Thus, developing object recognition abilities in unmanned aerial vehicles is crucial for the improvement of surveillance systems. Moreover, the system has been designed to use video from a single camera only, which is the most common configuration in drones. Single-camera image analysis is a considerably different problem than stereovision processing [5][6].

A. Overview of the movement detection methods

Movement detection methods can be generally divided into four main categories [7]: background subtraction methods, sparse feature tracking methods, background modeling techniques, and robot motion models.

Background subtraction methods, which separate moving objects from the background, are based on a simple frame difference between two consecutive frames [8][9]. This approach is mostly used in stationary camera applications and is computationally efficient because of its simplicity [10]. However, in the case of a camera mounted on a moving platform, additional steps must be implemented, i.e. motion compensation and pattern classification, as explained in [9].

Solutions using background modeling can accommodate illumination changes and scene changes [11]. If the background image is available, the result is obtained as a simple difference between the frame and the background image, and is thus equivalent to the background subtraction method. However, these methods are not adequate when the camera is moving fast and the scenery is changing rapidly [12], even if algorithms that learn background changes are implemented.

Feature tracking methods compute sparse features in an extracted patch of the image to detect objects and then extract their motion vectors across two consecutive frames [12]. Low-level local features represent objects in the image and are used to compare images within the set of frames.

Robot motion models, by simplifying the ego-motion of the camera, detect moving objects as those violating the camera motion model. For example, the method proposed in [13] restricts platform movement to 1 DOF.

B. Movement detection challenges

Detection of moving objects is an extremely challenging problem [14]. Detection accuracy is affected by a number of practical challenges, among them:

· loss of information caused by the projection of the 3D world onto a 2D image,

· noise in the image,

· complex object motion,

· nonrigid or articulated nature of the objects,

· partial and full object occlusions,

· complex object shapes,

· scene illumination changes,

· real-time processing requirements.

For this reason, most approaches to the problem are hybrid in nature, and many, in order to lower the false detection rate, use object classification techniques as one of the steps in the solution [15]. Besides filtering image noise by morphological dilation and closing, preprocessing of the image varies depending on the method employed and may include normalization, grayscale conversion, resampling, etc.

The rest of the article is organized as follows: Section II describes the proposed method and Section III presents experimental results. The article ends with conclusions and an outlook on future work.

II. PROPOSED METHOD

As pointed out in [16], there is no single algorithm that works against all the challenges listed in the previous section. Therefore, some assumptions specific to the scenario must be made before developing the method. In the case of the presented method, the following assumptions were made [17]:

· the source of the image is an LWIR black-hot camera,

· the camera is mounted on a moving platform (UAV),

· moving objects of interest emit heat and thus appear black in the image,

· the transformation between frames is planar (the altitude of the UAV is high).

Such a priori assumptions simplify the detection process by removing some of the challenges, mainly the complex object shape issue and scene illumination changes, and allow the use of IR imagery intensity thresholding as one of the main steps of the approach. With these assumptions, the problem can be considered a preliminary exploration of the challenging field of movement detection in computer vision.

A. Overview of the approach

The pipeline of the approach is shown in Figure 1a. The input data are images captured by an LWIR camera mounted on a UAV. After the images are acquired, motion estimation is carried out; it requires two consecutive frames and includes feature extraction and transformation estimation between them. The following step is motion compensation, which employs background subtraction and can be thought of as overlaying two consecutive frames so that only unmatched points remain: moving objects and noise. The last step of moving object detection is object classification. Because of the a priori assumption that objects of interest emit heat, this step employs an intensity threshold.

Fig. 1a. Overview of the approach

The presented method is of a hybrid nature, combining background subtraction and feature tracking. The difference from the optical flow method presented in [18] is that it can detect objects moving towards or away from the camera, even though such a scenario would be hard to meet in a real application. The difference from methods using both background subtraction and feature tracking [19] is that this method additionally relies on intensity thresholding to reduce the number of outliers during platform movement estimation.

B. Motion Estimation

After the frame is acquired, the features are extracted from it. The features are extracted only from areas in which the pixels' intensity is lower than the threshold, because of the assumption that the areas not emitting heat are static. Areas emitting heat are not taken into consideration, which simplifies the image matching process by reducing the number of outliers. Formally, features are extracted only at points (x, y) satisfying

I(x, y) < T                                                          (1)

where I(x, y) is the intensity of the point (x, y) in the image and T is the intensity threshold corresponding to areas emitting heat.

The approach to finding the points that match the frames is feature-based; therefore, the image matching step needs a reliable and distinctive descriptor. In this case, points are chosen using the Harris corner method. The Harris detector is invariant to affine intensity changes and rotation, but not to image scale. Therefore, if the camera zooms in or out between two consecutive frames, the results will be unreliable.

To estimate the transformation between two consecutive frames, the extracted local features are used. Because the compared frames are assumed to be related mainly by translation, a square-window descriptor is implemented and used to find matches in the preceding frame.

However, the Harris corner detector is not suitable if the scene in the frame is relatively uniform in intensity. Therefore, for the experiment section, where the simulated environment is homogeneous and a low number of corner features would be detected in each frame, SIFT features were computed and matched instead [20]. Detected SIFT features are marked by blue circles in Figure 2, and the key points matched in two consecutive frames are marked by green lines. The Fast Library for Approximate Nearest Neighbors (FLANN) has been used for matching. The solution was tested in the research [21].

The key step of motion estimation is estimating the geometric transformation. The geometric transformation is assumed to be affine and requires at least three noncollinear points. The projection of a point in the image can be described as:

x' = a11 x + a12 y + tx
y' = a21 x + a22 y + ty

where (x', y') are the coordinates of the point in the consecutive frame corresponding to the point (x, y) in the current frame, a11...a22 are the linear coefficients of the transformation, and (tx, ty) is the translation.

Fig. 2. SIFT features detected and matched in two consecutive frames.


Fig. 3. a), b) two consecutive frames; c) the resulting difference after overlaying.


Fig. 4. a) result of motion-compensated frame difference; b) intensity threshold binary frame; c) result of applying the binary mask to the frame difference (logical AND operation); red pixels mean logical 1.

C. Motion Compensation

Based on the transformation obtained in the motion estimation step, the frames are overlaid on each other. This step can be compared to frame stitching. However, the practical difference between frame stitching and motion compensation in this case is that whereas image stitching adds information, motion compensation restricts it: regions near the margins of the frame cannot be taken into consideration when detecting objects, because there are naturally no matching features there.

After the motion is compensated, the difference between the frames is computed. The absolute differential image is computed using the formula:

D(x, y) = | I_k(x, y) - W(I_{k-1})(x, y) |

where I_k is the current frame and W(I_{k-1}) is the preceding frame warped by the estimated transformation. The motion-compensated difference, the result of the motion estimation and motion compensation steps, is shown in Figure 3. The difference between the frames consists of potential objects of interest and noise.

D. Object Classification

Because of the assumption that the objects to be detected emit heat, the intensity threshold is applied to the motion-compensated difference image as in (1), but with the inequality reversed:

I(x, y) ≥ T

Furthermore, the area of detected objects is computed based on contours detected by the Canny edge detector. Only the largest objects are classified as objects of interest, which eliminates the problem of noise in the difference image (Figure 4).

The last step is the computation of bounding boxes to visualize the results of the algorithm.

If the objects of interest were assumed specifically to be vehicles or humans, additional object classification based on object features would be required.

This solution can be regarded as a background subtraction method and is suitable for detecting large and fast objects, e.g. vehicles. For object tracking, additional steps of feature extraction and matching of these objects would be required.

III. DEVELOPMENT AND EXPERIMENT

The proposed method was developed in the Python programming language using the OpenCV library.

To the author’s best knowledge, there is no globally accepted benchmark IR camera dataset to test against. Therefore, the method was first tested on recordings of a camera mounted on a UAV simulated in Prepar3D software. The sensor configuration provided in Prepar3D allowed the creation of a custom sensor: a black-hot IR camera. The simulated camera is a pan-tilt-zoom camera mounted under the platform.

Generating the dataset in Prepar3D additionally made it possible to simulate different traffic densities and the movement of both land and water vehicles.

The method was tested against low, medium, and high traffic density scenarios. In the low-density case, the presented method successfully detected two moving objects in two consecutive frames, as shown in Figure 5.

Vehicles were also successfully detected in medium and high traffic density, as shown in Figure 6 and Figure 7 respectively. However, if vehicles are grouped close together in dense traffic, the proposed method does not separate individual instances, as shown in Figure 8, because those vehicles appear as connected regions in the frame. This is an expected result when no object classification and tracking are implemented.

The results suggest that vehicle detection does not depend on the density of the traffic.

Because the experiment video was simulated, it was not possible to test the method against object occlusions. Moreover, because of the lack of a globally accepted dataset, it is difficult to compare this method with others.

Fig. 5. Low traffic density. a) First frame. b) Second frame. c) The result of the proposed method. Detected objects are inside white bounding boxes.

Fig. 6. Detecting vehicles in medium traffic conditions.

Fig. 7. Detecting vehicles in heavy traffic conditions.

 

Fig. 8. Detecting vehicles in dense traffic. The method does not separate individual instances.

The proposed method was also tested in different environments. In frames where the environment lacks distinct features, vehicles cannot be successfully detected, since motion estimation quality is heavily affected. Such a situation is visible in Figures 9 and 10: the presence of land near the water in Figure 9 is sufficient for detecting the moving boat, whereas the homogeneous environment in Figure 10 lacks any distinct features, making it impossible to detect moving vehicles.

Fig. 9. Detecting vehicles in an environment with distinct features (land)

Fig. 10. Detecting vehicles in a homogeneous environment (water).

IV. METHOD MODIFICATION FOR THERMAL IR

For further evaluation of the proposed background subtraction method, tests on real image datasets were performed. The image sequences used for the evaluation were thermal IR data collected during the DARPA VIVID program (PkTest01, PkTest02, PkTest03) [22].

However, because of the difference in camera type, the method required modifications to meet the new requirements. The datasets differed from the previously simulated data mainly in the following aspects:

· heat-emitting objects in the dataset cannot be simply discriminated based on intensity; the appearance of some objects is shown in Fig. 11,

· in addition to noise, camera auto-gain issues are present,

· occlusion and partial occlusion by trees are present,

· vehicles passing through shadows are present,

· the image resolution is low, 320×256 px.

The presence of occlusion and shadows was useful in the further evaluation of the proposed method, as these challenges were not present in the simulated dataset.

Fig. 11. Different appearance of heat-emitting objects in the dataset used for evaluation.

A. Modifications to motion estimation

Motion estimation quality naturally depends heavily on outlier rejection. In this scenario, outliers are mainly objects moving in the scene. In the originally proposed motion estimation step, outliers were assumed to be rejected before feature extraction, since only regions satisfying the intensity condition were processed. For the dataset used in the further evaluation, no assumption could be made that regions below or above a certain intensity threshold are static, so a new outlier rejection method had to be implemented. Therefore, the first modification to motion estimation is to compute the affine transformation using the RANSAC algorithm, which is widely applied for excluding outliers.

In the feature matching step, SIFT features are matched between the frames using a brute-force (BF) matcher. However, because the motion-compensated images contained a large amount of noise, as shown in Fig. 12, where scene-static objects are present, ASIFT was also considered in the early stage of modifying the method, to account for the skew between frames. It was found that in this scenario ASIFT did not improve moving object detection, even though an algorithm reducing the noise in motion-compensated images was implemented. The main reason for the low quality of motion estimation was the low resolution of the data. Considering the disproportion between the detection improvement and the complexity of the ASIFT algorithm, only the SIFT method was used for further evaluation.

B. Modifications to the motion compensation

In the motion compensation stage, just as before, subtraction of the transformed frames is carried out. The only modification lies in the operations employed to process the resulting image. As shown in Fig. 12, in thermal IR images the scene-static regions mainly constitute the noise, pointing to both motion estimation issues and auto-gain issues. Therefore, no denoising filters are used, as they would make the potential objects of interest less distinctive in the processed image. Only simple morphological operations such as erosion and dilation are applied, to eliminate some of the noise and to connect parts of the same object that appear disconnected, because a single object does not appear as a single intensity region in the image.

C. Modifications to object detection

The assumption that heat-emitting objects can be easily detected by thresholding the frame failed during further evaluation on the real IR camera datasets. Therefore, another modification had to be made to detect moving objects.

Although object classification could in this case easily increase the accuracy of the method on the datasets used for further evaluation, it has not been implemented. The main reason is the assumption that all moving objects are to be detected, not only land vehicles, even though the datasets consist of moving objects that are consistent in appearance.

Because of the large amount of noise in the motion-compensated image, object detection became the crucial stage of the modified method. At the same time, no assumption about the appearance of the objects could be made: the intensity thresholding applied to heat-emitting objects could no longer be used for this dataset. The binary AND of the subtraction image and the thresholded consecutive frame was therefore changed to a binary AND of the subtraction image and a spatial saliency map of the consecutive frame. For this purpose, spectral residual saliency as described in [23] was used.

Fig. 12. Subtracted frames with noise. PSNR = 26 dB


Fig. 13. Detecting heat-emitting objects by bitwise ANDing motion compensated image and saliency map of the current frame.

 

Fig. 14. Detection of the moving vehicle at the intersection – one True Positive and three True Negatives.

 

Fig. 15. Detection of moving vehicle – visible to partially occluded.

 

Fig. 16. Detection of moving vehicle – partially to partially occluded.

 

Fig. 17. Failed detection of movement due to passing through shadow.

V. FURTHER EVALUATION

The dataset used for evaluation consisted of three sequences of road traffic taken by a thermal IR camera mounted on a UAV flying at various altitudes, allowing the method to be tested against different scales and camera views. In land traffic, occlusions are common due to road infrastructure, especially if the camera direction is not vertical. The acquired datasets allowed testing the proposed method against occlusion and partial occlusion of objects, especially the frequently occurring occlusion by trees. Moreover, local and global illumination changes are present in the sequences, such as shadows and auto-gain issues.

The different altitudes and camera views in the dataset did not influence the effectiveness of the proposed method. Vehicles in the sequences occasionally pause at an intersection, which made it possible to confirm that the method detects only moving vehicles, as shown in Fig. 14. The influence of the other issues on effectiveness is described in detail below.

A. Occlusion

Motion detection differs from tracking in that it cannot predict an object's future location. Therefore, the detection rate was measured on frames where a transition between full and partial occlusion, or between two partial occlusions, was present, since there is no movement in the images when the object is fully occluded in both frames. Only occlusion or partial occlusion by trees was taken into consideration, as the remaining occlusion occurrences consisted mostly of partial occlusion by road poles, which did not influence the method's effectiveness in any way.

In the dataset used, partial occlusion and occlusion were not a problem for detecting moving objects, as shown in Fig. 15 and Fig. 16. The detection rate was measured in two research cases: the first was partial to full occlusion and the opposite, and the second was partial to partial occlusion. These two cases pose different issues for subtraction methods. The results are in Tab. I.

One of the risks of background subtraction methods is false positives when static objects of interest are occluded or partially occluded in one of the frames: as different parts of the object are revealed, they appear strongly in the differential image. To minimize this risk, narrower object detection specific to the application scenario might be implemented, or tracking algorithms might be used.

B. Illumination Variation

Employing the real image dataset highlighted the real issues that occur in IR pictures. It was possible to examine the proposed method against local illumination changes, in particular against vehicles passing through shadows.

In general, detection and pattern recognition may be heavily affected by global and local illumination changes, especially since shadows alter the texture of the road and the appearance of objects of interest [24].

Shadows are especially evident in applications concerned with land transportation, because many objects such as trees, buildings, and the vehicles themselves cast shadows.

It was found that passing through shadow affects the detection rate, as shown in Tab. I. Moreover, frame subtraction is a processing step that is not robust to this kind of local illumination change, since detection efficiency depends on the vehicle's appearance. An example of a false negative due to passing through shadow is presented in Fig. 17.

Similarly, it was found that auto-gain issues decrease the detection rate as well. Auto-gain changes are an issue when the difference between the object and its background is faint.

Both the auto-gain and shadow-passing issues are caused by a low local contrast level and result in a low-intensity subtracted image with low grayscale resolution.

TABLE I

DETECTION RATE AGAINST PASSING THROUGH SHADOWS AND OCCLUSION

Case                                                      Sensitivity   Miss Rate   Precision
Passing through shadows                                       67%          33%        100%
Partial occlusion to partial occlusion                        93%           7%         96%
Partial occlusion to full occlusion or the opposite           96%           4%          —
Full/partial occlusion to no occlusion or the opposite       100%           0%          4%
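The rates in Tab. I follow the standard definitions; a minimal sketch with hypothetical counts (the per-sequence counts are not given in the text):

```python
def detection_metrics(tp, fn, fp):
    """Sensitivity (recall), miss rate, and precision from raw counts
    of true positives, false negatives, and false positives."""
    sensitivity = tp / (tp + fn)
    miss_rate = fn / (tp + fn)
    precision = tp / (tp + fp)
    return sensitivity, miss_rate, precision

# E.g. 28 detections out of 30 ground-truth appearances, 1 false alarm
# (hypothetical numbers, for illustration only):
s, m, p = detection_metrics(tp=28, fn=2, fp=1)
print(f"sensitivity={s:.0%} miss rate={m:.0%} precision={p:.0%}")
```

Note that sensitivity and miss rate always sum to 100%, which is why Tab. I lists them as complements.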

The proposed method, like background modeling methods, is not suitable for free camera motion. Motion estimation with the affine transform model provides only the object's silhouette; the trajectory is not estimated, as it is in trajectory classification methods [25]. However, the complexity of the method is not high, and the task of motion estimation can be shifted to the drone's navigation system by using UAV telemetry data.

VI. CONCLUSIONS

In this paper, a motion detection method is presented. The method detects the movement of objects based on the intensity of regions in two consecutive frames.

Camera motion is compensated by estimating and applying a geometric transformation between frames; the estimated transformation is affine.

The critical step of the method is background subtraction, which leaves the noise and possible objects of interest. A possible drawback is that static objects occluded in the first frame and visible in the consecutive frame would be falsely detected.

Because of the a priori assumptions in the presented method, no further object classification technique is implemented. However, because the Canny edge detector is used, in future work objects could be further classified based on their contours, or this step could be replaced by noise filtering followed by one of the classification techniques chosen according to the specific scenario; the state-of-the-art classification techniques are based on neural network approaches.

It would be plausible to perform motion estimation in the camera registration stage using UAV telemetry data. There are a number of advantages to such a solution. First of all, the fusion of data would be more robust in various scenarios, since it would not rely on the choice of extracted features, which is determined by the structure of the environment. Secondly, data fusion would make the process much faster, since motion estimation depends heavily on the size of the input data, and motion estimation quality depends on the type of environment. Moreover, such a solution would be more robust to the motion blur caused by camera movement, a problem that was not detected during the first experiment, where the camera was simulated, but occurred in the second, real implementation of the method.

In the case of the thermal IR dataset used for further evaluation, a saliency map was used for object detection, because of the large variation in the appearance of heat-emitting objects and the assumption that detection is not limited to any class of objects, e.g. cars, trucks, or humans. This approach made it possible to detect moving heat-emitting objects with better accuracy in the sequences of 320×256 px images. Such an approach can be effective for a thermal IR camera because heat-emitting objects appear more prominently than objects in daylight camera images.

Although the background subtraction method is most commonly used with a camera mounted on a stationary platform, it is sufficient for detecting moving vehicles, especially if the vehicles appear as regions easily distinguishable from the background.

 
