TITLE: An overview of methods being used to detect concealed objects and weapons using X-ray images and machine learning.
1 Introduction
In the current climate of international politics and the growing fear of terror attacks against major countries, governments and border patrol agencies are trying to tighten the security of their borders. In the study "Border Crossings and Terrorist Attacks in the United States", it was found that 87% of people previously indicted on terrorism charges were able to cross the border into the US, generally through an airport rather than a land or sea port (Smarick & LaFree, 2012). The security checkpoints in airports mainly use X-ray imaging and body scanners, where the detection of an illegal weapon or explosive could prevent potential danger to the public. However, in the systems currently implemented, the detection of these illegal objects depends heavily on the person looking at the image on a screen and their perception of what constitutes a potential threat (Hardmeier et al. 2005), rather than taking advantage of the latest developments in machine learning and computer vision technology.
This brings to light a major flaw: the dependence on human cognition and prior knowledge. The main issue is that the most crucial role within the system is played by the component most vulnerable to variation. A screener's judgement depends heavily on their prior knowledge, training and experience (Hardmeier et al. 2005), as well as on three image-based factors determined by studies (Hardmeier et al. 2005; Schwaninger et al. 2004): viewpoint, bag complexity and superposition. These image-based factors are also used in the object recognition test (ORT), a test to determine how well screeners can assess a given scenario (Schwaninger et al. 2004). In addition, the effects of fatigue, stress and other emotional or physical factors, which are not considered in the research, could potentially impact a screener's judgement. With all of these factors taken into consideration, it can be argued that the accuracy of detection systems with a great dependence on human observation can be problematic.
This leads to the main discussion point of this paper: can ML be a permanent solution to this problem and, if not, can it be implemented to further increase the accuracy of existing systems? The format of the images being analysed makes human classification even harder. X-ray images generally exhibit large variation depending on viewpoint, contain many overlapping items (resulting in limited depth cues), and cover a wide variety of scanned objects (Zhang et al. 2014), making it harder to determine both the position of an item and what the object actually is. Increasing the number of views and images to improve depth perception would inevitably mean increasing the strain on the workforce and lengthening the process, whilst also adding more dimensionality to the problem. This has led researchers to focus on utilising computer vision and ML technology to carry out these tasks. Whilst ML has been used to analyse X-ray images in the field of medicine, outside of that there has been limited investigation into utilising its capabilities beyond feature extraction, image segmentation and other pre-processing steps that identify specific features of images, rather than actually identifying the objects within them (Zhang et al. 2014).
This paper aims to analyse and evaluate the methods currently in use, and to compare them with the ML methods being researched to see whether the latter can provide a sufficient substitute. The underlying technology will not be analysed in depth, as the details of the methods are out of scope for this paper, but the relevant information can be found in the references provided. In addition, the potential for an automated system that does not rely on a person to detect the objects is reviewed. The research papers considered are those based on machine learning (ML), artificial intelligence (AI) and computer/machine vision (CV). Section 2 reviews the current methods, outlines the workflow of a typical ML program, and discusses the ML solutions being proposed for the existing issue; section 3 analyses the effectiveness of these ML techniques, and section 4 concludes.
2 Reviewing Current methods and Machine Learning
2.1 Current methods
Of the current methods used to detect threats at security checkpoints, two are widely used to assist screeners with their decision to pass a bag as OK or not OK: pseudo-colouring and segmentation. Both are CV techniques that can be implemented within a ML application. In their paper, Abidi et al. (2006) present five main categories of pseudo-colouring techniques based on their literature review. These techniques sum up the different kinds of pseudo-colouring used for X-ray imaging, each with different applications. The paper goes on to present recommendations for optimum colour assignment built on the human visual system, along with the authors' own implementation of a system based on their findings. The study concluded that pseudo-colouring techniques can provide additional enhancement, better data visualisation and increased screener alertness, and that the methods presented are a major improvement over the grayscale images that were used previously.
Pseudo-colouring is in wide use today: with objects given colours based on their physical density as well as their atomic make-up, it can assist with items that are dense in nature, like metal. This is where segmentation comes in. Segmentation is an image processing method in which the object of interest is separated from its background so that it can be analysed further with other image processing techniques. It is also a vital part of ML projects that rely on CV. In this case, segmentation can be combined with pseudo-colouring to aid the judgement of screeners to great effect.
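As a minimal sketch of how these two techniques combine, the following illustrates threshold-based segmentation followed by a simple density-to-colour mapping. The threshold and colour bands here are invented for illustration; real scanners derive material colours from calibrated dual-energy measurements rather than a single grayscale intensity.

```python
import numpy as np

def segment_and_colour(image, threshold=0.5):
    """Separate dense objects from the background of a grayscale
    'X-ray' image (values in [0, 1], lower = denser / less penetrated),
    then assign each pixel an illustrative pseudo-colour."""
    # Segmentation: pixels darker than the threshold count as 'objects'.
    mask = image < threshold

    # Pseudo-colouring: map density bands to RGB colours, loosely in the
    # style of airport displays (orange ~ organic, blue ~ metal).
    colour = np.ones(image.shape + (3,))                             # white background
    colour[(image < threshold) & (image >= 0.2)] = (1.0, 0.6, 0.1)   # mid density
    colour[image < 0.2] = (0.1, 0.3, 0.9)                            # very dense
    return mask, colour

# Toy 'scan': one very dense square on a bright background.
scan = np.ones((8, 8))
scan[2:6, 2:6] = 0.1
mask, colour = segment_and_colour(scan)
```

The segmentation mask could then be passed to further processing (contour extraction, feature description), while the coloured image is what a screener would see.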
However, this approach is susceptible to the presence of the many non-threat items that travellers may include in their luggage. Given that screeners have only a few seconds to decide whether an item may or may not be a threat, they need to be well trained in recognising potential threats. Studies by A. Schwaninger (2004, 2005) confirm that the representation of everyday objects in an X-ray image can differ considerably from their actual physical appearance, which can require training the screeners to recognise such elements. As mentioned in the introduction, within an X-ray image a threat can be masked by other objects, the orientation of the object could put the screener at a disadvantage, or it could simply be disguised as a normal, everyday object.
The system operates as follows: a bag is passed through an X-ray scanner; an X-ray image of the bag is presented on the workstation; the image goes through the pseudo-colouring and segmentation methods (this step can depend on the system); there is a visual indication if an object is not being penetrated by X-rays (also an indication of its density); and the screener decides whether to further search the bag or approve it.
Another study by Schwaninger et al. (2006) confirms that the detection of threats comes down to two elements: image-based and knowledge-based factors. In the study, they use two X-ray screening tests and a computer-based adaptive training system. One of the screening tests included only images of guns and knives (XORT, image-based) and the other (PIT, knowledge-based) included many prohibited items. The results show that the knowledge-based test generally increased the performance of screeners and increased their knowledge of what prohibited items look like within a bag. There is also an indication that the image-based factors relate more to the visual-cognitive abilities of an individual and cannot be trained as effectively as knowledge-based factors.
With this knowledge, one could go on to significantly improve existing systems. Since there are so many limitations involved in the process, from the reliance on human cognition and judgement to the limited portrayal of objects within an X-ray image, there is clear room for improvement. Such improvement could, theoretically, be achieved through the application of ML, as discussed further in section 2.3.
2.2 The machine learning pipeline
The type of machine learning considered in this review is supervised learning, which covers classification and regression problems. Problems such as X-ray image analysis fall under the classification umbrella, as the main aim is to classify objects of threat. Each machine learning project has its own methodology for achieving the desired results, varying with the complexity of the project and the kind of data being used. Nevertheless, these methodologies all share very similar stages and can be broken down as follows:
FIGURE 1
This section explains the stages shown in figure 1 to give a better idea of what happens at each step. Understanding the flow of data is important when analysing the accuracy of a given ML algorithm, and it provides insight into which areas, if tweaked, can increase accuracy. The level of depth in this section is kept to a minimum, as each element is a very broad area with many variations on the content discussed.
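To make the stages concrete before each is discussed, the following is a minimal end-to-end sketch of the pipeline, using synthetic data and a nearest-centroid classifier as a stand-in for whatever algorithm a real project would select. All numbers and class names here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Data gathering: two well-separated synthetic classes of 2-D feature vectors.
threat     = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2))
non_threat = rng.normal(loc=(3.0, 3.0), scale=0.5, size=(50, 2))
X = np.vstack([threat, non_threat])
y = np.array([1] * 50 + [0] * 50)

# 2. Pre-processing: scale features to zero mean and unit variance.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# 3. Algorithm: split the data, then compute one centroid per class.
idx = rng.permutation(len(X))
train, test = idx[:80], idx[80:]
centroids = {c: X[train][y[train] == c].mean(axis=0) for c in (0, 1)}

def predict(x):
    # Assign the label of the nearest class centroid.
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# 4. Evaluation: accuracy on the held-out test set.
accuracy = np.mean([predict(x) == label for x, label in zip(X[test], y[test])])
```

Each numbered comment corresponds to one stage of figure 1; in a real project each stage would be far more elaborate, but the overall flow of data is the same.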
2.2.1 Data Gathering and Pre-Processing
The importance of using the correct data to train a machine learning algorithm cannot be emphasised enough. The way in which a ML algorithm analyses and interprets the data presented to it is based on the data it was originally trained on. Thus, the data needs to be sufficient for the algorithm to determine links and patterns, as well as being in the right format. This relieves some strain from the algorithm and ensures that the data is tidy.
Pre-processing (PP) is the process of tidying the data up. In most machine learning projects, the data being used can be the most important aspect, and on many occasions it is unformatted, corrupt or inconsistent in nature. PP can involve a large variety of calculations and methods to ensure tidiness. These depend on the ML project, but within a CV project they will generally consist of image transformations, segmentation (separating interest points from the background) and cropping images to keep the data set consistent and small. These are but a few of the techniques used in the literature reviewed in this paper; refer to section 2.3 to see how researchers apply PP techniques to utilise X-ray images more efficiently.
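As a small illustration of the kind of tidying involved, the sketch below crops a raw grayscale scan to a region of interest and normalises its pixel values. The fixed crop window is an assumption made for brevity; a real pipeline would locate the region of interest automatically.

```python
import numpy as np

def preprocess(image, crop=((2, 10), (2, 10))):
    """Tidy a raw grayscale scan for a classifier: crop to the region
    of interest and normalise pixel values to the range [0, 1]."""
    (r0, r1), (c0, c1) = crop
    patch = image[r0:r1, c0:c1].astype(float)
    lo, hi = patch.min(), patch.max()
    # Guard against a flat patch, where normalisation would divide by zero.
    return (patch - lo) / (hi - lo) if hi > lo else np.zeros_like(patch)

raw = np.arange(144, dtype=float).reshape(12, 12)   # stand-in for a raw scan
clean = preprocess(raw)
```

After this step every sample handed to the algorithm has the same shape and value range, which is exactly the consistency the text above describes.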
2.2.2 Feature Selection and Extraction
Feature selection is the process of selecting specific features from a given data set for the ML algorithm to process. The selected features can significantly increase the accuracy of the ML program just as easily as they can be a source of error. If too many irrelevant features are selected for a given ML problem, there is a risk of overfitting. Overfitting occurs when a ML algorithm takes a given set of features and creates irrelevant and inaccurate connections between them, making the outcome invalid. In other words, the algorithm becomes too optimistic about the data handed to it, and creates connections that are not necessarily true (Babyak 2004).
This stage is also where the developer decides which features to use for the calculations to be done by the algorithm. For example, if the ML project were to classify leaves based on their geometric shape and vein features, the chosen features could be a range of properties under those categories, e.g. the thickness and pattern of the veins, the roundness of the leaf, serrated edges, the surface area of the leaf and the texture of the leaf surface. Again, the selection of specific features can vary, but due to the cyclical nature of machine learning projects, the effects can easily be reversed and improved in a new iteration.
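One of the simplest filter-style selectors, shown below, drops near-constant features: a column that barely varies across samples carries little discriminative information and only adds dimensionality. This is a generic illustration, not a method from the papers reviewed.

```python
import numpy as np

def select_by_variance(X, k):
    """Return the (sorted) indices of the k features with the highest
    variance across samples; near-constant features are discarded."""
    order = np.argsort(X.var(axis=0))[::-1]   # features, most variable first
    return np.sort(order[:k])

# Toy data set: column 1 is nearly constant, columns 0 and 2 vary.
X = np.array([[1.0, 0.5, 10.0],
              [4.0, 0.5,  2.0],
              [9.0, 0.6,  7.0],
              [2.0, 0.5,  1.0]])
kept = select_by_variance(X, k=2)   # indices of the retained features
```

More sophisticated selectors score features against the target labels rather than in isolation, but the principle of discarding uninformative inputs to reduce the overfitting risk is the same.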
2.2.3 ML Algorithm Selection
Algorithm selection can change depending on the project at hand and the bias or expertise of the developer, and an algorithm can also be made specifically for the intended use. The variety on offer can be overwhelming, as the ML field has been producing algorithms for many decades. Choosing the right algorithm can be difficult because so many factors are involved: the decision can change depending on the subject area, the data sets being used, the type of data within the data set and the intentions of the project as a whole.
Ultimately, the choice of model depends on the amount of data that is present. Some classifiers cannot operate efficiently when the data set being used does not contain enough data.
2.2.4 Evaluating the results
The method of evaluation chosen to display the results can depend on the developer or analyst as well as on the project. There are several ways to present the accuracy of a ML algorithm; it is very much a subjective process, where the evaluation should fit the results the project was looking for. For example, when comparing the accuracy of two algorithms, one could use a confusion matrix along with the percentage accuracy. This part of any ML project, much like in other areas of science, aims to show that the project was successful, or to explain why it failed to achieve the task at hand.
Evaluating whether the model performed as predicted, or identifying where it fell short, helps determine the next step of action, if any is to be taken. Evaluation techniques allow underlying problems to be identified and dealt with, which is why the cyclical nature of ML projects can be so useful to the developer. Accuracy is obviously an important factor, but depending on the type of project at hand, it can matter less than the ability of the ML algorithm to predict on unseen data. Bear in mind that the type of project can change the evaluation techniques used.
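The confusion matrix mentioned above is simple to compute and reveals more than accuracy alone, since it shows *which* class the model confuses. A minimal version, with invented labels:

```python
def confusion_matrix(actual, predicted, labels=(0, 1)):
    """Count (actual, predicted) label pairs: rows index the actual
    label, columns index the predicted label."""
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[labels.index(a)][labels.index(p)] += 1
    return matrix

# Toy ground truth vs model output (1 = threat, 0 = non-threat).
actual    = [1, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
cm = confusion_matrix(actual, predicted)
accuracy = (cm[0][0] + cm[1][1]) / len(actual)   # diagonal = correct cases
```

In a screening context the off-diagonal cells matter most: the count of threats predicted as non-threats (missed detections) is far more costly than the reverse, which a single accuracy figure hides.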
2.3 Proposed methods – ML
This section will discuss ML methods that are currently being implemented in aid of improving the detection of objects, material identification and object detection. There are a range of techniques used in each paper, more details on these can be found in the references that are related to the paper being discussed.
ML would fit into the process of object detection before the image is sent to the screeners' workstation, where a classifier could detect an object beforehand and provide either a percentage value indicating how similar the detected item is to an object of threat, or a better indication of what the item could be. The papers mentioned in this section aim for different things, but their main direction of research is the automatic classification of threats. The automation process can also provide the assistance screeners require when making a decision, especially if it can improve the image-based factors of detection.
Roomi and Rajashankari (2012) propose the use of X-ray images and the fuzzy K-nearest neighbour ML algorithm to detect concealed weapons. They use pre-processing techniques to convert the images into binary images and to segment objects from the background, classifying each region as object or non-object. For the feature extraction process, they utilise a shape context descriptor to describe shapes and measure the similarities between different items, along with Zernike moments, a method capable of measuring contours and shape regions. The paper provides few details as to the nature of its evaluation; it concludes that the system performs 'satisfactorily' but does not provide any quantitative results.
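For readers unfamiliar with the family of classifiers involved, the sketch below shows plain K-nearest neighbour on invented shape features; the fuzzy variant used in the paper additionally weights votes by distance-based membership degrees rather than taking a hard majority. The feature names and values here are illustrative assumptions, not data from the study.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training samples
    (plain K-NN, the hard-voting baseline of the fuzzy variant)."""
    dist = np.linalg.norm(X_train - x, axis=1)      # distance to every sample
    nearest = y_train[np.argsort(dist)[:k]]         # labels of k closest
    return int(np.bincount(nearest).argmax())       # majority label

# Toy shape features [elongation, solidity]: 'weapon' (1) vs 'other' (0).
X_train = np.array([[0.90, 0.20], [0.80, 0.30], [0.85, 0.25],
                    [0.10, 0.90], [0.20, 0.80], [0.15, 0.85]])
y_train = np.array([1, 1, 1, 0, 0, 0])
label = knn_predict(X_train, y_train, np.array([0.88, 0.22]))
```

In the paper's setting, the feature vectors would come from the shape context descriptor and Zernike moments rather than hand-picked values.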
A relatively recent study (Zhang et al. 2014) on joint and texture-based X-ray cargo image classification found that the process is made easier by the use of ML and CV. Densely packed cargo containers can contain threats that go undetected with the traditional method of identification, due to the high-density images produced by identical items stacked on top of each other within the container. The authors use a novel shape descriptor along with feature extraction algorithms to remove images and features that interfere with the process, such as removing non-classical cargo samples and extracting the container from the whole image in preparation for the classifier.
The images are classified according to 22 categories established by the World Customs Organisation, using a Support Vector Machine (SVM) classifier. The authors conclude that the accuracy achieved could not satisfy the need, and identify three steps for future work: adding structural information; splitting the high-dimensional space and applying the classifier to each subspace; and combining information such as multiple views and data from a dual-energy scan to add to the features.
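Since SVMs appear in both this study and the next, a bare-bones sketch of the idea may help: a linear SVM finds a separating hyperplane that maximises the margin between two classes, which can be fitted in the primal with sub-gradient descent on the hinge loss. The version below is a teaching stand-in on toy data, not the kernelised, multi-class setup the studies would have used.

```python
import numpy as np

def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Fit a linear SVM with sub-gradient descent on the hinge loss.
    Labels must be in {-1, +1}; lam controls margin regularisation."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:        # margin violated: pull towards xi
                w += lr * (yi * xi - lam * w)
                b += lr * yi
            else:                            # correct with margin: only shrink w
                w -= lr * lam * w
    return w, b

# Linearly separable toy data, two classes.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-2.0, -2.0], [-3.0, -1.0], [-2.5, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])
w, b = train_linear_svm(X, y)
predictions = np.sign(X @ w + b)
```

The 22-category problem above would be built from many such binary separators (one-vs-one or one-vs-rest), typically with a non-linear kernel; the high dimensionality the authors mention is precisely what makes that hard.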
Another study, done for the Department of Homeland Security (Nercessian et al. 2008), focuses on developing strong segmentation and edge-based feature detection, where an image is enhanced before being dissected into the objects within it, allowing for easier identification. The paper focuses on handguns, but the concept can be applied to almost any item, given enough time. The authors also use an SVM classifier together with modified edge-based feature extraction. Once the image features are extracted, they are fused together before being analysed by the classifier, providing "a more robust feature vector, doubling the amount of pertinent information". In the conclusion, they emphasise that the system can perform in real time thanks to the segmentation process, while also reducing the number of false positives. Future research into the effects of superposition, viewpoint and bag complexity is also mentioned.
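To illustrate the general idea of edge-based features and fusion by concatenation, the sketch below computes a crude gradient-magnitude histogram and joins it with an intensity histogram, doubling the feature vector in the spirit of the passage above. Both descriptors are deliberately simplistic assumptions, not the modified extraction the paper describes.

```python
import numpy as np

def edge_features(image):
    """Rough edge descriptor: finite-difference gradient magnitudes,
    summarised as a normalised 4-bin histogram."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    hist, _ = np.histogram(magnitude, bins=4,
                           range=(0, magnitude.max() + 1e-9))
    return hist / hist.sum()

def intensity_features(image):
    """Companion descriptor: a normalised 4-bin intensity histogram."""
    hist, _ = np.histogram(image, bins=4, range=(0, 1))
    return hist / hist.sum()

# Toy image: a bright square on a dark background.
image = np.zeros((10, 10))
image[3:7, 3:7] = 1.0

# 'Fusion' by concatenation: the classifier sees both descriptors at once.
fused = np.concatenate([edge_features(image), intensity_features(image)])
```

The fused vector is what would then be handed to the SVM classifier; combining complementary descriptors is what makes the resulting feature vector "more robust" than either alone.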
Whilst a limited number of texts were included in this review, further reading of the works referenced by their authors can provide deeper insight. The reviews in this section are just three examples out of many more; they were chosen because they touch upon different areas in which X-ray imaging is used from a security point of view, whilst also using different methods to achieve an equally complex goal. The papers are evidence of some of the work being done where machine learning is the main method of classification.
2.4 X-ray Images Vs Visible Light Images
X-ray imaging depends on the emission, penetration and scattering of X-rays, a form of electromagnetic radiation at the shorter-wavelength end of the spectrum, outside the range visible to human vision. There are multiple ways in which an image is formed using X-rays, depending on the use. Originally, in medicine, an X-ray image utilised a form of photographic film with a radiation-sensitive emulsion layer consisting of silver halide. This layer absorbs the X-rays emitted by a source (generally a point source) and an image is formed depending on the level of penetration through the target object: penetrated areas appear dark on the developed film, while dense objects that absorb the X-rays leave lighter regions (Evans n.d.).
In the context of security imaging, the concept is the same, except that the detector is a linear X-ray detector array instead of photographic film. The image is fed to a monitor where the screeners can view it in real time. This process is safer than a medical X-ray, as no one is exposed to the X-rays directly. It also allows the near-essential step of image processing to be applied before the image is shown on the monitor. With this framework in place, the resulting images can provide far more detail through techniques such as noise reduction, increased visibility of detail and contrast adjustment (look-up tables or windowing). These adjustments present the screener with a clearer image on which to base their judgement.
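The windowing mentioned above can be stated in a few lines: a chosen intensity band is stretched to fill the whole display range, and everything outside it is clipped. The centre/width values below are invented for illustration.

```python
import numpy as np

def window(image, centre, width):
    """Classic window/level contrast adjustment: map the intensity band
    [centre - width/2, centre + width/2] onto the full display range
    [0, 1], clipping everything outside that band."""
    lo = centre - width / 2.0
    scaled = (image - lo) / float(width)
    return np.clip(scaled, 0.0, 1.0)

# Stretch a narrow band of a high-bit-depth scan to full contrast.
scan = np.array([100.0, 2000.0, 2050.0, 2100.0, 4000.0])
display = window(scan, centre=2050, width=200)
# The three mid-band values now span the display range; the extremes clip.
```

A look-up table implementation precomputes this mapping once per possible intensity value, which is what makes the adjustment cheap enough to apply in real time.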
3 Effectiveness of the proposed machine learning methods
The techniques mentioned in section 2.3 utilise methods that are already applied, to an extent, in the current systems: pseudo-colouring and segmentation can be part of almost any CV application. The difference lies in how the resulting information is utilised. A ML algorithm can extract information from an image in ways a human observer cannot, analysing texture and contour data with a precision that human cognition cannot match.
4 Conclusion
There are still limitations to what can be achieved by machine learning alone, as with the currently implemented systems. However, using the two together could be the best solution to the problem at hand: utilising the potential of ML analysis and image processing could help improve current security measures. Improving the ability of ML programs to work with multiple views and to analyse complex bags could further improve the results being achieved.