Abstract
Evaluating the aftermath of a cyclone is an expensive, manual, and time-consuming process that delays the rehabilitation of affected victims. As multiple requests for grant aid pour in, it is necessary to cross-check the reality of the damage. At present, this investigation is mostly performed manually by the concerned authorities. In recent times, social media has become an abundant source of post-disaster images. This pool of images may be used as a database from which to extract features and draw conclusions on the severity of cyclones. As tropical cyclones are an annual event in the Indian subcontinent, our research can substantially curtail the cost of post-disaster management. The proposed CNN-based architecture aims to detect whether an area has been damaged by a cyclone and, if so, the severity of that damage. We have deployed transfer learning to fine-tune the models VGG-16 and VGG-19 for this task. The damage assessment system categorizes images depicting distinct features such as trees, posts, and damaged property, which have been explicitly cross-validated with Grad-CAM later in the research. Our proposed model exhibits high accuracy in classifying post-catastrophe destruction in a proficient and cost-effective manner.
Keywords—Transfer learning, VGG-16, VGG-19, CNN, Grad-CAM, Cyclone, Damage evaluation.
I. INTRODUCTION
In today’s world, crisis management and rehabilitation can be made seamless with the implementation of Artificial Intelligence and Machine Learning. Simulating real-life situations on computers has become easier due to the rapid increase in the data available to us from disparate sources. Among the most significant of these situations are natural disasters, which include cyclones, wildfires, earthquakes, and tsunamis.
The increased frequency of natural disasters in recent years may be ascribed to global warming [1]. Among the natural disasters occurring on Earth, cyclones are among the most catastrophic [2]. Formed over low-pressure areas on tropical oceans, cyclones bring high-velocity winds and extreme rainfall [3]. They wreak havoc on life and property alike [4].
Existing research has focused on the identification and assessment of aggregate-level images using remote sensing, remotely piloted drones, or photographs collected by media houses and locals. This process requires extensive resources and depends on suitable weather conditions. The conventional techniques in use today restrict rapid evaluation of destruction intensity, which delays the relief aid grant process.
Image processing can be used to determine the severity of the destruction caused by a cyclone. Such an algorithm can be applied to post-destruction images taken from the affected areas, simplifying efficient relief fund allocation in a particular area. This AI-powered solution aims to reduce the time, effort, and complexity of post-destruction recuperation.
With the growth of social media websites such as Instagram, Facebook, Pinterest, and Twitter, a massive pool of post-calamity images is available. These images can be crowdsourced to create a database and used to build a pivotal decision-making algorithm for assessing the severity of natural disasters.
The aim of this research paper is to devise a swift, accurate deep learning algorithm to evaluate the intensity of the aftermath of a cyclone, using a carefully curated novel dataset of macro-images collected from multiple sources. As tropical coastlines suffer the annual arrival of cyclones, this research can accelerate the grant aid process and contribute to the betterment of rural areas.
II. LITERATURE SURVEY
Path-breaking works [5–9] on segregating social media images for post-disaster response use Convolutional Neural Networks (CNNs) to determine the damage, since the dataset preprocessing required by a CNN is less than that of other classification models. The following are recent studies conducted in fields related to ours:
TABLE I

| Year | Author | Source | Summary |
|------|--------|--------|---------|
| 2020 [10] | Huan Ning et al. | International Journal of Geo-Information, MDPI Journals | Detection of flood images collected by web scraping using a CNN architecture. |
| 2019 [11] | Muhammad Dawood et al. | Neural Computing and Applications, Springer | Determination of hurricane intensity from satellite images by deploying a deep CNN. |
| 2019 [12] | Chinmay Kar et al. | Springer Link research article | Evaluating the intensity of cyclones over the Bay of Bengal using satellite images with a multi-layer perceptron. |
| 2021 [13] | Chinmay Kar & Sreeparna Banerjee | Computers and Geosciences, Elsevier | Predicting the intensity of cyclones in the North Indian Ocean with Multi-layer Multi-block Binary Pattern. |
| 2016 [14] | Md Al-Amin Hoque et al. | International Journal of Remote Sensing, Taylor & Francis | A case study of a Bangladesh tropical cyclone using object-based image analysis. |
III. DATASET
Prior to implementation, a novel dataset was created by collecting images from Google, Getty Images [15], iStockphoto [16], and various social media platforms such as Twitter, Facebook, and Pinterest. A total of 10,000 images were collected.
After a detailed study, the images were segregated into two classes: damaged and non-damaged. The damaged class consists of images showing the damage caused by a cyclone, while the non-damaged class contains images of structures, trees, settlements, and metropolitan areas that remained unaffected after the cyclone.
Images indicating damage caused by a cyclone were downloaded from various sources across the internet, and an identical approach was taken for collecting images of unaffected regions. Each class folder was then placed into a parent folder representing the training data for our experiment.
The VGG-16 model was trained on this data to classify damaged and undamaged images.
According to our segregation, we now have a folder of 5000 images of property and landscape damaged by a cyclone. This set was then split into two classes representing low damage and high damage, with 2500 images each. These images were saved in folders named low damage and high damage respectively, following the same process used for the previous dataset. The VGG-19 model was trained on this dataset to label the damage severity as low or high.
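The folder-per-class layout described above can be sketched with a short script. This is a minimal illustration, not our actual pipeline: the function name `organize_dataset` and the folder names are hypothetical, and the demonstration uses empty placeholder files in place of real images.

```python
import shutil
import tempfile
from pathlib import Path

def organize_dataset(source_dir: Path, train_dir: Path, label: str) -> int:
    """Copy every .jpg under source_dir into train_dir/label,
    mirroring the folder-per-class layout used for training."""
    target = train_dir / label
    target.mkdir(parents=True, exist_ok=True)
    count = 0
    for img in sorted(source_dir.glob("*.jpg")):
        shutil.copy(img, target / img.name)
        count += 1
    return count

# Demonstration with placeholder files (real images in practice).
root = Path(tempfile.mkdtemp())
raw = root / "raw_low_damage"
raw.mkdir()
for i in range(3):
    (raw / f"img_{i}.jpg").touch()

n = organize_dataset(raw, root / "train", "low_damage")
print(n)  # 3 files copied into train/low_damage
```

Most deep learning frameworks can read this directory structure directly, inferring the class label of each image from the name of its parent folder.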
IV. CONVOLUTIONAL NEURAL NETWORKS
CNNs have shown promising performance in computer vision tasks [17]. The popularity of the CNN is attributed to its ease of training and highly efficient performance. A basic CNN can be divided into two parts: (1) the convolution base and (2) the classifier.
The convolution base consists of a stack of convolution and pooling layers aimed at generating features of an image. The convolution layer performs matrix convolution of image pixels with kernels of predefined size to produce a feature map. The values of the feature map are then passed through an activation function to introduce non-linearity. The feature map undergoes max pooling to reduce its dimensions and hence improve computational efficiency.
The classifier consists of multiple fully connected dense layers and begins with a flatten layer that transforms the 2-D feature map into a 1-D vector. The output layer, with a softmax activation function, determines the output class. The basic idea of a fully connected layer is to propagate the activations of the previous layer successively to the subsequent layer [18].
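The convolution, activation, pooling, and flatten steps described above can be traced on a toy example. The pure-Python sketch below is illustrative only (real CNNs use optimized library kernels and learn their filters); the image and kernel values are made up for demonstration.

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def relu(fmap):
    """Element-wise activation introducing non-linearity."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling, halving each spatial dimension."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]), size)]
            for i in range(0, len(fmap), size)]

image = [[1, 2, 0, 1, 2],
         [0, 1, 3, 1, 0],
         [2, 1, 0, 0, 1],
         [1, 0, 1, 2, 0],
         [0, 2, 1, 0, 1]]
kernel = [[1, 0],
          [0, -1]]  # a simple hand-picked 2x2 filter

fmap = relu(conv2d(image, kernel))       # 4x4 feature map
pooled = max_pool(fmap, 2)               # pooling shrinks it to 2x2
flat = [v for row in pooled for v in row]  # 1-D vector for the dense classifier
print(pooled)  # [[1, 3], [2, 1]]
```

The flattened vector `flat` is what the fully connected classifier layers would consume, ending in a softmax over the output classes.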
V. TRANSFER LEARNING
The idea of transfer learning is to leverage the knowledge gathered while solving one problem to solve another similar or correlated task. Transfer learning helps us build models efficiently, with higher accuracy at lower computational effort.
Transfer learning can be applied in image processing because all models detect similar low-level features (edge, color, variation of intensity etc.) regardless of the dataset or cost function. Hence a model, which has been previously trained on a large dataset, may be fine-tuned according to the dataset in question instead of creating a custom CNN model from scratch.
In transfer learning, some layers from the pre-trained model are made unavailable for further training: the weights of neurons in these layers are frozen and are not updated when the model is fine-tuned.
The number of layers to be frozen depends on the number of features we want to reuse from the pre-trained model and on a size–similarity assessment that determines the correlation between our dataset and the dataset used to train the model. The high-level layers are either removed from the pre-trained network or replaced with new layers trained on the new dataset to create the transfer model.
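The freezing mechanism can be sketched in a few lines. This is a conceptual toy, not a framework API: the `Layer` class, weights, and gradients below are invented for illustration (in Keras, for example, the equivalent switch is a layer's `trainable` flag).

```python
# Minimal sketch of layer freezing: each "layer" holds one weight and a
# trainable flag; a training step only updates layers that are not frozen.

class Layer:
    def __init__(self, weight, trainable=True):
        self.weight = weight
        self.trainable = trainable

def sgd_step(layers, gradients, lr=0.1):
    """Update only unfrozen layers, mimicking transfer-learning fine-tuning."""
    for layer, grad in zip(layers, gradients):
        if layer.trainable:
            layer.weight -= lr * grad

# Pre-trained "convolution base" (frozen) + new task-specific head (trainable).
model = [Layer(0.5, trainable=False),  # low-level feature extractor, frozen
         Layer(0.3, trainable=False),  # mid-level features, frozen
         Layer(0.8, trainable=True)]   # new classifier head, fine-tuned

sgd_step(model, gradients=[1.0, 1.0, 1.0])
print([layer.weight for layer in model])  # frozen weights are unchanged
```

Only the head's weight moves after the step; the frozen base retains the features learned on the original large dataset.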
As the computational cost of training these models is high, the general trend is to import pre-trained weights and use them as required, as reported in the published literature [19–21].
The idea of transfer learning can be summarized through the flowchart below:
VI. MODELS
The annual competition of ImageNet [22] has produced highly popular architectures, like AlexNet, VGG-19, VGG-16, ResNet and Inception V3.
The full ImageNet dataset comprises over 15 million images across around 22,000 categories; the subset used to train these models contains about 1.2 million training images, with the remaining images used for validation and testing.
A. VGG-16[23]
VGG-16 is a CNN-based model that achieved 92.7% top-5 test accuracy on ImageNet. The model improves on AlexNet by substituting its large filters (11×11 in the first convolutional layer and 5×5 in the second) with stacks of successive 3×3 filters. It is preferred for its simplicity and low loss rate.
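The benefit of replacing a large filter with stacked 3×3 filters can be checked with simple arithmetic. For instance, two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 convolution while using fewer weights per channel:

```python
# Per-channel weight counts: one 5x5 filter vs. two stacked 3x3 filters.
# Both choices cover the same 5x5 receptive field.
weights_5x5 = 5 * 5          # 25 weights
weights_two_3x3 = 2 * 3 * 3  # 18 weights
saving = 1 - weights_two_3x3 / weights_5x5
print(weights_5x5, weights_two_3x3, round(saving, 2))  # 25 18 0.28
```

The stacked design also inserts an extra non-linear activation between the two 3×3 layers, increasing the network's representational power at lower cost.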
B. VGG-19
VGG-19 is a 19-layer-deep CNN model trained on about a million images from the ImageNet dataset and capable of classifying images into 1000 categories. It has exhibited robust accuracy in image classification.
C. MobileNetV2[24]
MobileNetV2 is a pre-trained classification model, developed by Google, which is 53 layers deep. The model begins with a convolution layer of 32 filters, followed by 19 residual bottleneck layers. For detection, MobileNetV2 is about 35% faster than MobileNetV1 with equivalent accuracy.
D. Inception V3[25]
Inception V3 is a widely used model for image classification that achieved above 78% accuracy on the ImageNet dataset. This pre-trained model is 48 layers deep and capable of classifying images into 1000 categories. The model takes inputs of size 299×299. The architecture of Inception V3 is comparable to Inception V2: both factor a single 5×5 convolution into two 3×3 convolutions, improving computational performance.
VII. MOTIVATION FOR CHOOSING VGG MODELS
The models VGG-16, VGG-19, MobileNetV2, and Inception V3 were each trained on our carefully curated dataset to classify damaged and undamaged test data. A comparative study of the training and validation accuracy of these models led us to conclude that VGG-16 is the most suitable choice: it shows neither overfitting nor underfitting and achieves very high accuracy.
A similar comparative study was conducted to select the model for classifying the severity of damage as high or low, for which we chose VGG-19.
VIII. ARCHITECTURE USED
A. VGG-16
We have implemented the VGG-16 architecture in our model to categorize images as damaged or non-damaged. Since VGG-16 is available as a pre-trained model, it needs to be fine-tuned on our dataset for classification.
The input image (224×224×3) is passed through multiple convolutional layers activated with the non-linear ReLU function. The network consists of 13 convolution layers and 3 dense layers, adding up to 16 trainable layers. There are 5 max-pooling layers in the model to reduce the dimensions of the feature map and hence the computational cost.
The convolutional layers extract features from an image, with each layer extracting features at a deeper level than the previous one. Through max pooling, the input size of the successive convolutional layers is reduced by a factor of 2 (224→112→56→28→14→7). After every max-pooling stage, the number of filters used, represented by the convolutional layer width, increases by a factor of 2 (64→128→256→512). Post flattening, the (7, 7, 512) feature map is compressed into a 1-D vector of dimension (1×25088).
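The dimension bookkeeping above can be verified with a few lines of arithmetic, tracing the spatial size through the pooling stages and checking the flattened vector length:

```python
# Trace VGG-16's spatial dimensions through its pooling stages and check
# that the flattened feature vector has 7 * 7 * 512 = 25088 elements.
dim = 224
dims = [dim]
for _ in range(5):           # each max-pooling layer halves height and width
    dim //= 2
    dims.append(dim)
print(dims)                  # [224, 112, 56, 28, 14, 7]

filters = [64 * 2 ** i for i in range(4)]  # widths double stage by stage
print(filters)               # [64, 128, 256, 512]

flattened = dims[-1] * dims[-1] * 512
print(flattened)             # 25088
```

This confirms that five halvings take 224 down to 7, so the final (7, 7, 512) feature map flattens into exactly 25088 values feeding the dense classifier.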