Transformation of predictor space to embedding space
In the proposed approaches, distance metric calculation plays an important role. Existing classification methods use an a priori metric defined in the predictor space. In the proposed approaches, the distance metric is instead calculated in an embedding space that represents the intrinsic nonlinearity of the combined LANDSAT and ENVISAT data set. Figure 5.2 shows the transformation of data from the predictor space to the embedding space.
A digital image is a representation of a two-dimensional image as a finite set of digital values, called picture elements or pixels. Pixel values typically represent gray levels, colours, heights, opacities, etc.
An image can be represented by a function f(x, y), where x and y are spatial coordinates and f is the amplitude at any pair of coordinates (x, y), called the intensity or gray level of the image at that point. Here x, y and f are all finite, discrete quantities.
The function f(x, y) is converted into a digital image by sampling and quantization. An image can be visually analysed in either the spatial domain or the frequency domain. The spatial domain is the normal image space, in which a change of position in the image I directly corresponds to a change of position in the scene S. In other words, the portion of the real plane spanned by the coordinates of an image is called the spatial domain, and x and y are referred to as spatial variables or spatial coordinates. The frequency domain is the space in which each value at an image position F represents the amount by which the intensity values in the image I vary over a specific distance related to F. In the frequency domain, a change of image position corresponds to a change in spatial frequency, which is the rate at which the intensity values of the spatial-domain image I change.
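The relationship between the two domains can be made concrete with a short sketch. The following Python snippet (a minimal illustration, with a synthetic 64 x 64 image standing in for a real image band) obtains the frequency-domain representation of a spatial-domain image via the 2-D discrete Fourier transform:

```python
# Minimal sketch: spatial- vs frequency-domain view of an image.
# The synthetic image below is a placeholder for a real band of data.
import numpy as np

# Spatial domain: intensity values indexed by spatial coordinates (x, y).
I = np.zeros((64, 64))
I[16:48, 16:48] = 255.0  # a bright square on a dark background

# Frequency domain: each coefficient measures how strongly the intensity
# values vary at a given spatial frequency across the whole image.
F = np.fft.fftshift(np.fft.fft2(I))       # centre the zero frequency
magnitude = np.log1p(np.abs(F))           # log scale for readability

print(I.shape, F.shape)                   # same grid, two different domains
```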
There are three ways in which a digital image can be represented, as shown in Figure 5.1:
Figure 5.1: (a) Image plotted as a surface; (b) image displayed as a visual intensity array; (c) image shown as a 2-D numerical array
Figure 5.2: Transformation of information from predictor space to embedding space
A variety of plausible distances is typically tested, because the choice of distance affects the behaviour of the classification algorithm. Common examples include the L1 norm (Manhattan distance), the L2 norm (Euclidean distance), the L∞ norm (Chebyshev distance) and the Mahalanobis distance. Split-sample comparisons may help in selecting an appropriate distance, but no systematic procedure is available for finding an appropriate distance metric. Moreover, all these norms are sensitive to scaling in the embedding space.
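For illustration, the snippet below evaluates each of these distances with SciPy's spatial distance module; the two points, and the random sample used to estimate the inverse covariance for the Mahalanobis distance, are arbitrary stand-ins for pixels in the predictor space:

```python
# Sketch: the four distance metrics named above, via scipy.spatial.distance.
import numpy as np
from scipy.spatial import distance

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 5.0])

print(distance.cityblock(x, y))    # L1 norm (Manhattan distance)
print(distance.euclidean(x, y))    # L2 norm (Euclidean distance)
print(distance.chebyshev(x, y))    # L-infinity norm (Chebyshev distance)

# The Mahalanobis distance additionally needs the inverse covariance of
# the data; here it is estimated from a small random placeholder sample.
sample = np.random.default_rng(0).normal(size=(100, 3))
VI = np.linalg.inv(np.cov(sample, rowvar=False))
print(distance.mahalanobis(x, y, VI))
```

Rescaling any axis of the space directly changes the first three norms, which is the scaling sensitivity mentioned above.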
The distance metric is searched for in a lower dimensional space, or embedding space, which is suitable for separating a given class. The simulated annealing algorithm used to convert the predictor space into the embedding space is shown in Figure 5.3. The nonlinear transformation (or mapping) minimizes the variability of a given class membership over all those pairs of points whose cumulative distance is less than a predefined value D. It is worth noting that this condition does not imply finding classes in the predictor space that minimize the total intra-class variance, which is the usual basis in cluster analysis. A mapping fulfilling this condition is schematically shown in Figure 5.2, where a hypothetical classification problem consisting of two classes in two dimensions (for example, {x1, x2}), one of which almost completely surrounds the other, is used. A lower dimensional embedding space is chosen because the intrinsic dimensionality of the combined data is considerably lower than that of the inputs. A simple way to verify this assumption is to compute the covariance matrix of the standardized predictors and count the number of dominant eigenvectors.
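A minimal sketch of this check (with a random matrix standing in for the combined standardized predictors, and an arbitrary 95% variance threshold) could look as follows:

```python
# Sketch: estimate intrinsic dimensionality by counting dominant
# eigenvalues of the covariance of the standardized predictors.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))             # 500 pixels, 10 predictor bands

Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized predictors
C = np.cov(Z, rowvar=False)                # covariance matrix

eigvals = np.linalg.eigvalsh(C)[::-1]      # eigenvalues, largest first
explained = eigvals / eigvals.sum()

# Count how many eigenvalues are needed to explain 95% of the variance;
# the 95% threshold is an illustrative choice, not prescribed by the text.
k = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1
print("Approximate intrinsic dimensionality:", k)
```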
A basic issue to be considered during the construction of an embedding space is the effect of the intrinsic nonlinearities present in combined data sets. It is reported in the literature that the various sources of nonlinearity severely affect land cover classification products. This implies that different class labels depend differently upon the input vectors. Consequently, it is better to find class-specific embeddings and their respective metrics rather than a global embedding.
5.1 TECHNIQUES TO CONVERT PREDICTOR SPACE INTO EMBEDDING SPACE
The products derived from remotely sensed imagery have become a cost-effective source of high resolution spatiotemporal information. This information is currently required by many environmental and climatic models, and also for carrying out diverse planning and monitoring activities.
A very common task in remote sensing applications is image classification, whose most common products include land cover maps, assessments of deforestation and burned forest areas, crop acreage and production estimates, and pollution monitoring. Image classification is also applied in optical pattern and object recognition. Hence, a number of classification algorithms have emerged in recent years to cope with both the increasing demand for these products and the specific characteristics of a variety of scientific and industrial problems.
Some common examples of classification algorithms based on statistical and computational intelligence frameworks are the Gaussian maximum-likelihood classifier, fuzzy rule-based techniques, fuzzy decision trees, Bayesian and artificial neural networks, support vector machines, and the k-nearest neighbour (k-NN) algorithm.
A supervised classification algorithm consists of the two phases given below:
1) Learning Phase: the algorithm identifies a classification scheme based on spectral signatures obtained from training sites having known class labels.
2) Prediction Phase: the classification scheme is applied to other locations with unknown class membership.
These algorithms differ in the procedure through which the relationships between the predictor space and the class label are found. The aim is to maximize a discriminant function or to minimize a cost function, that is, to find the optimal parameter set that minimizes the discrepancy between the observed attributes and the classification response.
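As a concrete illustration of these two phases, the sketch below uses scikit-learn's k-NN classifier; the feature matrix and labels are synthetic placeholders for spectral signatures and land cover classes, and the choice of five neighbours is arbitrary:

```python
# Sketch: the learning and prediction phases of a supervised classifier,
# using k-NN on synthetic stand-ins for spectral signatures.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))        # spectral signatures (training sites)
y_train = rng.integers(0, 3, size=200)     # known class labels

# Learning phase: identify a classification scheme from labelled data.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

# Prediction phase: apply the scheme to locations of unknown membership.
X_new = rng.normal(size=(10, 4))
print(clf.predict(X_new))
```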
5.2 SIMULATED ANNEALING
Simulated annealing is a method for solving unconstrained and bound-constrained optimization problems, used when a result must be maximized or minimized. The method is inspired by the physical process of heating a material and then slowly lowering the temperature to decrease defects, thereby minimizing the system energy.
Figure 5.3: Simulated annealing algorithm
Simulated annealing, a statistical mechanics method used as a tool for solving complex optimization problems, can be applied to problems arising in image processing. It has been used for estimating the parameters needed to describe a geometrical pattern corrupted by noise, for smoothing bi-level images, and for halftoning a continuous-level image.
Consider the problem of minimizing a function E(x_i) of many variables x_i, i.e., of looking for the values of x_i that yield the absolute minimum of E(x_i). The basic idea of simulated annealing consists of treating the system to be optimized as a physical system described by the degrees of freedom x_i, with the energy given by E = E(x_i). One then looks for the state of minimum energy of the physical system, i.e., what physicists call the ground state.
With simulated annealing, the ground state is reached by simulating a slow cooling of the physical system, starting from a very high temperature T down to T = 0. The cooling must be slow enough that the system does not get stuck in thermodynamically metastable states, which are local minima of E(x_i). This slow cooling process (called annealing, by analogy with metallurgical processes) is simulated using a standard method proposed by Metropolis et al. (1953).
For a given temperature, the Metropolis method is a way to sample states of the physical system with the Boltzmann distribution
f = e^(-E/T)    (1)
which is the distribution that properly describes the state of thermodynamic equilibrium at a given temperature T. One starts with a random configuration x_i. One then chooses (again, randomly) a small perturbation ∆x_i of the system and calculates the energy change ∆E caused by the perturbation
∆E = E(x_i + ∆x_i) − E(x_i)    (2)
If ∆E < 0, the perturbation is "accepted," since it is energetically favorable for the system; otherwise, it is accepted with probability e^(−∆E/T).
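This acceptance rule translates directly into code. The small function below is a sketch of the criterion; the explicit guard for T = 0 is an implementation detail added here, not part of the formula:

```python
# Sketch: the Metropolis acceptance rule for an energy change dE at
# temperature T.
import math
import random

def metropolis_accept(dE: float, T: float) -> bool:
    """Return True if a perturbation with energy change dE is accepted."""
    if dE < 0:
        return True          # energetically favourable: always accept
    if T <= 0:
        return False         # at T = 0 only energy-lowering moves survive
    return random.random() < math.exp(-dE / T)
```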
When the perturbation is accepted, the process continues with the perturbed state x_i + ∆x_i replacing the old one; otherwise a new perturbation ∆x_i is attempted. It can be shown that the sequence of states obtained in this way is distributed according to the Boltzmann distribution (1) (Kirkpatrick et al. 1983). The Metropolis method is widely used in physics to study numerically the thermodynamic properties of large systems that cannot be treated with analytical methods.
In simulated annealing, one starts with a high value of T, so that the probability of the system being in a given state is essentially independent of the energy of that state. One then slowly reduces T, making sure that at each new value of T enough steps of the Metropolis procedure are performed to guarantee that thermodynamic equilibrium has been reached. The procedure continues until T = 0. If the cooling has been slow enough, the final state reached is the ground state of the physical system, i.e., the values of x_i so obtained realize the absolute minimum of the function E. In practice, one is often not interested in finding the absolute minimum itself: in many interesting situations the minimum configuration is highly degenerate, meaning that there are many minima with values of E very close to the absolute minimum, and any one of them is acceptable.
For problems of constrained minimization, the method can still be applied: one simply has to make sure that all the perturbations ∆x_i generated during the Metropolis procedure continue to satisfy the constraints of the problem. In particular, the constraints could consist of prescribing discrete values for the x_i; thus simulated annealing applies equally well to problems of discrete optimization (Kirkpatrick et al. 1983). A sketch of such a constrained perturbation is given below.
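For instance, assuming the state is a vector of pixel intensities constrained to integer values in [0, 255] (an assumption for illustration), a constrained perturbation could be written as:

```python
# Sketch: a discrete, constrained perturbation for simulated annealing.
# States are lists of integer pixel intensities in [0, 255].
import random

def neighbour(state: list) -> list:
    """Return a copy of state with one pixel nudged, constraints preserved."""
    new_state = list(state)
    i = random.randrange(len(new_state))   # pick one pixel at random
    step = random.choice([-1, 1])          # unit perturbation
    new_state[i] = min(255, max(0, new_state[i] + step))
    return new_state
```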
A new point is generated in each iteration of simulated annealing. The distance of the new point from the current point is based on a probability distribution whose scale is proportional to the temperature. The algorithm accepts all points that lower the objective, and it also accepts, with a certain probability, some points that raise the objective. Hence the algorithm avoids being trapped in local minima and is able to explore globally for possible solutions. An annealing schedule is selected to decrease the temperature systematically as the algorithm proceeds; as the temperature decreases, the algorithm reduces the extent of its search and converges to a minimum (Kirkpatrick et al. 1983).
Figure 5.3 shows the pseudocode of simulated annealing. It begins from a state s0 and continues until either a maximum of kmax steps have been taken or a state with energy emin or less has been found. In the implementation, the call neighbour(s) must produce a randomly selected neighbour of a given state s, and the call random(0, 1) must pick and return a value in the range [0, 1] uniformly at random. The annealing schedule is defined by the call temperature(r), which yields the temperature to use given the fraction r of the allotted time that has elapsed so far. For the image-processing application, the temperature T and the state s are mapped to the pixel intensity and the spatial coordinates respectively. A runnable sketch of this pseudocode is given below.
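Since the figure itself is not reproduced here, the following Python sketch is one possible rendering of the pseudocode just described; the linear cooling schedule, the initial temperature T0 and the toy energy function are illustrative assumptions, not prescriptions of the text:

```python
# Sketch: simulated annealing following the interface described above:
# neighbour(s) proposes a random neighbouring state, random(0, 1) is a
# uniform draw, and temperature(r) maps the elapsed fraction r to T.
import math
import random

def simulated_annealing(s0, E, neighbour, k_max, e_min, T0=1.0):
    s, e = s0, E(s0)
    for k in range(k_max):
        if e <= e_min:                     # target energy reached
            break
        T = T0 * (1.0 - k / k_max)         # temperature(r): linear cooling
        s_new = neighbour(s)
        e_new = E(s_new)
        dE = e_new - e
        # Metropolis rule: accept improvements always, and worse states
        # with probability e^(-dE/T) to escape local minima.
        if dE < 0 or (T > 0 and random.random() < math.exp(-dE / T)):
            s, e = s_new, e_new
    return s, e

# Toy usage: minimize E(s) = s^2 over the integers with unit steps.
best, energy = simulated_annealing(
    s0=50,
    E=lambda s: float(s * s),
    neighbour=lambda s: s + random.choice([-1, 1]),
    k_max=10_000,
    e_min=0.0,
)
print(best, energy)
```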
References
Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598), 671-680.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6), 1087-1092.