TAXONOMY OF POLYMER SAMPLES USING MACHINE LEARNING
ALGORITHMS
K. Swathi[1]
K. Sugamya[2]
[1] Asst. Professor, IT Dept, CBIT, Hyderabad. [2] Assoc. Professor, IT Dept, CBIT, Hyderabad
Abstract
In the present world most processes are automated, and many of them are also carried out online.
The rapid growth in technology has reduced manual work and is automating most processes in
various industries. One such automation requirement is found in the chemical industry, where
automated software is required for the classification of various kinds of plastics based on
their absorbance values. One of the efficient algorithms used for classification is the support
vector machine, which provides a classification model that is trained and tested.
This paper presents a solution for automating the sorting of various kinds of plastic using the
Fisher Iris data set (which is a result of Near Infrared Spectroscopy (NIRS)). Plastics are
everyday non-biodegradable materials which, when not disposed of properly, have adverse effects
on the environment. To recycle plastics, the different types of plastics (polymers) need to be
identified and separated. For economic reasons, plastics must be identified and sorted
instantaneously. The Fisher Iris data set that we employ is a result of NIRS. The NIRS technique
has been used for the instantaneous identification of plastics. Measurements made by NIRS are
quite accurate and fast. The algorithm needed to process the NIRS data and to obtain information
on the polymer category is written in the general-purpose, high-level programming language
Python as well as in MATLAB. In order to increase the efficiency of this process we also
implement the Kennard-Stone (KS) algorithm.
Keywords: Machine Learning, SVM, KS algorithm
1. INTRODUCTION
Plastics are ubiquitous and are contaminating the environment. Disposal of
plastics has become a technological and social issue that has attracted a
lot of attention from researchers, business people, politicians,
environmental activists and the general public. One way to reduce the
environmental pollution caused by plastic waste, comprising disposables and
durables, is to recycle it, that is, to recover the used plastics from
municipal or industrial waste streams and convert them into new useful
objects. Recycling of plastic waste is steadily gaining importance as a
result of the efforts to conserve oil resources and the shortage of
disposal sites. Plastic waste is separated into different material types by
manual sorting, to obtain recyclable materials of high value. The purity of
the sorted fractions obtained in this way is not sufficient for direct
recycling of pure polymers. Moreover, manual sorting is uneconomical, and
the working conditions are not only unpleasant but even dangerous to
health. Considering the above difficulties, an automatic plastic sorting
approach, involving automatic identification of materials followed by
mechanical sorting, appears to be an attractive alternative to manual
sorting. NIR spectroscopy helps in the identification of individual plastic
types and offers a promising approach to waste sorting. Online inspection
technology ideally makes use of NIR spectroscopy, which is capable of
operating quickly. Spectroscopic methods are non-invasive and
non-destructive measuring systems.
In a plastic waste sorting system for recycling of PET materials,
spectroscopic methods offer spectral information from which meaningful data
can be quickly extracted for analysis. The detection speed should, however,
match the specific situation created by the type and size of the arriving
parts. The identification is highly accurate in terms of its ability to
distinguish different plastics, and is reasonably precise.
At present, sorting is done either fully manually or using specialised
machinery that is very expensive. Maintaining this method of sorting
plastics also needs a lot of human resources. The manual process is neither
effective nor efficient, and it is highly prone to error. Cross
verification of the manual process is again very difficult, so a great deal
of time is consumed in the entire process. Hence automated software that
can sort the plastics based on the values obtained from NIRS is to be
developed.
1.1 OBJECTIVE
With this paper, we assist in the development of a plastic sorting
technology using NIR (Near Infrared) spectroscopy. Different types of
plastics, viz. PPT, PVC etc., show different behavior when subjected to
near-infrared rays. This difference in behavior can be analysed to sort
various kinds of plastics quickly and with negligible error. The technology
has high value in the plastic recycling industry along with several other
similar industries.
In this paper, we use algorithms such as discriminant analysis and SVM to
study various sorting techniques in machine learning. We then take the data
set, which contains data on the behavior of plastics under NIR rays, and
develop a behavior pattern for each type. This is done using the SVM
algorithm in MATLAB software.
For the second part, we convert the SVM algorithm developed in MATLAB to
the general-purpose, high-level programming language Python. The main idea
of this project is to develop know-how on the sorting of various plastic
types and then help transfer this method for use by industry.
1.2 PROBLEM DEFINITION
Plastic waste typically includes six types of materials, namely
polyethylene (PE), polyethylene terephthalate (PET), polypropylene (PP),
polyvinyl chloride (PVC), high-density polyethylene (HDPE) and polystyrene
(PS). The experimental lab model classifies them and sorts PET alone. With
the existing set-up, PET materials can be sorted at close to 100% with a
throughput of up to 200 kg per hour. The maximum throughput is limited by
the speed of the spectrograph used in the system. Higher throughputs of up
to 1 tonne/hr can be achieved by using high-speed spectrographs and a
faster sorting routine.
So, in order to increase the efficiency of sorting, an automated system is
to be developed for the polymer samples. The proposed method should be able
to take the analysis of NIRS graphs as input and provide an output that
classifies the polymers based on their absorbance values and other
characteristics.
2. METHODOLOGY
The main idea of this paper was to develop know-how on the sorting of
various plastic types and then help transfer this method for use by
industry. NIRS analysis is therefore used to obtain the data set containing
the polymer samples.
Before we discuss the method of classification, we would like to give a
brief account of NIRS.
Near-infrared spectroscopy (NIRS) is a spectroscopic technique that uses
the near-infrared region of the electromagnetic spectrum (from about 800 nm
to 2500 nm). Typical applications include pharmaceutical, medical
diagnostics (including blood sugar and pulse oximetry), food and
agrochemical quality control, and combustion research, as well as research
in functional neuroimaging, sports medicine & science, elite sports
training, ergonomics, rehabilitation, neonatal research, brain computer
interface, urology (bladder contraction), and neurology (neurovascular
coupling).
Plastic resins are composed of a variety of polymer types. Similarities in
the size and shape of the resins make them difficult to distinguish by
sight alone. Here, near-infrared (NIR) spectroscopy is used to identify and
sort coloured resins composed of various polymers. Diffuse reflection
measurements are made in the NIR region to capture the distinct spectral
differences resulting from the different polymer compositions, while
avoiding the detection of spectral differences arising from resin colour.
Figure 2.1: NIRS Spectroscopy
2.1 SVM
In machine learning, support vector machines (SVMs, also support vector
networks) are supervised learning models with associated learning
algorithms that analyze data and recognize patterns, used for
classification and regression analysis. Given a set of training examples,
each marked as belonging to one of two categories, an SVM training
algorithm builds a model that assigns new examples to one category or the
other, making it a non-probabilistic binary linear classifier. An SVM model
is a representation of the examples as points in space, mapped so that the
examples of the separate categories are divided by a clear gap that is as
wide as possible. New examples are then mapped into that same space and
predicted to belong to a category based on which side of the gap they fall
on.
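The idea of a separating gap can be sketched in a few lines of Python. The block below is only a minimal illustration, not the implementation used in this work: it trains a linear SVM by subgradient descent on the regularised hinge loss, and the function names, learning rate, regularisation strength and toy samples are all assumptions introduced for the example.

```python
def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=100):
    """Fit weights w and bias b by subgradient descent on the hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularisation step: shrinking w favours a wide gap.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Samples inside the gap (margin < 1) push the boundary away.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane x falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical two-feature samples with labels -1 and +1 (illustration only).
X = [(-2, -1), (-1, -2), (-2, -2), (2, 1), (1, 2), (2, 2)]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]
```

A new sample is classified by calling `predict(w, b, x)`, which only checks the sign of the score, i.e. the side of the gap on which the sample lies.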
2.2 DISCRIMINANT ANALYSIS
TECHNIQUES
There are two types of these techniques
which are described as follows:
2.2.1 LINEAR DISCRIMINANT
ANALYSIS
Linear discriminant analysis (LDA) is a method used in statistics, pattern
recognition and machine learning to find a linear combination of features
that characterizes or separates two or more classes of objects or events.
The resulting combination may be used as a linear classifier or, more
commonly, for dimensionality reduction before later classification.
2.2.2 QUADRATIC DISCRIMINANT
ANALYSIS (QDA)
Quadratic discriminant analysis (QDA) is closely related to linear
discriminant analysis (LDA), where it is assumed that the measurements from
each class are normally distributed. Unlike LDA, however, in QDA there is
no assumption that the covariance of each of the classes is identical.
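The difference between the two techniques can be made concrete with the usual textbook discriminant functions for normally distributed classes. The symbols are assumptions introduced here and do not appear in the text above: \mu_k is the mean of class k, \pi_k its prior probability, \Sigma the covariance matrix shared by all classes in LDA, and \Sigma_k the covariance matrix of class k in QDA. A sample x is assigned to the class with the largest \delta_k(x).

```latex
% LDA: all classes share one covariance matrix \Sigma (linear in x)
\delta_k(x) = x^{T}\Sigma^{-1}\mu_k - \tfrac{1}{2}\,\mu_k^{T}\Sigma^{-1}\mu_k + \log \pi_k

% QDA: each class k has its own covariance matrix \Sigma_k (quadratic in x)
\delta_k(x) = -\tfrac{1}{2}\log\lvert\Sigma_k\rvert
              - \tfrac{1}{2}\,(x-\mu_k)^{T}\Sigma_k^{-1}(x-\mu_k) + \log \pi_k
```

When every \Sigma_k equals the same \Sigma, the quadratic term in x is common to all classes and cancels, which recovers the linear form.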
EXAMPLE OF FISHER IRIS
The Iris flower data set or Fisher's Iris data set is a multivariate data
set introduced by Sir Ronald Fisher (1936) as an example of discriminant
analysis. It is sometimes called Anderson's Iris data set because Edgar
Anderson collected the data to quantify the morphologic variation of Iris
flowers of three related species. Two of the three species were collected
in the Gaspé Peninsula "all from the same pasture, and picked on the same
day and measured at the same time by the same person with the same
apparatus". The data set consists of 50 samples from each of three species
of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features
were measured from each sample: the length and the width of the sepals and
petals, in centimeters. Based on the combination of these four features,
Fisher developed a linear discriminant model to distinguish the species
from one another.
Figure 2.2: Fisher Iris data set
Figure 2.3: Linear discrimination of Fisher
Iris data set
Figure 2.4: Quadratic discrimination of
Fisher Iris data set
2.3 TYPES OF CLASSIFIERS
We now discuss the various types of classifiers on which we test the data
set.
Initially we implement the binary classifiers in Python.
The classifiers that we use to compare efficiencies are Linear, Polynomial,
RBF and Linear SVC.
2.3.1 LINEAR CLASSIFIER
In the field of machine learning, the goal of statistical classification is
to use an object's characteristics to identify which class (or group) it
belongs to. A linear classifier achieves this by making a classification
decision based on the value of a linear combination of the characteristics.
An object's characteristics are also known as feature values and are
typically presented to the machine in a vector called a feature vector.
Such classifiers work well for practical problems such as document
classification, and more generally for problems with many variables
(features), reaching accuracy levels comparable to non-linear classifiers
while taking less time to train and use.
If the input feature vector to the classifier is a real vector
$\mathbf{x}$, then the output score is
$y = f(\mathbf{w} \cdot \mathbf{x}) = f\left(\sum_j w_j x_j\right)$
where $\mathbf{w}$ is a real vector of weights and f is a function that
converts the dot product of the two vectors into the desired output.
A linear classifier is often used in situations where the speed of
classification is an issue. Linear classifiers often work very well when
the number of dimensions in $\mathbf{x}$ is large, as in document
classification, where each element in $\mathbf{x}$ is typically the number
of occurrences of a word in a document. In such cases, the classifier
should be well-regularized.
Figure 2.5: Linear Classifier
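The output score of a linear classifier, a function f applied to the dot product of a weight vector and a feature vector, can be sketched as follows. The weights, the feature vector and the thresholding function are hypothetical values chosen only to illustrate the computation.

```python
def linear_score(weights, features, f=lambda s: s):
    """Return f(w . x), the output score of a linear classifier."""
    dot = sum(w * x for w, x in zip(weights, features))
    return f(dot)


def threshold(s):
    """Map the raw score to a class label: 1 if positive, otherwise 0."""
    return 1 if s > 0 else 0


# Hypothetical weights and feature vector, chosen only for illustration.
w = [0.5, -1.0, 2.0]
x = [2.0, 1.0, 0.5]
raw = linear_score(w, x)               # 0.5*2 - 1*1 + 2*0.5 = 1.0
label = linear_score(w, x, threshold)  # 1
```

Classification costs one dot product per sample, which is why linear classifiers remain fast even when the feature vector is large.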
2.3.2 POLYNOMIAL CLASSIFIER
A quadratic classifier is used in machine learning and statistical
classification to separate measurements of two or more classes of objects
or events by a quadric surface. It is a more general version of the linear
classifier.
Statistical classification considers a set of vectors of observations x of
an object or event, each of which has a known type y. This set is referred
to as the training set. The problem is then to determine, for a given new
observation vector, what the best class should be. For a quadratic
classifier, the correct solution is assumed to be quadratic in the
measurements, so y is decided based on
$\mathbf{x}^T A \mathbf{x} + \mathbf{b}^T \mathbf{x} + c$
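As a small illustration of the quadratic decision function, the sketch below evaluates x^T A x + b^T x + c for a two-feature sample. The matrix A, the vector b and the constant c are hypothetical values, chosen so that the decision boundary is a circle of radius 2; the sign of the result tells on which side of the quadric surface a sample lies.

```python
def quadratic_score(A, b, c, x):
    """Evaluate x^T A x + b^T x + c for a feature vector x."""
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    lin = sum(bi * xi for bi, xi in zip(b, x))
    return quad + lin + c


# Hypothetical parameters: the boundary x1^2 + x2^2 - 4 = 0 is a circle.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
c = -4.0

inside = quadratic_score(A, b, c, [1.0, 1.0])   # 1 + 1 - 4 = -2.0
outside = quadratic_score(A, b, c, [3.0, 0.0])  # 9 + 0 - 4 = 5.0
```

Setting A to the zero matrix reduces the score to b^T x + c, the linear case.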
2.3.3 RBF CLASSIFIER
In the field of mathematical modelling, a
radial basis function network is an artificial
neural network that uses radial basis
functions as activation functions. The output
of the network is a linear combination of
radial basis functions of the inputs and
neuron parameters. Radial basis function
networks have many uses, including function
approximation, time series prediction,
classification, and system control.
• RBFs represent local receptors, as
illustrated below, where each green point is
a stored vector used in one RBF.
Figure 2.6: RBF Classifier
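The output of a radial basis function network, a linear combination of radial basis functions centred on stored vectors, can be sketched as below. The Gaussian form of the basis function, the width parameter `gamma`, the centres and the weights are assumptions made for illustration; the same Gaussian function is also commonly used as the kernel of an RBF support vector machine.

```python
import math


def gaussian_rbf(x, centre, gamma=1.0):
    """Gaussian radial basis function exp(-gamma * ||x - centre||^2)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-gamma * sq_dist)


def rbf_network_output(x, centres, weights, gamma=1.0):
    """Linear combination of RBF activations, one per stored centre."""
    return sum(w * gaussian_rbf(x, c, gamma) for w, c in zip(weights, centres))


# Hypothetical stored vectors (the local receptors) and output weights.
centres = [(0.0, 0.0), (3.0, 3.0)]
weights = [1.0, -1.0]

near_first = rbf_network_output((0.1, 0.0), centres, weights)   # positive
near_second = rbf_network_output((3.0, 2.9), centres, weights)  # negative
```

Each basis function responds strongly only to inputs close to its own centre, so the sign of the output reflects which stored vector the input is nearest to.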
2.3.4 KENNARD STONE ALGORITHM
SIGNIFICANCE
All the classifiers defined above are implemented in such a way that the
training data and test data are split randomly, and the user has no
particular way of controlling the split. The Kennard-Stone (KS) algorithm
instead splits the data into training and test sets by ranking the samples.
In its standard form it ranks the samples by their mutual distances,
starting from the two samples that lie farthest apart and repeatedly adding
the sample farthest from those already chosen, and hence comes up with a
training set that covers the data as evenly as possible.
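A minimal sketch of the standard Kennard-Stone selection is given below. It is not the MATLAB code used in this work: the function name, the use of Euclidean distance and the toy samples are assumptions, and the sketch expects `n_train` to be at least 2.

```python
def kennard_stone(samples, n_train):
    """Return (train_idx, test_idx) chosen by max-min Euclidean distance."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    n = len(samples)
    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: sq_dist(samples[pair[0]], samples[pair[1]]),
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]

    while len(selected) < n_train and remaining:
        # Add the sample whose nearest selected neighbour is farthest away.
        best = max(
            remaining,
            key=lambda i: min(sq_dist(samples[i], samples[s]) for s in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected, remaining


# Hypothetical one-feature samples, for illustration only.
data = [(0.0,), (1.0,), (2.0,), (5.0,), (10.0,)]
train_idx, test_idx = kennard_stone(data, n_train=3)
```

The order of `train_idx` is the ranking itself, so training sets of increasing size can be obtained by taking longer prefixes of the same list.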
3. RESULTS
3.1 Linear Classifier
Figure 3.1 shows that the linear classifier classifies the training data
with an accuracy of 97% and the testing data with an accuracy of 88.8%,
wrongly classifying one sample as type 4 when it is type 3 and another
sample as type 3 when it is type 4, as represented by the confusion matrix.
Figure 3.1: Efficiency with Linear
Classifier
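The relation between a confusion matrix and the reported accuracy can be sketched as follows. The labels below are hypothetical and are not the data of this work; they only reproduce the kind of error described above, with one type 3 sample predicted as type 4 and one type 4 sample predicted as type 3.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels (not the paper's data), for illustration only.
true = [1, 1, 2, 2, 3, 3, 3, 4, 4, 4]
pred = [1, 1, 2, 2, 3, 3, 4, 4, 4, 3]
cm = confusion_matrix(true, pred, classes=[1, 2, 3, 4])
acc = accuracy(cm)
```

Off-diagonal entries identify which types are confused with each other, which a single accuracy figure cannot show.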
3.2 Polynomial Classifier
Figure 3.2 shows that the polynomial classifier classifies the training
data with an accuracy of 26.4%, wrongly assigning all samples to type 1,
and the testing data with an accuracy of 11.1%, again wrongly assigning all
samples to type 1, as represented by the confusion matrix.
Figure 3.2: Efficiency with Polynomial
classifier
3.3 RBF Classifier
Figure 3.3 shows that the RBF classifier classifies the training data with
an accuracy of 95.5%, wrongly classifying one sample as type 5 when it is
type 4, and the testing data with an accuracy of 77.7%, wrongly classifying
three samples, as represented by the confusion matrix.
Figure 3.3: Efficiency with RBF classifier
3.4 Results For Implementation In
MATLAB:
3.4.1 Cross Validation
Figure 3.4 shows the generation of testing data (66) and training data (20)
using the k-fold technique.
Figure 3.4: Cross validation with K-fold
Technique
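The k-fold technique can be sketched in Python as below. This is an illustrative sketch rather than the MATLAB routine used here: the function name is hypothetical, the samples are taken in their original order without shuffling, and the sample count and number of folds in the example are arbitrary.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical example: 10 samples split into 3 folds.
folds = list(k_fold_indices(10, 3))
```

The classifier is trained and tested once per fold, and the fold accuracies are then averaged to give the cross-validated accuracy.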
3.4.2 Multi Class Classification
Figure 3.5 shows that the multiclass classifier classifies the training
data with an accuracy of 79.49%, wrongly classifying 4 samples as type 2
when they are type 3, and the testing data with an accuracy of 75.0%,
wrongly classifying two samples.
Figure 3.5: Efficiency with Multiclass
classifier
Figure 3.6 shows the ranking of the samples used to generate an efficient
set of training and testing data, and hence displays the minimum number of
training samples required for the accuracy to reach 100%.
Figure 3.6: Minimum number of training
samples for which accuracy is 100%
4. Conclusion and Future Scope
In this paper, we implemented the Support Vector Machine algorithm for the
separation of different classes of polymers. The absorbance values of these
polymers under NIR spectroscopy were collected to train and test the
classifier. Binary classification was applied first; the data was then
cross validated as well as subjected to the Kennard-Stone algorithm. The
accuracy achieved was 100% without cross validation and varied between 70%
and 80% with cross validation and after the application of the KS
algorithm.
A multiclass classification algorithm was found to be fairly efficient when
implemented in MATLAB as well as in Python. In MATLAB, the accuracy varied
from 75% to 90%, whereas in Python the accuracy achieved with cross
validation and with the linear classifier was close to 95%.
There is scope for further improvements, such as implementing the KS
algorithm in the Python code and applying various pre-processing routines
to cancel out noise from the data.