1. Introduction

The sudden and rapid growth of stored educational data has increased interest in automatically transforming this large volume of data into useful and meaningful information, patterns and knowledge. Data mining, also known as knowledge discovery, is the process of extracting useful, meaningful information from large amounts of data. One of its major tasks is finding sequential patterns in data, i.e. in collections of items and sets of records (transactions). The concept of sequential pattern mining was introduced for this purpose.

1.1 What is Sequential Pattern Mining?

Sequential pattern mining is generally defined as the procedure of obtaining the complete set of frequent subsequences from a given set of sequences, or of capturing typical behaviors over time, i.e. behaviors repeated often enough by individuals to be relevant to a decision maker. It is thus concerned with finding statistically relevant patterns in data where the values are delivered in sequence and are discrete. This includes building efficient databases and indexes for sequence information, extracting frequently occurring patterns, checking similarity by comparing sequences with each other, and recovering missing sequence members. The sequence mining problem can be classified into string mining, typically based on string-processing algorithms, and itemset mining, typically based on association rule learning.

Sequential pattern mining is used in a wide range of applications. Many kinds of data have a time-related format, and many interesting sequential pattern algorithms have been proposed over the years. For example, from a customer purchase database, sequential patterns can be used to develop product and marketing strategies. In Web log analysis, such patterns help restructure a company's Web site to give easier access to links. Sequential pattern mining is also applied to telecommunication networks, alarm databases, intrusion detection, DNA sequences, etc. For effectiveness, but also to efficiently aid decision making, constraints are becoming more and more essential in many applications [11].

For example, if you are a store owner and want to learn about your customers' buying behavior, you may not only be interested in what they buy together during one shopping trip; you might also want to know about patterns in their purchasing behavior over time. "If a customer purchases baby lotion, then a newborn blanket, what are they likely to buy next?" With information like this, a store owner can create clever marketing strategies to increase sales and/or profit.

1.2 Problem Statement

The problem of mining sequential patterns, and its associated notation, can be given as follows:

Let I = {i1, i2, . . . , im} be a set of literals, termed items, which comprise the alphabet. An event is a non-empty unordered collection of items; it is assumed without loss of generality that the items of an event are sorted in lexicographic order. A sequence is an ordered list of events. An event is denoted as (i1, i2, . . . , ik), where ij is an item. A sequence α is denoted as <α1 -> α2 -> ... -> αq>, where each αi is an event. A sequence with k items, where k = Σj |αj|, is termed a k-sequence. For example, <B -> AC> is a 3-sequence. A sequence <α1 -> α2 -> ... -> αn> is a subsequence of another sequence <β1 -> β2 -> ... -> βm> if there exist integers i1 < i2 < ... < in such that α1 ⊆ βi1, α2 ⊆ βi2, . . . , αn ⊆ βin. For example, the sequence <B -> AC> is a subsequence of <AB -> E -> ACD>, since B ⊆ AB and AC ⊆ ACD, and the order of events is preserved. However, the sequence <AB -> E> is not a subsequence of <ABE>, and vice versa [12].

Suppose D is a database of input sequences, where each input sequence has the following attributes: sequence id, event time, and item. It is assumed that no sequence has more than one event with the same time stamp, so the time stamp may be used as the event identifier. The support or frequency of a sequence α, denoted σ(α, D), is the number (or proportion) of input sequences in the database D that contain α. This general definition has been adapted as new algorithms were developed and different methods of calculating support were introduced. Given a user-specified threshold, called the minimum support, a sequence is said to be frequent if it occurs at least minimum-support times. The set of frequent k-sequences is denoted Fk.

A sequence α is called a maximal frequent sequential pattern in a sequence database S if α is a frequent sequential pattern in S and there exists no frequent sequential pattern β in S such that β is a proper supersequence of α. The problem of sequential pattern mining is to find the complete set of frequent sequential patterns in a sequence database for a given minimum support threshold.
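The definitions above can be made concrete with a short sketch: a sequence is a list of events, each event a set of items, and support counts the input sequences containing a candidate. The toy database below is assumed for illustration only.

```python
# A sequence is a list of events; each event is a frozenset of items.
def is_subsequence(sub, seq):
    """True if `sub` is a subsequence of `seq`: each event of `sub`
    must be contained in a distinct event of `seq`, in order."""
    i = 0
    for event in seq:
        if i < len(sub) and sub[i] <= event:   # set containment
            i += 1
    return i == len(sub)

def support(candidate, database):
    """Number of input sequences in `database` containing `candidate`."""
    return sum(is_subsequence(candidate, seq) for seq in database)

# Assumed toy database mirroring the example <AB -> E -> ACD>.
db = [
    [frozenset("AB"), frozenset("E"), frozenset("ACD")],
    [frozenset("B"), frozenset("AC")],
]
print(support([frozenset("B"), frozenset("AC")], db))  # → 2
```

Note that `[frozenset("AB"), frozenset("E")]` is not a subsequence of `[frozenset("ABE")]`, matching the definition: the two events cannot map to a single event while preserving order.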

1.3 Categories of Sequential Pattern Mining Algorithm

In recent years, many sequential pattern mining algorithms have been proposed, based on study and analysis of existing algorithms and their effectiveness and efficiency.

Generally, sequential pattern mining algorithms differ in two ways [6]:

• The process by which candidate sequences are generated and stored. The main objective is to minimize the set of candidate sequences.

• The process by which the support, or frequency, of each candidate sequence is counted.

Based on these two key criteria, sequential pattern mining algorithms can be divided into two groups:

• Apriori Based

• Pattern Growth Based

Figure 1.1 Classification of sequential pattern mining algorithms

1.3.1 Apriori Based

The set of all frequent sequences is a superset of the set of frequent itemsets. Due to this similarity, the earlier sequential pattern-mining algorithms were derived from association rule mining techniques. The first of such sequential pattern-mining algorithms is the AprioriAll algorithm [Agrawal and Srikant 1995], derived from the Apriori algorithm [Agrawal et al. 1993; Agrawal and Srikant 1994][6].

Apriori [Agrawal and Srikant 1994] and AprioriAll [Agrawal and Srikant 1995] set the basis for a breed of algorithms that depend largely on the Apriori property and use the Apriori-generate join procedure to generate candidate sequences. The Apriori property states that "all nonempty subsets of a frequent itemset must also be frequent". It is also described as antimonotonic (or downward-closed): if a sequence cannot pass the minimum support test, all of its supersequences will also fail the test [6].

Key features of Apriori-based algorithms are [6]:

• Breadth-first search: Apriori-based algorithms are breadth-first (level-wise) search algorithms because they construct all the k-sequences together in the kth iteration as they traverse the search space.

• Generate-and-test: This approach is used by the early algorithms in sequential pattern mining. Algorithms following it use an inefficient pruning method: they generate a huge number of candidate sequences and then test each against the user-specified constraints. This pruning process consumes a lot of memory in the early stages of mining.

• Multiple scans of the database: This feature entails scanning the original database to ascertain whether a long list of generated candidate sequences is frequent or not. It is a very undesirable characteristic of most Apriori-based algorithms, requiring a lot of processing time and I/O cost.

1.3.1.1 GSP (Generalized Sequential Pattern Mining)

The GSP algorithm was developed by Agrawal and Srikant [2]. The algorithm makes multiple scans over the database. Initially it scans the database for length-1 candidates and discards every item that does not meet the minimum support. Then, at each level (i.e. for sequences of length k), it scans the database to collect the support count of each candidate sequence and generates candidate length-(k+1) sequences from the frequent length-k sequences using the Apriori property. This process continues until no frequent sequences, or no candidates, are generated.
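The level-wise loop just described can be sketched as follows. This is a simplified illustration for sequences of single-item events only; real GSP also joins items into a common event, supports time constraints, and counts support with a hash tree.

```python
def gsp(db, min_sup):
    """Simplified GSP sketch: db is a list of sequences, each a tuple
    of single items. Returns all frequent sequences."""
    def support(cand):
        def contains(seq):
            i = 0
            for item in seq:
                if i < len(cand) and item == cand[i]:
                    i += 1
            return i == len(cand)
        return sum(contains(s) for s in db)

    # Level 1: frequent single items.
    items = sorted({i for s in db for i in s})
    level = [(i,) for i in items if support((i,)) >= min_sup]
    frequent = list(level)
    while level:
        # Apriori-style join: a and b overlap on all but one item.
        candidates = {a + (b[-1],) for a in level for b in level
                      if a[1:] == b[:-1]}
        level = [c for c in candidates if support(c) >= min_sup]
        frequent += level
    return frequent

# Assumed toy database, minimum support 2.
db = [("D", "B", "A"), ("D", "B", "A"), ("B", "C")]
print(sorted(gsp(db, 2)))
```

On this toy input the frequent sequences are A, B, D, B->A, D->A, D->B and D->B->A; C is pruned at level 1 and never generates candidates, which is the Apriori property at work.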

Example: Consider the following database as input:

Figure 1.2 Simple Database Example

First the table is sorted by SID, and then by transaction time stamp EID. So we get:

Table 1.1 Sorted database

Here, consider minimum support 2 to determine the frequent items using GSP.

First, list the support of each item:

Item  No. of occurrences

A 4

B 4

C 1

D 2

E 1

F 4

G 1

H 1

Table 1.2 Count for each item (C1)

Here, items A, B, D and F satisfy the minimum support, so the frequent 1-sequences are A, B, D and F, as shown in Table 1.3. Candidate sets (itemsets) are denoted by C; Cn denotes the candidate sets having n items. For example, C1 denotes the candidate sets having one item, shown in Table 1.2. Itemsets which satisfy the minimum support are denoted by L; Ln denotes the frequent sets having n items. For example, L1 denotes the frequent sets having one item, shown in Table 1.3.

Item  No. of occurrences

A 4

B 4

D 2

F 4

Table 1.3 L1

After that, C2 (k=2) can be generated by joining L1 × L1; the join condition is that the first k-2 items must be common, which for k=2 is trivially satisfied, so every pair of frequent 1-sequences is joined.

Item  No. of occurrences

A->A 1

A->B 1

A->D 1

A->F 1

AB 3

AD 1

AF 3

B->A 2

B->B 1

B->D 1

B->F 1

BD 1

BF 4

D->A 2

D->B 2

D->F 2

D->D 1

DF 1

F->A 2

F->B 1

F->D 1

F->F 1

Table 1.4 C2

Now the candidates which satisfy the minimum support are denoted by L2, as shown in Table 1.5.

Item  No. of occurrences

AB 3

AF 3

B->A 2

BF 4

D->A 2

D->B 2

D->F 2

F->A 2

Table 1.5 L2

After that we generate the candidate 3-sequences, i.e. three items per sequence. C3 (k=3) can be generated by joining L2 × L2; the join condition is that k-2 items must be common. The resulting candidates are shown in Table 1.6.

Item  No. of occurrences

ABF 3

AB->A 1

AF->A 1

BF->A 2

D->B->A 2

D->F->A 2

D->BF 2

F->AF 1

D->FA 1

D->AB 1

F->AB 0

Table 1.6 C3

Now the candidates which satisfy the minimum support are denoted by L3, as shown in Table 1.7.

Item  No. of occurrences

ABF 3

BF->A 2

D->B->A 2

D->F->A 2

D->BF 2

Table 1.7 L3

After that we generate the candidate 4-sequences, i.e. four items per sequence. C4 (k=4) can be generated by joining L3 × L3; the join condition is that k-2 items must be common. The resulting candidates are shown in Table 1.8.

Item  No. of occurrences

D->BF->A 2

Table 1.8 C4

Now the candidates which satisfy the minimum support are denoted by L4, as shown in Table 1.9.

Item  No. of occurrences

D->BF->A 2

Table 1.9 L4

As shown in Table 1.9, we obtain one 4-sequence (a sequence whose itemsets contain four items in total).

1.3.1.2 SPADE (Sequential PAttern Discovery using Equivalence classes)

This algorithm discovers sequential patterns faster than GSP [5] and overcomes GSP's problem of repeated database scans. It uses combinatorial properties to decompose the original problem into smaller sub-problems that can be solved independently. It employs a vertical id-list <sequence_id, event_id> database format and a lattice-theoretic approach to decompose the original search space (a lattice) into smaller sub-lattices that can be processed independently. All frequent sequences are found in three database scans, or in only a single scan with some pre-processed information. SPADE therefore not only minimizes I/O cost by reducing database scans, but also minimizes computational cost by using efficient search schemes.
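The two join operations on vertical id-lists can be sketched as follows. This is a minimal illustration; the toy id-lists are assumed data in the (sid, eid) format, not the full running example, and SPADE's lattice decomposition is omitted.

```python
def equality_join(lx, ly):
    """Id-list of the event XY: X and Y occur at the same time
    in the same input sequence."""
    return [(sx, ex) for (sx, ex) in lx for (sy, ey) in ly
            if sx == sy and ex == ey]

def temporal_join(lx, ly):
    """Id-list of the sequence X -> Y: Y occurs strictly after X
    in the same input sequence (X's eid is kept for further joins)."""
    return [(sx, ex, ey) for (sx, ex) in lx for (sy, ey) in ly
            if sx == sy and ey > ex]

def support(idlist):
    """Each input sequence (sid) is counted at most once."""
    return len({row[0] for row in idlist})

# Assumed toy id-lists of (sid, eid) pairs.
A = [(1, 15), (1, 20), (2, 15)]
B = [(1, 15), (1, 20), (4, 20)]
print(support(equality_join(A, B)))   # → 1 (co-occurrence only in SID=1)
print(support(temporal_join(A, B)))   # → 1 (A -> B only in SID=1)
```

The `support` function implements the counting rule used throughout the running example: even if a pattern occurs several times within one customer sequence, that sequence contributes only one to the support.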

Applying SPADE algorithm to the above example:

SPADE uses a vertical database format, so we maintain an id-list <sequence_id, event_id> for each item, as shown in Figure 1.3.

Figure 1.3 ID-List for database example

Here, consider minimum support 2 to determine the frequent items using SPADE.

First, list the support of each item, as shown in Table 1.10.

Item  No. of occurrences

A 4

B 4

C 1

D 2

E 1

F 4

G 1

H 1

Table 1.10 C1

As before, items A, B, D and F satisfy the minimum support, so the frequent 1-sequences are A, B, D and F. The candidate 1-item sets (C1) are shown in Table 1.10 and the frequent 1-item sets (L1) in Table 1.11.

Item  No. of occurrences

A 4

B 4

D 2

F 4

Table 1.11 L1.

Now we generate temporal joins and non-temporal joins; using these joins we can identify the frequent 2-sequences. A non-temporal (equality) join is performed when the items must occur at the same time. For example, AB is non-temporal, because A and B occur in the same event: we find all occurrences of A and B with the same EID and store them in an id-list, as shown in Table 1.12. Temporal joins are then performed one at a time to obtain the id-lists shown in the tables below (Table 1.18 onward).

For AB,

SID EID A EID B

1 15 15

1 20 20

2 15 15

3 10 10

Table 1.12. Id-list for AB

In Table 1.12, two rows have the same SID=1, which means the pattern occurred twice in the same customer sequence. In such a case the sequence SID=1 is counted only once, so the total occurrence of AB is 3, which satisfies the minimum support.

SID EID A EID D

1 25 25

Table 1.13. Id-list for AD

In Table 1.13, the total occurrence of AD is 1, so it does not satisfy the minimum support.

SID EID A EID F

1 20 20

1 25 25

2 15 15

3 10 10

Table 1.14. Id-list for AF

In Table 1.14, two rows again have the same SID=1, so the sequence SID=1 is counted only once and the total occurrence of AF is 3, which satisfies the minimum support.

SID EID B EID D

- - -

Table 1.15. Id-list for BD

In Table 1.15, the total occurrence of BD is 0, so it does not satisfy the minimum support.

SID EID B EID F

1 20 20

2 15 15

3 10 10

4 20 20

Table 1.16. Id-list for BF

In Table 1.16, the total occurrence of BF is 4, so it satisfies the minimum support.

SID EID D EID F

1 25 25

Table 1.17. Id-list for DF

In Table 1.17, the total occurrence of DF is 1, so it does not satisfy the minimum support.

SID EID A EID A

1 15 20

1 20 25

Table 1.18 Id-list for A->A

In Table 1.18, there are two rows, but both have SID=1, so the number of occurrences is actually 1 and the minimum support is not satisfied.

SID EID A EID B

1 15 20

Table 1.19 Id-list for A->B

In Table 1.19, the total occurrence of A->B is 1, so it does not satisfy the minimum support.

SID EID A EID D

1 15 25

1 20 25

Table 1.20 Id-list for A->D

In Table 1.20, there are two rows, but both have SID=1, so the number of occurrences is actually 1 and the minimum support is not satisfied.

SID EID A EID F

1 15 20

1 15 25

1 20 25

Table 1.21 Id-list for A->F

In Table 1.21, there are three rows, but all have SID=1, so the number of occurrences is actually 1 and the minimum support is not satisfied.

SID EID B EID A

1 15 20

4 20 25

Table 1.22 Id-list for B->A

In Table 1.22, the total occurrence of B->A is 2, so it satisfies the minimum support.

SID EID B EID B

1 15 15

1 20 20

Table 1.23 Id-list for B->B

In Table 1.23, there are two rows, but both have SID=1, so the number of occurrences is actually 1 and the minimum support is not satisfied.

SID EID B EID D

1 15 25

1 20 25

Table 1.24 Id-list for B->D

In Table 1.24, there are two rows, but both have SID=1, so the number of occurrences is actually 1 and the minimum support is not satisfied.

SID EID B EID F

1 20 25

Table 1.25 Id-list for B->F

In Table 1.25, the total occurrence of B->F is 1, so it does not satisfy the minimum support.

SID EID D EID D

1 10 25

Table 1.26 Id-list for D->D

In Table 1.26, the total occurrence of D->D is 1, so it does not satisfy the minimum support.

SID EID D EID A

1 10 20

1 10 25

4 10 25

Table 1.27 Id-list for D->A

In Table 1.27, there are three rows, but two of them have SID=1, so the number of occurrences is actually 2, which satisfies the minimum support.

SID EID D EID B

1 10 15

1 10 20

4 10 20

Table 1.28 Id-list for D->B

In Table 1.28, the total occurrence of D->B is 2, so it satisfies the minimum support.

SID EID D EID F

1 10 20

1 10 25

4 10 20

Table 1.29 Id-list for D->F

In Table 1.29, the total occurrence of D->F is 2, so it satisfies the minimum support.

SID EID F EID F

1 20 25

Table 1.30 Id-list for F->F

In Table 1.30, the total occurrence of F->F is 1, so it does not satisfy the minimum support.

SID EID F EID D

1 20 25

Table 1.31 Id-list for F->D

In Table 1.31, the total occurrence of F->D is 1, so it does not satisfy the minimum support.

SID EID F EID B

1 20 25

Table 1.32 Id-list for F->B

In Table 1.32, the total occurrence of F->B is 1, so it does not satisfy the minimum support.

SID EID F EID A

1 20 25

4 20 25

Table 1.33 Id-list for F->A

In Table 1.33, the total occurrence of F->A is 2, so it satisfies the minimum support.

Table 1.34 below shows the number of occurrences of each candidate 2-sequence (C2); the sequences that satisfy the minimum support are shown in Table 1.35 (L2).

Item  No. of occurrences

AB 3

AD 1

AF 3

BD 0

BF 4

DF 1

A->A 1

A->B 1

A->D 1

A->F 1

B->A 2

B->B 1

B->D 1

B->F 1

D->D 1

D->A 2

D->B 2

D->F 2

F->F 1

F->D 1

F->B 1

F->A 2

Table 1.34 C2

Item  No. of occurrences

AB 3

AF 3

BF 4

B->A 2

D->A 2

D->B 2

D->F 2

F->A 2

Table 1.35 L2

After that we generate the 3-sequences, i.e. three items per sequence, using the same procedure.

SID EID A EID B EID F

1 20 20 20

2 15 15 15

3 10 10 10

Table 1.36 Id list for ABF

In Table 1.36, the total occurrence of ABF is 3, so it satisfies the minimum support.

SID EID A EID B EID A

1 15 15 20

1 20 20 25

2 15 15 -

3 10 10 -

Table 1.37 Id-list for AB->A

In Table 1.37, there are two complete rows, but both have SID=1, so the number of occurrences is actually 1 and the minimum support is not satisfied. There are two rows because the event AB is followed by A twice in SID=1; for SID=2 and SID=3, AB is not followed by A, hence the dash.

SID EID A EID F EID A

1 20 20 25

1 25 25 -

2 15 15 -

3 10 10 -

Table 1.38 Id-list for AF->A

In Table 1.38, the total occurrence of AF->A is 1, so it does not satisfy the minimum support.

SID EID B EID F EID A

1 20 20 25

2 15 15 -

3 10 10 -

4 20 20 25

Table 1.39 Id-list for BF->A

In Table 1.39, the total occurrence of BF->A is 2, so it satisfies the minimum support.

SID EID D EID B EID A

1 10 15 20

4 10 20 25

Table 1.40 Id-list for D->B->A

In Table 1.40, the total occurrence of D->B->A is 2, so it satisfies the minimum support.

SID EID D EID F EID A

1 10 20 25

4 10 20 25

Table 1.41 Id-list for D->F->A

In Table 1.41, the total occurrence of D->F->A is 2, so it satisfies the minimum support.

SID EID D EID B EID F

1 10 20 20

2 - 15 15

3 - 10 10

4 10 20 20

Table 1.42 Id-list for D->BF

In Table 1.42, the total occurrence of D->BF is 2, so it satisfies the minimum support.

SID EID F EID A EID F

1 - 20 20

1 - 25 25

2 - 15 15

3 - 10 10

Table 1.43 Id-list for F->AF

In Table 1.43, the total occurrence of F->AF is 0, so it does not satisfy the minimum support.

SID EID D EID A EID F

1 10 20 20

1 10 25 25

2 - 15 15

3 - 10 10

Table 1.44 Id-list for D->AF

In Table 1.44, the total occurrence of D->AF is 1, so it does not satisfy the minimum support.

SID EID D EID A EID B

1 10 15 15

1 10 20 20

2 - 15 15

3 - 10 10

Table 1.45 Id-list for D->AB

In Table 1.45, the total occurrence of D->AB is 1, so it does not satisfy the minimum support.

SID EID F EID A EID B

1 - 15 15

1 - 20 20

2 - 15 15

3 - 10 10

Table 1.46 Id-list for F->AB

In Table 1.46, the total occurrence of F->AB is 0, so it does not satisfy the minimum support.

SID EID B EID A EID B

1 - 15 15

1 15 20 20

2 - 15 15

3 - 10 10

Table 1.47 Id-list for B->AB

In Table 1.47, the total occurrence of B->AB is 1, so it does not satisfy the minimum support.

SID EID B EID A EID F

1 15 20 20

1 15 25 25

1 20 25 25

2 - 15 15

3 - 10 10

Table 1.48 Id-list for B->AF

In Table 1.48, the total occurrence of B->AF is 1, so it does not satisfy the minimum support.

Table 1.49 below shows the number of occurrences of each candidate 3-sequence (C3); the sequences that satisfy the minimum support are shown in Table 1.50 (L3).

Item  No. of occurrences

ABF 3

AB->A 1

AF->A 1

BF->A 2

D->B->A 2

D->F->A 2

D->BF 2

F->AF 0

D->AF 1

D->AB 1

F->AB 0

B->AB 1

B->AF 1

Table 1.49 C3

Item  No. of occurrences

ABF 3

BF->A 2

D->B->A 2

D->F->A 2

D->BF 2

Table 1.50 L3

After that we generate the 4-sequences, i.e. four items per sequence, using the same procedure.

SID EID A EID B EID F EID A

1 20 20 20 25

2 15 15 15 -

3 10 10 10 -

Table 1.51 Id-list for ABF->A

In Table 1.51, the total occurrence of ABF->A is 1, so it does not satisfy the minimum support.

SID EID D EID B EID F EID A

1 10 20 20 25

2 - 15 15 -

3 - 10 10 -

4 10 20 20 25

Table 1.52 Id-list for D->BF->A

In Table 1.52, the number of occurrences of D->BF->A is 2, which satisfies the minimum support.

1.3.1.3 SPAM (Sequential Pattern Mining using A Bitmap Representation)

SPAM uses a depth-first search strategy that integrates a depth-first traversal of the search space with effective pruning mechanisms. The transactional data is stored using a vertical bitmap representation, which allows for efficient support counting as well as significant bitmap compression. The algorithm proves efficient when the sequential patterns in the database are very long.
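The bitmap idea can be illustrated with a small sketch on assumed toy data. SPAM's actual bitmaps cover the whole database partitioned by sequence and use compression; here each sequence simply gets one Python integer per item, with bit k set when the item occurs in the k-th event.

```python
def s_step_transform(bitmap, n_slots):
    """S-step: set all bits strictly after the first set bit, i.e.
    the event positions at which a following item may occur."""
    if bitmap == 0:
        return 0
    first = (bitmap & -bitmap).bit_length()   # index of lowest set bit + 1
    return ((1 << n_slots) - 1) & ~((1 << first) - 1)

def bitmap_support(bitmaps):
    """A sequence supports the pattern if its bitmap is non-zero."""
    return sum(1 for b in bitmaps if b)

# Assumed toy data: one bitmap per sequence, bit 0 = earliest event.
A = [0b0011, 0b0001]
B = [0b0100, 0b0010]
N = 4  # number of event slots per sequence
# Support of A -> B: transform A's bitmaps, then AND with B's.
ab = [s_step_transform(a, N) & b for a, b in zip(A, B)]
print(bitmap_support(ab))  # → 2
```

The support count reduces to bitwise AND plus a non-zero test per sequence, which is why the representation counts support so efficiently.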

1.3.1.4 SPIRIT (Sequential Pattern Mining with Regular Expression Constraints)

SPIRIT is a family of algorithms for sequential pattern mining with regular expression constraints. The general idea is to use a relaxed constraint with properties suited to pruning [spirit]. Several versions of the algorithm exist, differing in the degree to which the constraints are enforced to prune the search space of patterns during computation; this degree is the main distinguishing factor among the schemes. In particular, one variant uses the relaxed constraint "valid with respect to some state of ME" for a given regular expression E, where ME is the deterministic finite automaton corresponding to E; this variant has overall the best performance among the SPIRIT family.
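The effect of a regular expression constraint can be sketched as follows. This is only a hedged illustration using Python's re module as a post-filter on already-counted candidates, roughly corresponding to SPIRIT's strictest "fully valid" variant; the real algorithms push relaxed constraints into candidate generation via the automaton ME. The toy support table is assumed data.

```python
import re

def constrained_patterns(supports, min_sup, regex):
    """Keep sequences that are frequent AND whose concatenated item
    string fully matches the regular expression constraint."""
    expr = re.compile(regex)
    return [seq for seq, sup in supports.items()
            if sup >= min_sup and expr.fullmatch("".join(seq))]

# Assumed toy supports for single-item sequences.
sup = {("A", "B"): 3, ("B", "A"): 2, ("A", "C"): 5, ("A", "D"): 1}
print(constrained_patterns(sup, 2, "A[BC]"))  # → [('A', 'B'), ('A', 'C')]
```

Here ("B", "A") is frequent but fails the constraint, and ("A", "D") matches nothing because it is infrequent; a real SPIRIT implementation would avoid ever counting most such candidates.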

1.3.2 Pattern Growth Based

The idea behind pattern-growth algorithms is to eliminate the candidate generation step and to focus the search on a limited portion of the initial database. Search space partitioning plays an important role in pattern growth. Almost every pattern-growth algorithm starts with a representation of the database to be mined, proposes a way to partition the search space, generates as few candidate sequences as possible by growing the already-mined frequent sequences, and applies the Apriori property as the search space is traversed recursively looking for frequent sequences. The early algorithms used projected databases, for example FreeSpan [8] and PrefixSpan [11], with the latter being the most influential. Key features of pattern-growth-based algorithms are [6]:

• Search space partitioning: This allows the generated search space of large candidate sequences to be partitioned, for efficient memory management. Different techniques exist to partition the search space; once it is partitioned, the smaller partitions can be mined in parallel. Advanced techniques for search space partitioning include projected databases and conditional search, referred to as split-and-project techniques.

• Tree projection: Tree projection usually accompanies pattern-growth algorithms. In this technique, algorithms use a physical tree data structure to represent the search space, which is then traversed breadth-first or depth-first in search of frequent sequences, with pruning based on the Apriori property.

• Depth-first traversal: This traversal technique makes a big difference in performance and also helps in the early pruning of candidate sequences as well as the mining of closed sequences [Wang and Han 2004]. The key reason for this performance is that depth-first traversal uses far less memory and a more directed search space, and thus generates fewer candidate sequences, than the breadth-first or post-order traversals used by some early algorithms.

• Candidate sequence pruning: Pattern-growth algorithms try to utilize a data structure that allows them to prune candidate sequences early in the mining process. This results in a smaller search space early on and a more directed, narrower search procedure.

1.3.2.1 FreeSpan (Frequent Pattern-Projected Sequential Pattern Mining): FreeSpan was developed to substantially reduce the costly candidate generation. FreeSpan uses frequent items to recursively project sequence databases into a set of smaller projected databases and grows subsequence fragments in each projected database. This process partitions both the data and the set of frequent patterns to be tested, and confines each test to the corresponding smaller projected database. The trade-off is a considerable amount of sequence duplication, as the same sequence can appear in more than one projected database. However, the size of each projected database usually (but not necessarily) decreases rapidly with recursion [10].

1.3.2.2 PrefixSpan (Prefix-projected Sequential Pattern Mining): PrefixSpan is another pattern-growth approach for mining sequential patterns and explores prefix projection. The main idea is that, while FreeSpan projects the database by considering all possible occurrences of frequent subsequences, PrefixSpan projects the database based only on frequent prefixes, because any frequent subsequence can always be found by growing a frequent prefix. The complete set of sequential patterns is partitioned into subsets according to prefixes; each subset can be mined by constructing the corresponding projected database and mining it recursively. PrefixSpan mines the complete set of patterns using a divide-and-conquer strategy and greatly reduces the effort of candidate subsequence generation [9].
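The prefix-projection idea can be sketched for the simple case of single-item events. This is a minimal illustration on an assumed toy database; the full algorithm also handles multiple items within an event and uses pseudo-projection to avoid copying suffixes.

```python
def prefixspan(db, min_sup, prefix=()):
    """Minimal PrefixSpan sketch: db is a list of (suffix) sequences,
    each a tuple of items. Find frequent items in the projected
    database, then recurse on each item's projection."""
    patterns = []
    counts = {}
    for seq in db:
        for item in set(seq):              # count each item once per sequence
            counts[item] = counts.get(item, 0) + 1
    for item in sorted(counts):
        if counts[item] < min_sup:
            continue
        pattern = prefix + (item,)
        patterns.append(pattern)
        # Project: keep the suffix after the first occurrence of `item`.
        projected = [s[s.index(item) + 1:] for s in db if item in s]
        patterns += prefixspan(projected, min_sup, pattern)
    return patterns

# Assumed toy database, minimum support 2.
db = [("D", "B", "A"), ("D", "B", "A"), ("B", "C")]
print(prefixspan(db, 2))
```

Note that no candidate is ever generated and tested: each recursive call only grows patterns from items that are already known to be frequent in the projected database.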

1.3.2.3 MSPVF (Mining Sequential Patterns using Time-Constraints based on Vertical Format): The MSPVF algorithm, presented by Di Wu, Xiaoxue Wang, Ting Zuo, Tieli Sun and Fengqin Yang, is based on the Apriori algorithm and uses minimum-gap, maximum-gap and duration constraints. The time constraints are added to the Apriori algorithm in order to enhance users' satisfaction. Meanwhile, a new five-tuple list object is used by MSPVF to store data in vertical format and simplify the mining process. The five-tuple structure is (SID, EID, start-time, end-time, last-time), where SID and EID record the location of element e, the last element of the subsequence, in the sequence database: SID records the sequence ID, and EID is the element ID.
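The three time constraints named above can be sketched as a simple check on the event times of one pattern occurrence. The helper below is hypothetical: its parameter names mirror the constraints (minimum gap, maximum gap, duration), not MSPVF's actual API or its five-tuple bookkeeping.

```python
def satisfies_time_constraints(eids, min_gap=0, max_gap=float("inf"),
                               duration=float("inf")):
    """Check one occurrence of a pattern (its list of event times)
    against the minimum-gap, maximum-gap and duration constraints."""
    gaps = [b - a for a, b in zip(eids, eids[1:])]
    return (all(min_gap <= g <= max_gap for g in gaps)
            and eids[-1] - eids[0] <= duration)

# Event times 10, 20, 25: gaps of 10 and 5, total duration 15.
print(satisfies_time_constraints([10, 20, 25], min_gap=5, max_gap=10,
                                 duration=20))  # → True
print(satisfies_time_constraints([10, 20, 25], min_gap=6))  # → False
```

A time-constrained miner would apply such a check while joining occurrences, so that occurrences violating the gaps never contribute to support.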

1.3.2.4 Mining Weighted Sequential Patterns in a Sequence Database with a Time-Interval Weight [11]: This work presents a new framework for discovering time-interval weighted sequential (TiWS) patterns in a sequence database, together with a time-interval weighted support (TiW-support) for finding them. In the TiWS framework, the weight of each sequence in the database is first obtained from the time intervals between the data elements of the sequence, based on the assumption that a sequence with small time intervals is more valuable; the TiWS patterns are then found considering this weight. The time interval between a pair of itemsets is a positive value with no upper limit.

1.4 Objectives

• To perform failure analysis from students' grade history.

• To convert student grade history into a sequential pattern database. From this database, we want to identify:

o In which subjects students fail most, and how long these failures persist.

o Whether any sequence of subjects is common across many students. If such sequences can be identified, we may plan to avoid such cases for upcoming students.

• To identify common sequential patterns using available sequential pattern mining algorithms and compare their performance.

• To modify the existing algorithm(s) if any modification is needed to extract more meaningful patterns.
