CHAPTER – 1
FUNDAMENTALS OF WEB SERVICE MINING
Web services today are created and updated on the fly, and the growing popularity of the Web and of Web services has opened an entirely new area of research. Web service mining is a relatively new part of that area: it is a process aimed at discovering interesting and useful compositions of existing Web services. The approach is particularly useful where composition goals are unavailable or unknown, and it differs from conventional top-down approaches, which are driven by specific criteria. Web service mining does not assume any prior knowledge; instead of searching for specific compositions, it relies on the component services to drive the exploration, and the result may vary from a simple composition to a complex one.
Although the Web was invented in the early 1990s by Tim Berners-Lee for information sharing among scientists, it became available for public information sharing only after 1993, owing to the Web browser Mosaic. Mosaic quickly captured the imagination of the public through its graphical user interface. Governmental organizations and business establishments realized the potential of the Web and took advantage of its popularity by sharing their data and applications on it. From its starting position as a repository of information made of text and images, the Web evolved into a host for multimedia content and for service-providing applications such as map finding, weather reporting and e-commerce. Real-time applications involved hardware devices like temperature sensors and traffic monitoring cameras.
Businesses and governments integrated existing Web applications to provide new value-added services. Applications required customized interfaces to access this data, and the lack of semantics in the data made integration of the applications an overwhelming challenge. Attempts to solve these problems resulted in a number of underlying and enabling technologies. Much of this research is driven by cross-enterprise workflow and AI planning. This thesis presents a novel approach in Web service mining by introducing a Web service. A set of algorithms and a conceptual but novel framework for service compositions are also introduced. The effectiveness of the framework and the challenges it addresses are discussed, as are transactional approaches.
1.2 WEB SERVICE MINING
Definition of Web Service Mining: “Given a set of Web services WS with a set of rules that can be composed, CR, interesting measures IM for a user UR, a composite Web service is derived by using a subset of rules in CR in a subset of Web services WS, that exhibits values for a user UR, selected subset of interesting measures IM.” The W3C defines a Web service as an application or software component identified by a URI, whose interfaces and bindings are described in XML, which can be discovered by other Web services through that description, and which can interact directly with other Web services using XML and Internet protocols. Three major standardization initiatives support interactions between Web services:
SOAP (a transport protocol allowing the exchange of XML documents in a distributed and decentralized environment), UDDI (a specification defining mechanisms that make it possible to publish services and to discover and interact with other services on the Web) and WSDL (a document describing a Web service, its location and the way to call the service). Web service mining requires that the Web services and the subject of mining be clearly defined, so that unambiguous items can be targeted. A Web service mining model therefore has to provide an effective means of describing the related Web service concepts and their relationships. The key concepts of Web service mining can be categorized as follows:
• Message. Web services communicate with each other by exchanging XML messages. A message may contain one or more parameters, and each parameter has a value of a certain data type. Messages are used to send parameters to a service operation or to return results from a service operation.
• Condition. A condition specifies what must hold for an operation to be activated. Conditions also describe the state of the information after execution and can be related to the parameters used for retrieving information.
• Domain. Web service operations can be categorized by areas of interest called domains. A domain carries a description of its purpose and functionalities; examples are travel, entertainment, healthcare and drug design.
• Locale of Interest. A locale of interest is a semantic restriction describing the applicability of an interface, for example a restriction to a regional or geographical boundary.
• Quality of Web Service. The quality of a Web service is evaluated by rating an operation provided by the service against quality attributes. The quality attributes of a Web service operation can be organized into three categories: runtime, business, and security.
• Operation Interface. An operation interface specifies a Web service capability with a name, purpose, domain, locale of interest, quality attributes and conditions. There are four operation modes:
o One-way – the operation interface contains only a message sent to the operation.
o Notification – the operation interface contains only results.
o Request-response – the operation receives a message and returns results.
o Solicit-response – the operation generates results and waits for a message.
• Operation. An operation is a specific implementation of an operation interface by a Web service. An operation can also exhibit values for the quality attributes.
• Web Service. A Web service is defined by a tuple (Name, Description, Operations, NFP), where:
o Name: the name of the Web service;
o Description: a text summary of the service capabilities;
o Operations: the set of capabilities provided by the Web service;
o NFP: the non-functional properties that describe the Web service.
A simple web service model is shown in Fig. 1.
Fig. 1. A simple web service model
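The concepts listed above can be captured in a small data model. The following is only a minimal sketch under stated assumptions: the class names, the field names and the sample service are hypothetical and chosen for illustration, and the input and output parameter sets stand in for the messages exchanged by an operation.

```python
from dataclasses import dataclass, field
from enum import Enum


class OperationMode(Enum):
    """The four operation modes of an operation interface."""
    ONE_WAY = "one-way"
    NOTIFICATION = "notification"
    REQUEST_RESPONSE = "request-response"
    SOLICIT_RESPONSE = "solicit-response"


@dataclass
class Operation:
    """An implementation of an operation interface (hypothetical fields)."""
    name: str
    mode: OperationMode
    domain: str
    inputs: frozenset = frozenset()    # parameters carried by incoming messages
    outputs: frozenset = frozenset()   # parameters carried by result messages
    quality: dict = field(default_factory=dict)  # runtime/business/security values


@dataclass
class WebService:
    """The tuple (Name, Description, Operations, NFP)."""
    name: str
    description: str
    operations: list
    nfp: dict = field(default_factory=dict)


def operations_in_domain(services, domain):
    """Return (service name, operation name) pairs that belong to a domain."""
    return [(s.name, op.name)
            for s in services
            for op in s.operations
            if op.domain == domain]


# Hypothetical sample service, for illustration only.
weather = WebService(
    name="WeatherReport",
    description="Returns a forecast for a city",
    operations=[Operation("getForecast", OperationMode.REQUEST_RESPONSE,
                          "weather", frozenset({"city"}), frozenset({"forecast"}),
                          {"response_time_ms": 120})],
    nfp={"provider": "example"},
)
```

Grouping operations by domain, as `operations_in_domain` does, is one simple way to narrow the set of candidate services before any composition is attempted.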
1.3 WEB SERVICE MINING ARCHITECTURE
Web services are self-describing, self-contained, modular applications published and invoked on the Web. Many organizations have implemented their core business applications on the Internet. The ability to efficiently and effectively select and integrate heterogeneous services on the Web at runtime is an important step towards the development of any Web service application. The ease of application integration and the use of ontologies have contributed to the popularity of Web service composition, which has attracted governments and businesses as a new way to facilitate business-to-business (B2B) collaborations. A Web service composition aims at providing value-added services: the application takes advantage of existing services and assembles them to meet specific service requirements. Two different types of approaches helped standardize compositions. The business world targeted an earlier Web service paradigm and developed a number of XML-based standards (such as WSFL, WSCI, XLANG and BPEL4WS) that focus on formalizing Web service specifications, flow composition and execution based on syntactical characteristics. More recently, the Semantic Web research community focused on the concept of Semantic Web services and developed complementary standards such as DAML-S and OWL-S [8]. The standards complement each other and take a top-down approach: the user provides search criteria and defines the exact service functionality required. For example, the composition of a travel service may require flight or train bookings, car hire or hotel reservations, and the search reflects the user's interest in and knowledge of the service. A simple Web service architecture is shown in Fig. 2.
Fig. 2. A Simple Web Service Architecture
Another technology is the use of ontologies to describe the content of application processes. Ontologies have their roots in the AI community, where they facilitate knowledge sharing and reuse. An ontology allows a shared conceptualization to be specified in an explicit, readable format, resulting in a standard interpretation. Ontologies help build a vocabulary of concepts that applications can use to describe the content they generate unambiguously and to interpret the content they receive. Empowering Web services with semantics through ontologies has brought about the next generation of Web services, called Semantic Web services. The semantic deployment of independently developed applications allows them to interoperate at both the application interface level and the information semantics level. Integration for value-added services becomes an easier and less expensive task than with earlier approaches relying on technologies like EDI, CORBA and EJB. With the advent of Semantic Web services, the W3C had to revise its standard Web service architecture.
The discovery service first obtains the Web Service Description (WSD) in WSDL and its associated Functional Description (FD) from the provider. The functional description describes the functionality of the provider service in a machine-processable form (such as RDF, DAML-S or OWL-S). The requester supplies the criteria that the discovery service uses to select a WSD based on its associated FD. The next step ensures that both the requester and the provider agree on the semantics of the interaction through the use of ontologies, as shown in Fig. 3.
Fig. 3. Semantic Web Service Architecture
1.4 WEB SERVICE COMPOSITION APPROACHES
Data mining is the extraction of interesting information from data in databases: computer programs sift through databases to find patterns in the data. Data mining receives attention because of the voluminous data now available, as enterprises maintain terabytes of data thanks to cheap storage media. Data mining techniques classify data, detect anomalies and predict data. They search for consistent patterns and relationships between variables, and the results are then validated by applying them to new data sets. Data mining techniques have been used for weather forecasting [16], stock prediction, sports analysis, medical diagnosis and fraud detection. Data mining is similar to Web service mining in that both extract previously unknown and useful information. However, Web services encapsulate behavior in a set of dynamic operations, so traditional data mining techniques cannot be used to determine the composability of two services.
Service composition can follow two fundamentally different strategies. The first is the top-down strategy, which, like traditional Web service composition approaches, requires goal-based search criteria to start the composition process. Since the user provides the goal, the type of composition is anticipated by the user, and evaluating how interesting a composition is does not become a major concern. The second is the bottom-up strategy (Web service mining), which is driven by the need to find interesting and potentially useful compositions of existing Web services without using specific search criteria. Web service mining techniques therefore need to address the interestingness of service compositions. Semantics-based composition approaches use ontologies to their advantage when describing Web services.
The use of ontologies allows a computer to interpret the semantics of a Web service unambiguously. The Web Ontology Language for Services (OWL-S), based on the DARPA Agent Markup Language plus Ontology Inference Layer (DAML+OIL), is a service ontology that aims at enabling Web service composition planning, reasoning and the automatic use of services by software agents. Fig. 4 shows the elements of a Web service composition model.
Fig. 4. Web Service Composition Model
A service profile is an abstract description of what the service can do. It includes the required inputs of the service, its outputs, the preconditions and the effects of execution, collectively known as the IOPEs of a given service. The service model describes the behavior of the service in terms of sub-processes and uses a process graph. Atomic processes have no sub-processes and are directly invocable. Simple processes likewise have no sub-processes but are not directly invocable; they serve as abstractions. Composite processes have sub-processes linked by control constructs such as sequence, split, choice, iteration and if-then-else. The service grounding describes how a Web service is accessed, with descriptions of message formatting, transport mechanisms, protocols and serialization.
The more recent Web Service Modeling Language (WSML) is based on the Web Service Modeling Ontology (WSMO) [23]. The core language uses description logics and logic programming. WSML accommodates users unfamiliar with formal logic by distinguishing conceptual modeling from logical modeling. Like OWL-S, WSMO declares the inputs, outputs, pre-conditions and results associated with services, but it does not provide a notation for building composite processes in terms of control flow and data flow. It focuses instead on the specification of internal and external choreography, using an approach based on abstract state machines.
Web service mining spans several levels because of the complexity and richness of the composition model. For example, the WSDL interface level of a Web service is the set of operations the service offers, and the abstract process level is the execution order logic between the interfaces. The choreography level defines the interactions exchanged within a given choreography, and the orchestration model of a composite Web service is the executable process that implements it.
Dustdar relied on analyzing log data of Web service executions to discover process workflow instances in these services. Identifying interesting workflows is difficult when such logs are absent or when the component Web services are at an introductory stage.
Web service logging is the gathering of relevant Web data to be analyzed for useful information about Web service behavior. Logging can produce the rich data needed for implementing additional features. Web service logging can be trivial or advanced; the trivial level relies on a set of existing solutions for capturing Web service logs. Web service logs can be collected from two main sources, namely the software systems on the Web server and on the Web client, and collection is achieved by enabling the logging facilities of the Web server. Research on Web Usage Mining (WUM) describes the most common means of Web log collection, in which server logs are stored in the Common Log Format or the more recent Combined Log Format. Most Web servers support the Common Log Format as a default option. The log tracks different elements of Web transactions: each request is recorded as a line of text, with the elements of the request separated by spaces and missing items recorded as a hyphen. The log, originally conceived for administrative purposes, stores data as sequential strings containing the requester's IP address, user identification, timestamp, request method, request status code, sent data (number of bytes), authenticated user name and the user agent. For example:
127.0.0.1 - - [02/Apr/2013:9:50:11 +0100] "POST /MMS-Server/services/Document Delivery by DHL HTTP/1.0" 500 819 "-" "MMS-Server/1.1"
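As an illustration of how such log entries can be prepared for mining, the sketch below parses a line in the Common or Combined Log Format with a regular expression. It is a minimal sketch, not a complete log parser: the pattern assumes a request path without embedded spaces, and the sample line, path and client name are hypothetical.

```python
import re

# One request per line; the referer and user agent are present only in the
# Combined Log Format, so that trailing part of the pattern is optional.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A hyphen means that no size was recorded for the response.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical log line, for illustration only.
sample = ('127.0.0.1 - - [02/Apr/2013:09:50:11 +0100] '
          '"POST /services/delivery HTTP/1.0" 500 819 "-" "ExampleClient/1.1"')
record = parse_log_line(sample)
```

Records parsed in this way can then be grouped by host, service path or status code, which is typically the first step before any usage pattern is extracted.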
1.5 BACKGROUND LITERATURE SURVEY
Researchers have addressed the problem of interleaving web service discovery and composition by considering simple workflows of web services that have one input and one output parameter. Web service composition is then restricted to a sequence of a limited number of web services, corresponding to a linear workflow. The suggested solution retrieves a sequence of causal links between web services. Aiming to generate a composite service plan from existing services, a composition path has also been proposed. The path consists of a sequence of operators that compute data and connectors that transport data between operators. The search for operators to construct a sequence is based on the shortest path algorithm. Only two kinds of services, operator and connector, each with one input and one output parameter, are considered, in contrast to the models proposed with more than one input and output parameter [30,31]. Other work considers a composition of services as a directed graph whose nodes are linked by the compatibility between input and output parameters. The shortest sequence of web services is derived from the graph; it corresponds to an ordered set of web services that produces all the output parameters expected by the user from the given inputs. Semantic web service compositions have also been performed by pre-computing a causal link matrix. The composition strategy is based on AI planning, performs a regression-based search and returns a set of correct, complete and consistent plans. The services are actions semantically linked by causal links.
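The idea of deriving a shortest sequence of services by matching output parameters to input parameters can be sketched as a breadth-first search. The code below is an illustrative simplification and not the algorithm of any of the cited works: parameters are matched by exact name rather than by semantic similarity, non-functional properties are ignored, and the service names are hypothetical.

```python
from collections import deque


def shortest_composition(services, provided, wanted):
    """Find a shortest sequence of services that yields the wanted parameters.

    services maps a service name to a pair (input parameters, output parameters).
    A service can be appended to the plan once all of its inputs are known.
    Returns a list of service names, or None if no composition exists.
    """
    start = frozenset(provided)
    wanted = frozenset(wanted)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        known, plan = queue.popleft()
        if wanted <= known:
            return plan
        for name, (inputs, outputs) in services.items():
            # Skip services that cannot run yet or that add nothing new.
            if inputs <= known and not outputs <= known:
                next_known = known | outputs
                if next_known not in seen:
                    seen.add(next_known)
                    queue.append((next_known, plan + [name]))
    return None


# Hypothetical services, for illustration only.
services = {
    "GeoCoder": ({"address"}, {"coordinates"}),
    "WeatherByCoordinates": ({"coordinates"}, {"forecast"}),
    "HotelFinder": ({"coordinates"}, {"hotels"}),
    "CityGuide": ({"address"}, {"city"}),
}
plan = shortest_composition(services, {"address"}, {"forecast"})
```

Because the search explores sets of known parameters level by level, the first plan that covers all wanted outputs is also a shortest one; the price is a state space that can grow exponentially with the number of distinct parameters.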
However, these two approaches compute the best composition based on the semantic similarity of the output and input parameters of web services and do not consider any non-functional properties. A modelling tool called interface automata was introduced to represent web services and perform compositions. Atomic services are stored as a graph in which each node represents input and output parameters while the edges represent web services. Each service has a description of its inputs, outputs and dependencies on other web services. The service descriptions and the graph are used to discover composition results that satisfy a service request. A composer has also been introduced that supports the end user in selecting web services for each activity in the composition and in creating flow specifications to link them. After a web service is selected, the web services whose outputs can feed the inputs of the selected service are identified from their profile descriptions. The user can manually select the service that fits a particular activity, and the system generates a composite process in DAML-S. The composition is executed by calling each service separately and passing the results between services according to the flow specifications.
Several standardization and prototype efforts have been undertaken in Web service composition. Composition-related approaches can be grouped into two categories: business process-oriented and semantics-based. The Petri-net approach represents a process graphically as a connected graph in which some nodes (places) represent states and other nodes (transitions) represent operations. An operation is enabled when every place connected to it as input holds at least one token.
When it fires, the operation removes one token from every input place and deposits one token in every output place. At any given time, a service can be in one of the following states: not instantiated, ready, running, suspended or completed. After each service is defined, a variety of compositions can be built using sequences, alternatives, iterations and so on. A process can also be analyzed in many ways using Petri-nets, owing to the abundance of analysis techniques [39, 40, 41]. For instance, Petri-nets can be used to determine the presence of livelocks.
Algebraic process composition models processes on the basis of the π-calculus, in which the basic entity can be one of the following: an empty process, a choice between I/O operations, a parallel composition, a recursive definition or an invocation. I/O operations can receive or send.
IBM's Web Services Flow Language (WSFL) is an XML-based language for describing Web service compositions. WSFL is based on Petri-nets and provides two models, the flow model and the global model. The flow model specifies the logic of a business process. It uses a directed graph to model the sequence of functionality provided by a composed service, controlling the flow of control and data between component services. Each node in the graph is an activity and represents a step towards the business goal that the composition tries to achieve. Two types of links are used: control links connect activities and prescribe their order, while data links represent the flow of data between activities. The global model defines the mutual exploitation of Web services in decentralized or distributed business processes. Since it does not specify an execution sequence, it relies on plug links to represent interactions. WSFL also supports recursive composition of services.
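The Petri-net firing rule described earlier in this section (an operation is enabled when each of its input places holds a token, and firing moves tokens from input places to output places) can be illustrated with a few lines of code. This is a minimal sketch under simplifying assumptions: arcs have weight one, a marking is a plain dictionary from place names to token counts, and the two transitions are hypothetical ones that move a service through the ready, running and completed states.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds at least one token."""
    return all(marking.get(place, 0) >= 1 for place in transition["inputs"])


def fire(marking, transition):
    """Return the marking reached by firing an enabled transition."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place in transition["inputs"]:
        new_marking[place] -= 1          # consume one token per input place
    for place in transition["outputs"]:
        new_marking[place] = new_marking.get(place, 0) + 1  # produce one token
    return new_marking


# Hypothetical transitions modelling part of a service life cycle.
start = {"inputs": ["ready"], "outputs": ["running"]}
finish = {"inputs": ["running"], "outputs": ["completed"]}

marking = {"ready": 1}
after_start = fire(marking, start)
after_finish = fire(after_start, finish)
```

Keeping `fire` free of side effects, so that it returns a new marking instead of modifying the old one, makes it straightforward to enumerate reachable markings, which is the basis of many Petri-net analysis techniques.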
In WSFL, every Web service composition (flow or global) can itself become a new Web service and thus a component of new compositions. BEA Systems' Web Service Choreography Interface (WSCI) is an XML-based interface description language that describes the choreography of Web service operations in the context of a message exchange between participating Web services. WSCI describes how the choreography should expose relevant information such as message correlation, exception handling and descriptions of transactions. Message correlation is achieved by associating exchanged messages with correlation properties that identify a conversation. Exceptions occur on receipt of an out-of-context message, a fault or a timeout. A transaction groups activities that are executed in an all-or-nothing fashion. Activities can be atomic or complex (recursively composed of other activities). The choreography describes the logical dependencies between activities. Microsoft's XLANG is based on the π-calculus. At the intra-service level, it extends a WSDL service description with a behavioral element. A behavior defines the list of actions belonging to the service and the order in which they are performed. XLANG supports two action types: WSDL operations and XLANG-specific actions, which include exceptions and timeouts based on deadlines or durations. Transactions are also supported. At the inter-service level, XLANG details the connections between service ports that are used to join individual service descriptions. The incompatibility between WSFL and WSCI/XLANG resulted in the development of the Business Process Execution Language for Web Services (BPEL4WS), which combines features of WSFL and WSCI/XLANG to support process-oriented service compositions. A process is composed of activities, and the execution of a process might encounter exceptions.
Message correlation and transactions (as in WSCI and XLANG) are supported. BPEL4WS has several implementations for both the J2EE and .NET platforms, including IBM WebSphere, Oracle BPEL Process Manager, Microsoft BizTalk, OpenStorm Service Orchestrator and ActiveBPEL [52]. The Business Transaction Protocol (BTP) is designed to support interactions that cross application and administrative boundaries. The Business Process Modeling Language (BPML) shares common roots with WSCI and BPEL4WS. It uses WSCI for expressing public interfaces and choreographies and provides advanced process model semantics such as nested processes and complex compensated transactions. Electronic Business XML (ebXML) is an international initiative of the United Nations Centre for Trade Facilitation and Electronic Business (UN/CEFACT) and the Organization for the Advancement of Structured Information Standards (OASIS). ebXML defines standard business processes and trading agreements among different organizations. Its vocabulary consists of a process specification document describing the activities of the parties in an ebXML interaction, a collaboration protocol profile describing an organization's profile, and a collaborative partner agreement representing an agreement between partners. It includes an ebXML registry that stores important information about businesses along with the products and services they offer. ebXML registries have an advantage over UDDI registries since they allow SQL-based queries on keywords. A framework composed of a multilayered architecture and a transactional model has also been presented. Standards such as BTP, WS-AtomicTransaction and WS-BusinessActivity define transaction protocols between composed services. A transaction management model based on tentative hold and compensation concepts has been presented as well.
Web service initiatives were announced by IBM, HP, Sun and Microsoft in the year 2000. They included IBM's Web services, Sun's Open Network Environment (ONE), HP's e-speak and Microsoft's .NET. The World Wide Web Consortium (W3C) published the specification of a Web service, which describes a Web service as a Web application whose functionalities can be accessed programmatically using a set of homogeneous interfaces. Table 1 compares different Web service composition technologies.
Table 1. Comparison of BPEL4WS, BPML, WSCI, WS-CDL and DAML-S
Criterion | BPEL4WS | BPML | WS-CDL | WSCI | DAML-S
Modeling the collaboration | Strong support | Indirect support | Strong support | Strong support | Strong support
Modeling the execution control | Strong support | Strong support | No support | No support | Strong support
Representation of the role | Weak support | No support | Strong support | Strong support | No support
Transaction and compensation | Indirect support | Strong support | Indirect support | Strong support | Indirect support
Exception handling | Strong support | Strong support | Support | Strong support | Strong support
Semantic support | No support | No support | No support | No support | Strong support
Business agreement support | No support | No support | No support | No support | No support
Software vendor support | Many | Few | No | Few | Few
1.6 OBJECTIVE OF THE STUDY
The objectives are to identify the activities in the mining process, to suggest a new mining framework and to suggest efficient algorithms that automate these activities. The study also suggests measures for objectively evaluating the interestingness and usefulness of the mining results, and determines strategies for carrying out that evaluation.
1.7 MOTIVATION OF THE THESIS
Web services are in transition from data-based to semantics-based services. They will become primary objects on the Web, with increasing opportunities to compose new, useful and interesting Web services from existing resources. The range of opportunities opened by composing services will be unexpected to many. Further, the indefinite scope of search queries and the ability to discover such compositions make the problem motivating, since it can be equated to gaining competitive business advantage. For government agencies, semantics can help citizens receive useful and potentially life-saving or life-enhancing services in advance. The ability to proactively discover useful composite services even when the goals are unspecified is also a challenging area of research. Web service mining can be a key to realizing the full potential of Semantic Web services, and an effective Web service mining framework would generate interesting and useful composite Web services.
1.8 ISSUES IN WEB COMPOSITIONS
Indistinctness or a lack of standardization in the dependencies of web compositions can complicate the design of a composition. Agreement might not be reached for reasons such as conflicting interests or lack of co-operation, resulting in actor-oriented problems. Unclear decisions carry heavy risks that can lead to the failure of a composition. The characteristics of a web composition are therefore an important set of requirements for creating one, and they can also be viewed as criteria for evaluating composition methods. Such problems determine the extent of the success or failure of a composition. They can be grouped into four categories, as detailed below.
• MULTI-ACTOR PERSPECTIVE: Decisions regarding the composition of a service are taken within a network of stakeholders who hold different views on the service. One stakeholder may focus on user friendliness, another on scalability, another on cost, and so on. The actors may have different interests but be dependent on each other. A solution is a compromise that has to combine several contradictory goals and interests. The actors in the network need to cooperate to a certain extent in order to realize a solution and reach the common goal. The decision-making process culminates in a set of rules with which every actor involved will comply.
• NON-FUNCTIONAL REQUIREMENTS: Requirements are commonly divided into functional and non-functional requirements. Functional requirements relate to the tasks to be accomplished, while non-functional requirements relate to QoS aspects. Compositions are designed to assist users in solving a problem, such as checking a license number. When starting a new service that is not yet defined, the user invokes the corresponding function in the composition. A number of compositions may be used, through iteration and tracing, before the result is finally obtained.
• ALTERNATIVES INSIGHT: The performance of a composition is affected when one service fails. An alternative service should be made available or specified as a failsafe option. Any composition should support failure analysis and report the reasons for failure.
• PLANNING SUPPORT: The absence of a shared view amongst actors hinders communication and possibly blocks further development cycles of a composition. A lack of iterations also limits the number of alternatives and alternative compositions that can be identified and evaluated. The composition has to indicate possible services that can be substituted when services are unavailable or still awaiting implementation. The composition should also provide partners with a plan for its realization, since the objectives relate both to the composition process and to the resulting composition.
Processes should facilitate reuse, dependability and planning. The result should reuse services, contain an overview of functional and non-functional specifications and planning.
1.9 CHALLENGES IN WEB SERVICE MINING
Since computing services are becoming dependent on web services, the Services they are more complex and depend on results provided by other web services. The Quality and correctness of any web service depends upon the other services used and data mining techniques to get patters becomes in-applicable. Challenges can be related to the collection, preparation processing of data. This thesis addresses these problems with proposed solutions of other research papers. In this world of cloud computing Cloud services can be used to provide an integrated infrastructure for data mining  and systems can be analyzed. The event logs can be used to verify the behavioral properties of web service composition . Knowledge discovery may no m ore be a challenging task but combining data mining techniques on web services logs to discover knowledge from the web service can be. Proposed Web services changes over a period of time  Another challenging aspect of service mining is to build semantic relations between Web services and construct a semantic relation-based web services registry with complementary functions [74.]. Further, discovered web services needs to be accurate and reliability of social network has been used combining service mining by K optimal trust paths for the selection of trustworthy service in such a service oriented online social network [75.]. Business flows are an important mechanism to find or create enterprise web services. Understanding a business control flow is again a challenging task. Process mining has its own share of challenges in determining the scope of the process without disturbances making existing algorithms less useful.
Designing business intelligence solution with web service mining architecture is another major challenge . Each aspect of web service mining is a candidate for challenge. Composability with its syntactic operation structures, the semantics of messages, quality that determines services. Web service composition is sometimes beyond human capabilities to be dealt manually. The complexity has its causes due to the number of services available over the Web, Web service changes since they are updated easily, making it important to update the composition. Different organizations as actors to a composition do not have a unique language to define and evaluate the Web services. Though this new area is applicable to governmental pr public organization for reducing overheads and deploys solutions more quickly, it is a concept which has to be acknowledged, accepted and standardized over a longer period of time.
Web compositions incorporate information from various sources and domains to achieve a task in a Service-Oriented Architecture (SOA). One service may not be enough to meet functionally complex requirements, and different functional requirements call for different compositions. Compositions that combine services into a coherent task may not be readily available, and it is difficult to find a single Web service that produces the desired output from the given inputs. Semi-automatic service composition has also evolved over the years, catering to single users or to a community. Tools and interfaces that facilitate this kind of Web composition include mashups and BPEL tools. Community compositions target the knowledge produced by communities.
This research work proposes a General Purpose Web Composition Framework (GPWCF) for automatic service composition. GPWCF offers a different angle on Web compositions and can be implemented by organizations. The user selects and sets constraints. The requested service is fetched from the services repository. A scope definition is constructed; it determines the search space and the information sets returned to the user. When no corresponding service is found, an error message is sent to the user and recorded in the error log. All errors, such as errors in executing a service, errors in fetching results and network failures, are logged in the error log, enabling improvement of the compositions in future versions. Organizations can use the GPWCF as a starting point for Web service compositions and develop it further for future requirements.
For semi-automatic composition, the researcher proposes a novel Web Composition For Users Social Interactions (SAWCUSI). Any social network composition should cater to the interaction between a user and the services the user requests. The SAWCUSI composition offers a complete solution to users' requests. If a requested service is not found, SAWCUSI connects to other social network compositions, searches for the service, updates its services list with the service's availability, and serves the request once it receives the service from the other framework. The error log created at run time is reviewed periodically to identify services that are missing from the framework's list, and such entries can lead to the creation of a new service. The error log is checked manually, and the decision to create a new service is taken with the help of a domain expert. The framework can also be used to extract useful knowledge about these social interactions.
Web services that link applications need to address security policies in the development of Web service security systems. Ensuring confidentiality and security in a Web services security model is critical to organizations and customers. The systems have to work on any platform by adopting a neutral language and must accept popular security mechanisms. The framework needs to extend to existing security infrastructure, while allowing Web service providers and requesters to develop solutions that meet the individual security requirements of applications. WS-Security defines the core facilities for protecting the integrity and confidentiality of a message by providing a model with security functions and components for Web services. It also demands solutions from both technological and business perspectives. The effort requires coordination between vendors, developers, service providers and customers. For example, a customer making an online purchase should not be affected by the instrument used in the transaction. The goal has to be building interoperable solutions in heterogeneous environments. WS-Security also describes enhancements to SOAP messaging that provide protection through integrity, confidentiality and single-message authentication. If a Web service provider does not accept requests from a specific IP address, compositions need to offer an alternative to overcome this constraint. This chapter discusses a security model with constraints that is compliant with existing standards and can thus be adapted to varying business needs. It exploits a syntactic approach to model the security requirements of a Web service and considers the security requirements of both Web service requestors and the Web services taking part in the composition.
This thesis proposes a security model called the SWS-Broker, itself a Web service. The SWS-Broker consists of four main components, namely WF-Modeler, WSs-Locator, Security Matchmaker and WSBPELgenerator. It first creates an appropriate workflow (WF) to model the business process, with the help of libraries of business processes, before generating the required service. The Broker then generates the WSBPEL document representing a secure composition.
Web services as defined run complex applications encompassing several Web service calls, and until a composition is tested it needs to be controlled manually. The semantics of Web services and compositions are rather limited. Proposals for enriching Web services are quite restrictive, since several Web service calls are considered as one singleton Web service call and treated independently. Companies implementing composite Web services would find the integration of existing services costly. Interoperability certification will again play an important role in business process integration.
The results are analogous to those of previous research, with certain variations due to advancements in current technology. Though the framework is proposed for faster and more efficient composition, it has to be implemented in full by organizations to reap the benefits. Also, fully automatic compositions are not yet at a stage where the user can fully trust the composition and expect or predict the execution results.
1.11 ORGANIZATION OF THE THESIS
The rest of the document is organized as follows. Chapter 2 describes Web service mining techniques. Chapter 3 discusses Web service compositions. Chapter 4 discusses the proposed Automatic Web Service Composition Framework For User Dependent Web Mining. Chapter 5 details a semi-automatic Web service composition for social interactions. Chapter 6 discusses Web service implementations, while Chapter 7 proposes a generic model for Web service security with a service broker. Chapter 8 discusses the results and evaluates the proposed architectures and models. The thesis concludes in Chapter 9.
CHAPTER – 2
WEB SERVICE MINING TECHNIQUES
Web services are being deployed at an accelerating rate. New services are created by extending or combining existing services. Process mining is considered an extension of service mining, owing to the interdependence between business processes and Web services. Specific queries for discovering patterns that give a competitive advantage in a business environment may be unavailable. Such mining is a bottom-up search process that proactively targets potentially interesting and useful Web services among existing ones. Systems demand a sophisticated design to answer user needs and requirements with reliable executions. It is important to track Web service utilization. Business opportunities can be discovered by analyzing different logs and tracking Web service interactions with autonomous parties. This chapter compares different Web mining approaches for the discovery, conformance and extension of examined properties and behaviors, along with algorithms that extract process trace data from process logs. The resulting models aim to improve the processes used in business process mining. Service mining challenges, and the solutions proposed for them in different service mining studies, are discussed below.
2.2 COMPOSITE PATTERNS EXTRACTION FROM EXECUTION LOGS
Reusing service composition patterns provides an efficient way to improve the quality of new applications. A service pattern is identified by locating associated services commonly used by different applications and understanding the control flow within that set of associated services. The application infrastructure facilitates the monitoring of service-oriented applications through execution logs. Composite service patterns capture the requirements of multiple organizations with a specific control flow and reflect best practices. Reusing service patterns yields good-quality composite Web services. Service patterns can be built in two ways.
• Top-Down: The business processes of different organizations are reviewed to identify patterns.
• Bottom-Up: Execution logs of applications are analyzed to mine business patterns such as frequently executed service patterns. This pattern mining of execution logs can be broken into three sub-tasks:
o Pre-processing: A service-oriented application is executed as multiple instances, each identified by a unique identifier. Different types of events of the instances are logged, such as resource adaptor events, business rule events and service invocation events. The entry and exit points of a process are logged in the application logs along with the instance identifier and a time stamp. Logs are processed to filter events.
o Identifying frequently associated services: Services that occur together frequently are considered for a service pattern. The number of services to be analyzed is pre-defined.
o Recovering the control flow: The control flow of the services in a service pattern is reusable. The execution flow is extracted from an executing instance of a service, and similar execution flows are extracted for all services. The execution flows common to these services are retained.
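The three bottom-up sub-tasks above can be illustrated with a minimal sketch. The log records, service names and the `min_support` threshold are hypothetical and chosen only for illustration; the pairwise co-occurrence count is a simplified stand-in for whatever association measure a real implementation would use.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical log records: (instance_id, timestamp, event_type, service)
LOG = [
    ("i1", 1, "service_invocation", "CheckCredit"),
    ("i1", 2, "business_rule", "RuleA"),
    ("i1", 3, "service_invocation", "ReserveStock"),
    ("i1", 4, "service_invocation", "SendInvoice"),
    ("i2", 1, "service_invocation", "CheckCredit"),
    ("i2", 2, "service_invocation", "ReserveStock"),
    ("i3", 1, "service_invocation", "CheckCredit"),
    ("i3", 2, "resource_adaptor", "Db"),
    ("i3", 3, "service_invocation", "SendInvoice"),
]

def preprocess(log):
    """Pre-processing: keep service invocation events, grouped per instance in time order."""
    by_instance = defaultdict(list)
    for instance_id, ts, event_type, service in log:
        if event_type == "service_invocation":
            by_instance[instance_id].append((ts, service))
    return {i: [s for _, s in sorted(events)] for i, events in by_instance.items()}

def frequent_pairs(traces, min_support):
    """Frequently associated services: pairs seen together in at least min_support instances."""
    counts = Counter()
    for services in traces.values():
        for pair in combinations(sorted(set(services)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

def common_order(traces, pair):
    """Control flow: the order shared by every instance containing both services, else None."""
    a, b = pair
    orders = set()
    for services in traces.values():
        if a in services and b in services:
            orders.add((a, b) if services.index(a) < services.index(b) else (b, a))
    return orders.pop() if len(orders) == 1 else None

traces = preprocess(LOG)
pairs = frequent_pairs(traces, min_support=2)
flows = {pair: common_order(traces, pair) for pair in pairs}
```

In this toy log, `CheckCredit` is associated with both `ReserveStock` and `SendInvoice`, and in each case it is invoked first. Real control-flow recovery would also need to handle branches, loops and parallel invocations, which this sketch ignores.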
An approach called Event Calculus (EC), with a time structure to model event-based interactions independent of any sequence of events, was presented. The time structure facilitates interactions independent of the input events and keeps system behavior close to the EC ontology. The focus of any process mining approach in log-based verification is verifying the specified properties for discrepancies between the process model and related instances. Formulating properties means checking the footprints in system event logs to verify the authenticity of the properties. Web service logging is the first step in any mining process and consists of gathering relevant data from the Web to analyze Web service behavior, from two main sources: data on the client and data on the server. The Web service logging facility retrieves the Web data log. Advanced logging solutions for identifying Web services can be obtained by using SOAP messages.
The Event Calculus approach specifies which properties are to be verified. Specifications are expressed as events, which occur during interactions and are retrieved from the execution logs. The technique is applied to the discovered fields in process mining.
2.4 WEB SERVICE INTERACTION MINING
Business Process Execution Language (BPEL) standardizes Web service compositions into business processes. BPEL defines and monitors workflows. Web Service Interaction Mining (WSIM) is an extension to BPEL. WSIM proposes three levels of abstraction on performance. The event log is mined to obtain information about Web service behavior, reducing the amount of data. Information on interactions between Web services determines critical dependencies. The transactions are categorized into four types: one-way, notification, request-response and solicit-response. One-way and notification operations are single messages between the sender and the receiver. The sender knows the receiver, but the receiver has no knowledge of the sender. Request-response and solicit-response are messages exchanged by the sender and the receiver: the initiator sends a message and the called service replies to it.
2.5 LOG BASED WEB SERVICE MINING
A Web service application accessible to customers is identified by a URI, with its interfaces and bindings described in an XML document. The service is discovered by other Web services, and interactions between Web services take place using SOAP, UDDI and WSDL. The execution log is mined and analyzed for Web service behavior, which spans several levels owing to the complexity and richness of composition in the Web services model. Level 1 gathers the set of operations offered by a Web service. Level 2 defines additions to the set of operations, and Level 3 holds the set of interactions exchanged within a given choreography. Level 4 defines the composite executable process implementing a composite Web service. The mining combines data mining techniques on Web service logs to discover knowledge from the Web services model. The first step is gathering data for analysis to obtain useful information about Web service behavior. Logging is done at two levels: trivial and advanced.
2.6 WEB SERVICES USAGE MINING
Learning the sequence in which users invoke Web services can yield important information. This information can be used to find better Web services by applying Web mining techniques to analyze patterns in Web service behavior. Frequently occurring Web services can be mined using the AprioriAll algorithm, and the mining algorithm can be optimized for faster execution. The correlations between operations and Web services can be discovered using a proper log format.
The log must contain the start time, connection time, disconnection time, session-id, user-id and the service with its operations. The sequential pattern from the log is fed as input together with a defined threshold (the occurrence count of a Web service). The k-item sets of operations are generated, and speed is improved by reducing the size of the candidate set, for example by filtering logs of unrelated services out of the data set.
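The level-wise idea behind this kind of mining can be sketched as follows. This is a simplified illustration and not the full AprioriAll algorithm: it only grows frequent sequences one operation at a time and prunes candidates below the threshold. The session data and operation names are hypothetical.

```python
# Hypothetical sessions: session-id -> ordered list of invoked operations
SESSIONS = {
    "s1": ["login", "search", "book", "pay"],
    "s2": ["login", "search", "pay"],
    "s3": ["login", "book", "pay"],
    "s4": ["search", "login"],
}

def is_subsequence(pattern, session):
    """True if the pattern's operations appear in the session in order (gaps allowed)."""
    remaining = iter(session)
    return all(op in remaining for op in pattern)

def support(pattern, sessions):
    """Number of sessions that contain the pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sessions)

def frequent_sequences(sessions, threshold):
    """Grow frequent k-sequences into (k+1)-candidates, keeping those that meet the threshold."""
    operations = sorted({op for s in sessions for op in s})
    level = [(op,) for op in operations if support((op,), sessions) >= threshold]
    frequent_ops = [p[0] for p in level]  # only frequent operations extend a pattern
    result = {}
    while level:
        for pattern in level:
            result[pattern] = support(pattern, sessions)
        candidates = [p + (op,) for p in level for op in frequent_ops]
        level = [c for c in candidates if support(c, sessions) >= threshold]
    return result

patterns = frequent_sequences(list(SESSIONS.values()), threshold=3)
```

With a threshold of 3, the only frequent sequence longer than one operation in this toy data is `login` followed by `pay`. Extending patterns only with operations that are themselves frequent is one way of keeping the candidate set small, in the spirit of the candidate reduction described above.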
2.7 PROCESS MINING
Process mining is the discovery of processes, or the verification of their conformance, based on event logs when Web services are distributed among different parties. Process mining helps to monitor the exact execution of processes, determine bottlenecks and unused paths, and verify deviations. The sequence of events recorded for a process instance is called a trace. The event log contains a set of process instances together with various associated properties. When events are not correlated to process instances, the execution and monitoring of processes and the measurement of Key Performance Indicators go wrong. Process mining analysis is useful when Web services are distributed over autonomous parties and show emergent behavior. The mining can be of three types. Discovery applies when no prior model exists and a new model needs to be constructed from event logs. Conformance applies when a prior model exists and is checked against the log. Extension applies when a prior model exists but is extended with a new perspective to enrich it. IBM WebSphere Business Monitor supports the three types of process mining, and thus process mining techniques can be used to discover, check the conformance of and extend existing Web services.
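As a rough illustration of traces, discovery and conformance, the sketch below correlates events to process instances, derives a very simple model (the set of directly-follows pairs) from some traces, and checks another trace against it. The event data is hypothetical, and the directly-follows set is an assumed stand-in for a real process model; it is not how any particular product implements process mining.

```python
from collections import defaultdict

# Hypothetical events: (process_instance_id, timestamp, activity)
EVENTS = [
    ("p1", 1, "receive"), ("p1", 2, "check"), ("p1", 3, "ship"),
    ("p2", 1, "receive"), ("p2", 2, "check"), ("p2", 3, "reject"),
    ("p3", 1, "receive"), ("p3", 2, "ship"),
]

def build_traces(events):
    """Correlate events to process instances; each instance yields one ordered trace."""
    grouped = defaultdict(list)
    for instance_id, _, activity in sorted(events, key=lambda e: (e[0], e[1])):
        grouped[instance_id].append(activity)
    return dict(grouped)

def discover(traces):
    """Discovery: build a model as the set of directly-follows activity pairs."""
    return {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}

def deviations(trace, model):
    """Conformance: list the steps in a trace that the model does not allow."""
    return [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in model]

traces = build_traces(EVENTS)
model = discover([traces["p1"], traces["p2"]])
skipped_check = deviations(traces["p3"], model)
```

Here the model learned from `p1` and `p2` flags instance `p3`, which ships without a check. Extension would correspond to enriching `model` with additional information, such as timing, which is not shown.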
2.8 CLUSTERING EVENT LOGS FOR PROCESS MINING
Process mining can be used to discover, monitor and improve real processes by extracting knowledge from event logs. Though algorithms perform well on structured processes, it is not easy to determine the scope of a process. Event logs can be clustered iteratively to form sets of similar cases that can be adequately represented in a process model. The observed executions in a cluster can be used to discover new models or to check conformance to a model. The process is represented as a Petri net, which uses two types of nodes: places and transitions. Places indicate states and transitions represent actions. The logs can be split iteratively into clusters until a precise model is formed, while keeping the number of partitions to a minimum. The clusters are split into smaller clusters using the k-means method, by finding centroids around which a set of vectors is clustered. The relevance is based on the frequency of occurrence in the log.
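One way to picture the clustering step is to turn each trace into a vector of activity frequencies and run k-means on those vectors. The sketch below is a minimal pure-Python version under that assumption; the traces are hypothetical, and the deterministic seeding (first k vectors as initial centroids) is chosen only to keep the example reproducible.

```python
# Hypothetical traces, one list of activities per case
TRACES = [
    ["receive", "check", "ship"],
    ["login", "search"],
    ["receive", "check", "check", "ship"],
    ["login", "search", "search"],
]

def profile(trace, activities):
    """Represent a trace as a vector of activity occurrence counts."""
    return [trace.count(a) for a in activities]

def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def kmeans(vectors, k, iterations=10):
    """Plain k-means; the first k vectors seed the centroids (an assumption for determinism)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = []
    for _ in range(iterations):
        # Assign every vector to its nearest centroid
        assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

ACTIVITIES = sorted({a for t in TRACES for a in t})
vectors = [profile(t, ACTIVITIES) for t in TRACES]
clusters = kmeans(vectors, k=2)
```

The two order-handling traces end up in one cluster and the two search traces in the other. Each cluster could then be handed to a discovery algorithm separately, and split again if the resulting model is still imprecise.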
2.9 WEB SERVICES DOCUMENT BY CLUSTERING WSDL DOCUMENT
The WSDL document has six major components: types, message, portType, binding, port and service. When mined, WSDL documents reveal features that describe the semantics and behavior of the services. By integrating these features, clusters of Web services can be derived. Search engines can be used to discover WSDL documents and extract their components. A query on the clustered Web services returns semantically relevant Web services.
A. Extraction: The WSDL is parsed to produce tokens of the content.
B. Word stemming: Base words are extracted from the vector created in step A.
C. Function word removal: The function words are identified using a Poisson distribution and removed from the vector.
D. Content word recognition: The k-means algorithm is used to identify the most frequently occurring words, such as ‘data’, ‘web’ and ‘port’, which are then removed.
E. WSDL types, such as complexType, are extracted.
F. The messages of the Web services are extracted from the WSDL.
G. The WSDL portType is extracted, representing the combination and sequence of message operations.
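Steps A, E, F and G can be sketched with Python's standard XML parser. The WSDL fragment, its names and the camel-case tokenizer below are hypothetical illustrations; stemming, function word removal and content word recognition (steps B to D) are not shown.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed WSDL 1.1 document
WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="WeatherService">
  <types>
    <xsd:schema>
      <xsd:complexType name="CityForecast"/>
    </xsd:schema>
  </types>
  <message name="GetForecastRequest"/>
  <message name="GetForecastResponse"/>
  <portType name="WeatherPort">
    <operation name="GetForecast"/>
  </portType>
</definitions>"""

NS = {
    "wsdl": "http://schemas.xmlsoap.org/wsdl/",
    "xsd": "http://www.w3.org/2001/XMLSchema",
}

root = ET.fromstring(WSDL)

# Steps E, F, G: pull out type names, message names and portType operations
types = [e.get("name") for e in root.findall(".//xsd:complexType", NS)]
messages = [e.get("name") for e in root.findall("wsdl:message", NS)]
operations = [e.get("name") for e in root.findall("wsdl:portType/wsdl:operation", NS)]

def tokens(name):
    """Step A: split a camel-case identifier into lower-case word tokens."""
    return [t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", name)]

message_tokens = [tokens(m) for m in messages]
```

The token lists produced this way are the kind of input that the later stemming and word-filtering steps would operate on before the features are compared across services.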
The extracted features are integrated, and the Quality Threshold clustering algorithm is used to place the Web services into clusters. Two criteria are used to evaluate performance: precision, which measures correctness, and recall, which measures completeness.
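For a single cluster, precision and recall can be computed against a set of services known to belong together. The sketch below uses the standard definitions; the service identifiers and the reference set are hypothetical.

```python
def precision_recall(cluster, relevant):
    """Precision = correct members / cluster size; recall = correct members / relevant size."""
    true_positives = len(cluster & relevant)
    precision = true_positives / len(cluster) if cluster else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical data: services placed in one cluster vs. services that truly belong to it
cluster = {"ws1", "ws2", "ws3", "ws4"}
relevant = {"ws1", "ws2", "ws5"}

precision, recall = precision_recall(cluster, relevant)
```

In this example two of the four clustered services are correct (precision 0.5), and two of the three relevant services were found (recall about 0.67).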
2.10 COMMON LOG FORMATS
2.10.1 NCSA Common (access log)
The NCSA Common log format contains only basic HTTP access information. The NCSA Common Log, sometimes referred to as the Access Log, is the first of three logs in the NCSA Separate log format.
The Common log format can also be thought of as the NCSA Combined log format without the referral and user agent. The Common log contains the requested resource and a few other pieces of information, but does not contain referral, user agent, or cookie information. The information is contained in a single file. The fields in the Common log file format are host, rfc931, username, date:time, request, statuscode and bytes. For example:
18.104.22.168 - username [02/Apr/2013:21:15:05 +0500] "GET /index.html HTTP/1.0" 200 1043
Description of the fields in the Common log format:
• host: The IP address or host/subdomain name of the HTTP client that made the HTTP resource request (“18.104.22.168”).
• rfc931: The identifier used to identify the client making the HTTP request. If no value is present, a “-” is substituted (“-”).
• username: The username (or user ID) used by the client for authentication. If no value is present, a “-” is substituted.
• date:time timezone: The date and time stamp of the HTTP request (“02/Apr/2013:21:15:05 +0500”). The format is dd/MMM/yyyy:hh:mm:ss +-hhmm, where dd is the day of the month, MMM is the month, yyyy is the year, hh is the hour, mm is the minute, ss is the seconds and +-hhmm is the time zone.
• request: The HTTP request, containing the HTTP method, the requested resource and the protocol version (“GET /index.html HTTP/1.0”).
• statuscode: The numeric code indicating the success or failure of the HTTP request (“200”).
• bytes: A numeric field containing the number of bytes of data transferred as part of the HTTP request, not including the HTTP header (“1043”).
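The field layout described above maps naturally onto a regular expression. The following is a minimal parsing sketch; the pattern and variable names are illustrative assumptions, and the sample line is written with a plain hyphen and straight quotes, as it would appear in an actual log file.

```python
import re
from datetime import datetime

# Assumed pattern for one Common log entry: host rfc931 username [date:time] "request" status bytes
COMMON_LOG = re.compile(
    r'(?P<host>\S+) (?P<rfc931>\S+) (?P<username>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<statuscode>\d{3}) (?P<bytes>\d+|-)'
)

line = '18.104.22.168 - username [02/Apr/2013:21:15:05 +0500] "GET /index.html HTTP/1.0" 200 1043'

match = COMMON_LOG.match(line)
entry = match.groupdict()

# dd/MMM/yyyy:hh:mm:ss +-hhmm maps onto these strptime directives
timestamp = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")

# The request field holds the method, the resource and the protocol version
method, resource, protocol = entry["request"].split()
```

Parsing the month abbreviation with `%b` assumes an English locale. Entries that do not match the pattern would make `match` equal to `None`, so production code would need to check for that before calling `groupdict()`.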
2.10.2 NCSA COMBINED LOG FORMAT
The NCSA Combined log format is an extension of the NCSA Common log format. The Combined format has the same information as the Common log format plus three (optional) additional fields: the referral field, the user_agent field, and the cookie field. The fields in the Combined log format are host, rfc931, username, date:time, request, statuscode, bytes, referrer, user_agent and cookie. For example:
126.96.36.199 - dsmith [02/Apr/2013:21:15:05 +0500] "GET /index.html HTTP/1.0" 200 1043 "http://www.ibm.com/" "Mozilla/4.05 [en] (WinNT; I)" "USERID=CustomerA;IMPID=01234"
The following are descriptions of the three additional fields:
referrer: http://www.ibm.com/ is the URL linking the user to the website.
user_agent: “Mozilla/4.05 [en] (WinNT; I)” is the Web browser and platform used by the user to visit the website.
cookies: “USERID=CustomerA;IMPID=01234” are the pieces of information that the HTTP server sends back to the client along with the requested resources. A client’s browser stores this information and subsequently sends it back to the HTTP server when making additional resource requests. An HTTP server can establish multiple cookies per HTTP request. Cookies take the form KEY = VALUE. Multiple cookie key-value pairs are delimited by semicolons (;).
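The three additional fields can be pulled from the end of a Combined log entry, and the cookie field split into its key-value pairs. This is a minimal sketch; the pattern and the helper name `parse_cookies` are assumptions made for illustration.

```python
import re

# Assumed pattern for the three trailing quoted fields of a Combined log entry
COMBINED_TAIL = re.compile(
    r' "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)" "(?P<cookie>[^"]*)"$'
)

line = (
    '126.96.36.199 - dsmith [02/Apr/2013:21:15:05 +0500] '
    '"GET /index.html HTTP/1.0" 200 1043 "http://www.ibm.com/" '
    '"Mozilla/4.05 [en] (WinNT; I)" "USERID=CustomerA;IMPID=01234"'
)

def parse_cookies(field):
    """Split KEY=VALUE pairs separated by semicolons into a dictionary."""
    pairs = (item.split("=", 1) for item in field.split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}

tail = COMBINED_TAIL.search(line).groupdict()
cookies = parse_cookies(tail["cookie"])
```

Because the three fields are optional, a log written without them would not match this pattern; a fuller parser would treat the tail as optional and fall back to the Common log fields alone.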
2.10.3 NCSA SEPARATE (THREE-LOG FORMAT)
The NCSA Separate log format, sometimes called three-log format, refers to a log format in which the information gathered is separated into three files (or logs) rather than a single file. The three logs are often referred to as the Common log (or access log), the Referral log and the Agent log. The three-log format contains the basic information of the NCSA Common log format in one file, and the referral and user agent information in subsequent files. However, no cookie information is recorded in this log format.
• Common or access log: The first of the three logs is Common log, sometimes referred to as the access log, which is identical in format and syntax to the NCSA Common log format.
• Referral log: The referral log is the second of the three logs. The referral log contains a corresponding entry for each entry in the common log.
The fields in the Referral log are date:time and referrer. For example:
[02/Apr/2013:21:15:05 +0500] "http://www.ibm.com/index.html"
The following is a description of the fields in the Referral log:
• date:time timezone: 02/Apr/2013:21:15:05 +0500 is the date and time stamp of HTTP request. The date and time of an entry logged in the referral log corresponds to the resource access entry in the common log. As a result, the date and time of corresponding records from each of these logs will be the same. The syntax of the date stamp is identical to the date stamp in the common log.
• referrer: “http://www.ibm.com/index.html” is the URL of the HTTP resource that referred the user to the resource requested. For example, if a user is browsing a Web page such as http://www.ibm.com/index.html and clicks on a link to a secondary page, then the initial page has referred the user to the secondary page. The entry in the referral log for the secondary page will list the URL of the first page (http://www.ibm.com/index.html) as its referral.
• Agent log: The Agent log is the third of the three logs making up the three-log format. Like the referral log, the agent log contains a corresponding entry for each entry in the common log. The fields in the Agent log are date:time and agent. For example:
[02/Apr/2013:21:15:05 +0500] "Microsoft Internet Explorer - 5.0"
The following is a description of the fields in the Agent log:
• date:time timezone: [02/Apr/2013:21:15:05 +0500] is the date and time stamp of the HTTP request. The date and time of an entry logged in the agent log corresponds to the resource access entry in the common log. Because information logged in the agent log supplements information logged in the common log, the date and time of corresponding records from each of these logs will be the same. The syntax of the date stamp is identical to the date stamp in the Common log.
• agent: “Microsoft Internet Explorer – 5.0” is the name of the HTTP client. It is customary for a Web browser to identify itself by name when making an HTTP request. It is not required, but most HTTP clients do identify themselves by name. The Web server writes this name in the agent log.
2.10.4 W3C Extended Log Format
This log file format is used by Microsoft Internet Information Server (IIS). A log file in the extended format contains a sequence of lines containing ASCII characters. Each line may contain either a directive or an entry. Entries consist of a sequence of fields relating to a single HTTP transaction. Fields are separated by white space. If a field is unused in a particular entry, a dash “-” marks the omitted field. Directives record information about the logging process itself. Lines beginning with the # character contain directives. The following directives are defined in the W3C Extended format:
• Fields: [
• Software: string : Identifies the software which generated the log.