Unreliable networks: The integration solution should be designed in such a way
that no data is lost when the network fails. It should be capable of handling the
worst-case scenarios of network failure.
Slow networks: Expensive network calls result in performance constraints.
Foreign applications: Two foreign applications that do not exchange data in a
common language have to be mediated. They might need a translator.
Change: Constant changes in the applications have to be handled. Loose coupling
with other systems and interfaces is intended to avoid complications. [12]
5.1 Integration Methods
This section describes possible integration methods for the above problems.
File Transfer
Applications share a common file; that is, they read and write a file in a common
location. Data are transferred from one application to another using the shared file.
The file location, access rights, authorization and other details of the file are
agreed among the applications in advance. Non-repudiation and data integrity are
some of the serious problems of this method. [12]
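A minimal sketch of file transfer between two applications is given here. The function names, the JSON format and the file name orders.json are assumptions made for illustration. The digest check only addresses data integrity under the assumption that the digest reaches the reader by a separate route; it does not provide non-repudiation.

```python
import hashlib
import json
import os
import tempfile


def export_records(records, path):
    """Writer side: write the file atomically and return its SHA-256 digest."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so a reader never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(payload)
    os.replace(tmp_path, path)  # rename within the same directory
    return hashlib.sha256(payload).hexdigest()


def import_records(path, expected_digest):
    """Reader side: read the file and reject it if the digest does not match."""
    with open(path, "rb") as handle:
        payload = handle.read()
    if hashlib.sha256(payload).hexdigest() != expected_digest:
        raise ValueError("shared file was modified or corrupted")
    return json.loads(payload)


# Both applications agree on the location in advance (hypothetical path).
shared_dir = tempfile.mkdtemp()
shared_file = os.path.join(shared_dir, "orders.json")
digest = export_records([{"id": 1, "amount": 250}], shared_file)
received = import_records(shared_file, digest)
```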
Shared Database
A single database is read and written by multiple applications, and the database is
not duplicated. Since the physical location of the database is the same for all
applications, no data transfer is necessary. Even though data integrity is well
managed by using a single source, performance issues can be expected because the
source is accessed by multiple applications. [12]
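The idea can be sketched with two connections to one SQLite file, each connection standing in for a separate application. The table and the file name are assumptions for illustration only.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "shared.db")

# Application A creates the table and writes a row.
app_a = sqlite3.connect(db_path)
app_a.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
app_a.execute("INSERT INTO customers (name) VALUES (?)", ("Alice",))
app_a.commit()

# Application B opens its own connection to the same database and reads the
# row; no data had to be transferred between the two applications.
app_b = sqlite3.connect(db_path)
rows = app_b.execute("SELECT id, name FROM customers").fetchall()

app_a.close()
app_b.close()
```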
Remote Procedure Invocation
Some features or interfaces of an application are shared with other applications,
which can then call the shared features remotely as remote procedures. Real-time
and synchronous communication is possible. However, expensive remote calls result
in network traffic and performance issues. [12]
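As a small illustration, the following sketch uses XML-RPC from the Python standard library. The function get_balance and the account data are hypothetical; XML-RPC is only one of many possible remote invocation mechanisms and is not prescribed by the source.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def get_balance(account_id):
    """Feature that the providing application shares with other applications."""
    balances = {"ACC-1": 120, "ACC-2": 75}
    return balances.get(account_id, 0)


# Port 0 lets the operating system pick a free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(get_balance)
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

# The calling application blocks until the remote result arrives.
with ServerProxy(f"http://{host}:{port}") as remote:
    balance = remote.get_balance("ACC-1")

server.shutdown()
server.server_close()
```

The caller waits for every call to complete, which is the synchronous behaviour described above.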
Messaging
Data are published in the form of messages to a commonly known messaging channel.
Applications may have read and write access to the respective channels of data transfer.
Messages can also be read asynchronously from the channels at a later point in time.
The channels and other specifications are agreed in advance. [12]
5.2 Messaging System
An integration solution that provides messaging capabilities is called a messaging
system, or sometimes message-oriented middleware (MOM). It is loosely coupled
with the sender and receiver applications, which are connected using the TCP/IP
protocol. The messaging system provides reliable transfer of data from sender to
receiver. [12]
As shown in figure 5.1, after the message is created by the sender, it is stored
in a local message storage. Later, the messages are sent through the network. Similar
to the sending side, the receiving end has a storage which stores all received messages.
In case of network failure, the storage helps in retrieving messages. The receiver then
processes the message obtained. This is the typical functionality of a simple messaging
system. More sophisticated features are introduced when quality attributes such as
security, data integrity, etc. are needed. [12]
34 Chapter 5. Integration Solutions
Figure 5.1: A simple Messaging System
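The store-and-forward behaviour of the sending side can be sketched as follows. The classes UnreliableNetwork and StoreAndForwardSender are hypothetical and only simulate a network that fails for a fixed number of attempts.

```python
from collections import deque


class UnreliableNetwork:
    """Hypothetical network that fails for the first `failures` send attempts."""

    def __init__(self, failures):
        self.failures = failures
        self.delivered = []

    def send(self, message):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("network down")
        self.delivered.append(message)


class StoreAndForwardSender:
    def __init__(self, network):
        self.network = network
        self.store = deque()  # local message storage

    def submit(self, message):
        # The application returns immediately; the message is only stored.
        self.store.append(message)

    def flush(self):
        """Try to forward stored messages; keep them if the network fails."""
        while self.store:
            try:
                self.network.send(self.store[0])
            except ConnectionError:
                return False  # message stays in the store for a later retry
            self.store.popleft()
        return True


network = UnreliableNetwork(failures=2)
sender = StoreAndForwardSender(network)
sender.submit("order-1")
sender.submit("order-2")

failed_attempts = 0
while not sender.flush():
    failed_attempts += 1
```

A message is removed from the local storage only after the network accepted it, so a failure between the two steps cannot lose it.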
5.2.1 Why Messaging System?
In brief, a messaging system solves most of the problems faced by an integration solution.
On network failures, the messages are kept in the message storage and can be retrieved
asynchronously at a later point in time. Expensive remote calls are avoided. A synchronous
call makes the thread wait until it gets a response. In contrast, a messaging system
works asynchronously, so the thread does not wait and can continue to work in parallel.
Finally, it is adaptable to changing environments, as it is usually platform independent
and, above all, loosely coupled. [12]
5.2.2 Messaging System Protocols
5.2.2.1 Messaging Channels
A messaging channel is the medium through which messages are transmitted across networks.
It links the communication between senders and receivers. It can be compared to
telephone lines, where two different speakers are connected with the help of
telecommunication channels. Every specification related to the data, such as the
data type of the message being transferred, is configured precisely. The sender knows
what data it is going to send, in what format, to which receivers and so on. Every single
specification can be managed by an administrator. [12]
Figure 5.2 outlines a simple messaging system and the inner messaging channel through
which communication is processed. A messaging channel does not accept random data
from a random sender. It follows a set of strict rules laid down in the configuration
process. Details related to the message transfer, such as sender, receiver, data type and
expiry time of the messages, are configured in advance. The messaging channel is structured
so that the sender has no knowledge of the receivers. The sender just submits the messages
to the system, and the system is responsible for delivering them to the respective receivers.
This loose coupling enables any type of application to send data. On the other side, a
receiver has no authorization to pull data from random channels. Every channel is named and
assigned to its respective senders and receivers in order to avoid conflicts and issues. [12]
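The configuration rules above can be sketched as a channel object that checks each request against what was configured in advance. The Channel class and the application names are hypothetical; a real messaging system would hold this configuration on the server side.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    name: str
    data_type: type      # only payloads of this type are accepted
    senders: set         # applications allowed to write
    receivers: set       # applications allowed to read
    messages: list = field(default_factory=list)

    def send(self, sender, payload):
        if sender not in self.senders:
            raise PermissionError(f"{sender} may not send on {self.name}")
        if not isinstance(payload, self.data_type):
            raise TypeError(f"{self.name} only accepts {self.data_type.__name__}")
        self.messages.append(payload)

    def receive(self, receiver):
        if receiver not in self.receivers:
            raise PermissionError(f"{receiver} may not read from {self.name}")
        return self.messages.pop(0) if self.messages else None


orders = Channel("orders", dict, senders={"shop"}, receivers={"billing"})
orders.send("shop", {"id": 7})
got = orders.receive("billing")
```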
Message channels are basically divided into two types: the Point-to-Point channel
and the Publish-Subscribe channel. Apart from these main types, there exist the
Datatype Channel, Invalid Message Channel, Selective Consumer Channel, etc.
Some of these are explained below. More detailed explanations can be found in the
book [12, Enterprise Integration Patterns by G. Hohpe and B. Woolf].
Point-to-Point Channel
A channel that is limited to one receiver per message is called a Point-to-Point channel.
In other words, there shall be only one receiver for a sent message; no other receiver
will be able to get it. If there is more than one receiver at the receiving end, the
Point-to-Point channel makes sure that only one of them gets each message. [12]
For example, in a bank transaction, the sender sends money to a particular person, who
alone is authorized to receive it. This illustrates the point-to-point channel.
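A minimal sketch of this guarantee uses a thread-safe queue as the channel and two competing receivers. The receiver names and message contents are made up for illustration.

```python
import queue
import threading

channel = queue.Queue()
received = {"receiver-1": [], "receiver-2": []}


def consume(name):
    while True:
        message = channel.get()
        if message is None:  # sentinel: stop this receiver
            break
        received[name].append(message)


workers = [threading.Thread(target=consume, args=(name,)) for name in received]
for worker in workers:
    worker.start()

for i in range(10):
    channel.put(f"payment-{i}")
for _ in workers:
    channel.put(None)  # one stop signal per receiver
for worker in workers:
    worker.join()

all_received = sorted(received["receiver-1"] + received["receiver-2"])
```

Which receiver gets which message depends on thread scheduling, but every message is taken from the queue exactly once.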
Publish-Subscribe Channel
This is derived from the Observer pattern. The sender publishes the same message to
all available receivers, so every receiver that is available gets exactly the same
message. The single messaging channel for all subscribers is further divided into one
output channel for each subscriber, so that each subscriber gets its own copy of the
message. Once a copy is consumed, it vanishes from the channel. [12]
Figure 5.2: Messaging Channel
An example of Publish-Subscribe is a common internet news feed. Every receiver who
is subscribed to the feed constantly receives news updates from the sender. More
information about the Observer pattern can be found in the book
"Design Patterns: Elements of Reusable Object-Oriented Software" [13, Ralph et al.,
1995].
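The per-subscriber copies of a Publish-Subscribe channel can be sketched with one queue per subscriber. The class PublishSubscribeChannel and the subscriber names are hypothetical.

```python
from collections import deque


class PublishSubscribeChannel:
    def __init__(self):
        self._queues = {}  # one output queue per subscriber

    def subscribe(self, subscriber):
        self._queues[subscriber] = deque()

    def publish(self, message):
        # Every current subscriber gets its own copy of the message.
        for subscriber_queue in self._queues.values():
            subscriber_queue.append(message)

    def consume(self, subscriber):
        # Consuming removes only this subscriber's copy.
        subscriber_queue = self._queues[subscriber]
        return subscriber_queue.popleft() if subscriber_queue else None


feed = PublishSubscribeChannel()
feed.subscribe("alice")
feed.subscribe("bob")
feed.publish("headline")

first_for_alice = feed.consume("alice")
second_for_alice = feed.consume("alice")
first_for_bob = feed.consume("bob")
```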
Figure 5.3: Point-to-Point Messaging Channel
Figure 5.4: Publish-Subscribe Messaging Channel
Datatype Channel
The receiver usually has no prior knowledge of the message, its structure or its type.
For example, a message can consist of bytes, XML, characters and so on. To remove this
uncertainty, the Datatype channel ensures that the receiver is aware of the data type
of the messages it will receive. This is configured well before the message transfer.
One big advantage is that the receivers are prepared to process what they will receive.
The resulting decrease in processing time adds more value. The message header contains
information about the message type and other specifications.
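One way to picture this is a receiver that selects its parser from the channel name alone, because each channel carries exactly one data type. The channel names and the PARSERS table are assumptions made for this sketch.

```python
import json
import xml.etree.ElementTree as ET


def parse_xml(text):
    """Turn a flat XML element into a dictionary of child tag -> text."""
    return {child.tag: child.text for child in ET.fromstring(text)}


# Configured in advance: one parser per datatype channel (hypothetical names).
PARSERS = {
    "orders.json": json.loads,
    "orders.xml": parse_xml,
}


def receive(channel_name, raw):
    # No inspection of the payload is needed to find out its type.
    return PARSERS[channel_name](raw)


from_json = receive("orders.json", '{"id": "7"}')
from_xml = receive("orders.xml", "<order><id>7</id></order>")
```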
5.2.2.2 Messages
A message is a packet that is composed of a part or the whole of the data to be
transmitted to the receiver. After it is received, the message is unpacked and the data
are processed. A message can carry data of almost any type. The message expiry time can
be adjusted in the messaging server to keep the message there for a required time period.
Figure 5.5: Datatype Channel
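A message with a header, a body and an expiry time could be modelled as below. The field names and the default of 60 seconds are assumptions, not values taken from a particular messaging server.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Message:
    body: bytes
    headers: dict = field(default_factory=dict)
    created_at: float = field(default_factory=time.time)
    ttl_seconds: float = 60.0  # hypothetical default expiry time

    def is_expired(self, now=None):
        now = time.time() if now is None else now
        return now - self.created_at > self.ttl_seconds


def purge_expired(stored, now=None):
    """Keep only the messages whose expiry time has not passed yet."""
    return [message for message in stored if not message.is_expired(now)]


fresh = Message(b"a", {"type": "bytes"}, created_at=1000.0, ttl_seconds=60)
stale = Message(b"b", {"type": "bytes"}, created_at=900.0, ttl_seconds=60)
kept = purge_expired([fresh, stale], now=1010.0)
```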
5.2.2.3 Transform
Two applications that do not speak a common language or share a common data type have
difficulty exchanging data. To resolve this, the sender's message is transformed so that
it is readable by the receiver. Message Translators are capable of doing such
transformations. These translators sometimes come within the messaging system and
transform messages to the desired type on the fly before they are sent to the
receiver.
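A Message Translator can be sketched as a function that sits between sender and receiver. The example assumes a sender that emits flat XML and a receiver that expects JSON; both formats and the sample record are chosen only for illustration.

```python
import json
import xml.etree.ElementTree as ET


def translate_xml_to_json(xml_text):
    """Translate a flat XML message into the JSON form the receiver expects."""
    root = ET.fromstring(xml_text)
    record = {child.tag: child.text for child in root}
    return json.dumps({root.tag: record}, sort_keys=True)


sender_message = "<customer><id>42</id><name>Ada</name></customer>"
receiver_message = translate_xml_to_json(sender_message)
```

Neither application has to change its own format; only the translator knows both.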
5.2.2.4 Endpoints
Endpoints are the connecting nodes of a messaging system. An endpoint can be a data
adapter, an application, etc. Applications send messages through these endpoints.
Endpoints are linked through ports, normally TCP ports.
5.3 Apache Kafka
Apache Kafka is one of the fastest messaging systems. It is defined as follows in the
Apache documentation.
"Kafka is a distributed, partitioned, replicated commit log service. It provides
the functionality of a messaging system, but with a unique design" [11,
Apache Kafka 0.9.0 Documentation]
Kafka categorizes the messages to be sent into topics, so that each topic has a set of
messages. Each node in the Kafka cluster that is responsible for message transfer is
called a broker. A producer is the one that publishes messages. Consumers, on the other
hand, subscribe to particular topics and receive the messages published to them. A typical
Kafka messaging system looks like figure 5.6. Producers and consumers have no direct
contact. In fact, producers need not be aware of the consumers' location. The broker in
the middle does all the work of connecting consumers and producers. [11]
Figure 5.6: Kafka system
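The roles of topic, producer, consumer and broker can be illustrated with a small in-memory model. ToyBroker, publish and poll are hypothetical names and are not the Kafka client API; partitions and replication are deliberately left out. The model keeps each topic as an append-only log and remembers a read position per consumer.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker; not the Kafka client API."""

    def __init__(self):
        self._logs = defaultdict(list)    # topic -> append-only message log
        self._offsets = defaultdict(int)  # (consumer, topic) -> next offset

    def publish(self, topic, message):
        self._logs[topic].append(message)
        return len(self._logs[topic]) - 1  # offset of the new message

    def poll(self, consumer, topic):
        """Return the messages this consumer has not seen yet on the topic."""
        log = self._logs[topic]
        start = self._offsets[(consumer, topic)]
        self._offsets[(consumer, topic)] = len(log)
        return log[start:]


broker = ToyBroker()
broker.publish("sensor-data", "t=21.5")
broker.publish("sensor-data", "t=21.7")
broker.publish("alerts", "overheat")

first_poll = broker.poll("dashboard", "sensor-data")
second_poll = broker.poll("dashboard", "sensor-data")
archiver_poll = broker.poll("archiver", "sensor-data")
```

The producer side never refers to a consumer; both sides only name the topic.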
Zookeeper
The Kafka system comprises two important subsystems, Zookeeper and the message
broker. When an application is distributed, managing the distributed parts,
synchronizing them and providing services are complex tasks that are prone to
bugs. To address this, Zookeeper is in place to manage the distributed applications.
Zookeeper's interface coordinates them by providing centralized services. Group
management and the interactions among the distributed brokers are handled automatically
by Zookeeper. Zookeeper can itself be distributed over many nodes for higher
manageability and performance. It is advised to run Zookeeper with a minimum of three
nodes in a distributed environment. Among distributed Zookeeper nodes, a leader is
elected automatically and can be changed if required. If the leader is down, a new
leader is elected automatically. As the information is shared among the nodes,
synchronization is not an issue. A client interacts with only one Zookeeper node.
However, as explained before, because information is shared among the Zookeeper
nodes, every node has data about the client. [14] [15]
A Zookeeper cluster with three nodes is depicted in figure 5.7. One of them acts as
the leader. A client is connected to one of the Zookeeper nodes. All Zookeeper nodes
are linked among themselves for synchronization and are interconnected through ports.
Figure 5.7: Zookeeper cluster
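The advice to use at least three nodes can be related to majority quorums: a Zookeeper ensemble stays available only while more than half of its nodes are running. This background is added here for illustration and is not stated in the text above; the function names are hypothetical.

```python
def quorum_size(ensemble_size):
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Nodes that may fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


failures_by_size = {n: tolerated_failures(n) for n in range(1, 6)}
```

Under this rule, one or two nodes tolerate no failure at all, while three nodes are the smallest ensemble that survives the loss of one node.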
Broker
The message broker is managed by Zookeeper. A broker is a system involved in
the transmission of messages from producers to consumers. It can also be distributed
over several nodes. Similar to Zookeeper, the brokers have a leader and followers in a
distributed environment. On failure of the leader, a new leader is automatically chosen
from the pool of followers. Producers send messages with a specific topic. Consumers
that listen to that topic receive the messages. The broker works behind the scenes in
the data transfer.
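The leader and follower roles can be sketched with a toy model. ReplicaGroup is a hypothetical class, and promoting the first remaining follower is a simplification chosen for this sketch; it is not a description of how Kafka actually selects a new leader.

```python
class ReplicaGroup:
    """Toy model of one leader with followers among a set of brokers."""

    def __init__(self, broker_ids):
        self.alive = list(broker_ids)
        self.leader = self.alive[0]

    @property
    def followers(self):
        return [b for b in self.alive if b != self.leader]

    def fail(self, broker_id):
        self.alive.remove(broker_id)
        if broker_id == self.leader:
            if not self.alive:
                raise RuntimeError("no broker left to lead")
            # Simplification: promote the first remaining follower.
            self.leader = self.alive[0]


group = ReplicaGroup([1, 2, 3])
group.fail(1)  # the current leader goes down
new_leader = group.leader
remaining_followers = group.followers
```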