Abstract: In a very large network, data aggregation significantly reduces communication and energy usage. Researchers have proposed a robust aggregation framework called synopsis diffusion, which combines multipath routing schemes with duplicate-insensitive methods to compute data aggregates accurately even when messages are lost because of node and transmission failures. However, this framework does not address false subaggregate values contributed by compromised nodes, which can cause large errors in the aggregate computed at the base station, the root node of the aggregation hierarchy.
Keywords: Base station, data aggregation, hierarchical aggregation, in-network aggregation, sensor network security.
This is an important problem since sensor networks are highly susceptible to node compromise due to the behavior of sensor nodes and the lack of tamper-resistant hardware. In our study, we make the synopsis diffusion approach secure against attacks in which compromised nodes contribute false subaggregate values. In particular, we present a novel lightweight verification algorithm that lets the base station determine whether the computed aggregate (predicate Count or Sum) includes any false contribution. Theoretical analysis and an extensive simulation study show that our algorithm outperforms existing approaches. The per-node communication overhead of our algorithm does not depend on the network size.
Recently, the research community has proposed a robust aggregation framework called synopsis diffusion, which combines multipath routing schemes with duplicate-insensitive algorithms to accurately compute aggregates (e.g., predicate Count, Sum) in spite of message losses resulting from node and transmission failures. However, this aggregation framework does not address the problem of false subaggregate values contributed by compromised nodes, which result in large errors in the aggregate computed at the base station, the root node in the aggregation hierarchy.
The proposed system will achieve the following objectives:
• To establish the current state of knowledge in the research area.
• To summarize and synthesize what is known and generally accepted, and to relate the project to previous research.
II. ASSUMPTIONS, THREAT MODEL, AND PROBLEM STATEMENT
A. System Assumptions
We assume that the sensor nodes form a multihop network with BS as the central point of control. We also assume that the sensor nodes are similar to current-generation sensor nodes, e.g., MicaZ or Telos motes, in their computational and communication capabilities and power resources, while BS is a laptop-class device supplied with long-lasting power.
B. Threat Model
The synopsis diffusion framework on its own does not include any provisions for security. Consequently, it is subject to various attacks from unauthorized or compromised nodes.
To stop unauthorized nodes from interfering in (or eavesdropping on) communications among honest nodes, we can extend the aggregation framework with standard authentication and encryption protocols. We therefore do not consider attacks from unauthorized nodes in the rest of this paper.
C. Problem Description
Our goal is to detect the falsified subaggregate attack against the Count or Sum algorithm. More formally, our goal is to determine whether the synopsis received at BS is the same as the ‘true’ final synopsis. Without loss of generality, we present our algorithm in the context of the Sum aggregate. As Count is a special case of Sum in which each node reports a unit value, the algorithm is readily applicable to the Count aggregate as well.
Fig. 1. Example of a falsified subaggregate attack
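To make the attack concrete, the following minimal Python sketch models a synopsis as an integer bitmask in the style of Flajolet-Martin counting. It is an illustration under stated assumptions, not the paper's implementation: the names `ETA`, `bit_index`, `local_synopsis`, `fuse`, and `estimate`, the use of SHA-256, and the correction constant 0.7735 are all introduced here. Integer bit 0 stands for the first synopsis bit.

```python
import hashlib

ETA = 32  # hypothetical synopsis length in bits


def bit_index(node_id, item, seed):
    """Pick a 1-based bit index i with probability 2**-i, derived from a hash."""
    data = seed + node_id.to_bytes(4, "big") + item.to_bytes(4, "big")
    h = int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
    i = 1
    while i < ETA and not (h & 1):  # count trailing zero bits of the hash
        h >>= 1
        i += 1
    return i


def local_synopsis(node_id, value, seed):
    """Sum synopsis: one bit per unit of the sensed value (Count is value == 1)."""
    syn = 0
    for item in range(value):
        syn |= 1 << (bit_index(node_id, item, seed) - 1)
    return syn


def fuse(*synopses):
    """Duplicate-insensitive fusion: bitwise OR of all synopses."""
    out = 0
    for s in synopses:
        out |= s
    return out


def estimate(syn):
    """Estimate from the 1-based index r of the lowest-order '0' bit."""
    r = 1
    while syn & 1:
        syn >>= 1
        r += 1
    return 2 ** (r - 1) / 0.7735


seed = b"query-1"  # hypothetical per-query seed
honest = fuse(local_synopsis(1, 5, seed), local_synopsis(2, 3, seed))
# A compromised node sets the first 20 bits of its subaggregate to '1'.
forged = fuse(honest, (1 << 20) - 1)
print(estimate(honest), estimate(forged))
```

Because fusion is a bitwise OR, no honest node downstream can clear the injected bits, so the inflated estimate reaches BS unless the ‘1’ bits are authenticated.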
III. VERIFIABLE AGGREGATION FOR SECURE DATA
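• receive the synopses and MACs sent by the child nodes;
• aggregate the received synopses with the local one;
• find the rightmost ‘1’ bits of the fused synopsis, up to the number of bits that must be authenticated (the fused synopsis may have fewer ‘1’ bits than that);
• generate one MAC for each of these bits to which the node itself contributes;
• construct the union of the received MACs and the self-generated ones; randomly select one MAC per bit from this union; broadcast the fused synopsis and the selected MACs to the parents.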
Finally, after receiving the messages from its child nodes, BS computes the final synopsis and verifies the received MACs. If it has received one valid MAC for each of the rightmost ‘1’ bits that must be authenticated in the final synopsis, the verification succeeds and the synopsis is accepted. Otherwise, the verification fails.
We note that, to reduce the message size, a source node generates a single MAC to authenticate all of the bits to which it contributes. However, to help the exposition, our illustrations list these MACs separately. In the illustrated example, a node in one ring receives the fused synopses sent by nodes in the adjacent ring. One node also forwards one MAC each for the 4th, 5th, 6th, 8th and 10th bits.
Fig. 2. Aggregation phase of the verification algorithm: an example
Fig. 3. Example of MAC forging during aggregation phase
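The aggregation and verification steps can be sketched in Python as follows. This is a simplified illustration, not the authors' code: a synopsis is modelled as a set of 1-based bit indices, `T` is a hypothetical bound on how many rightmost ‘1’ bits must carry a MAC, each node is assumed to share a key with BS, and the MAC layout (HMAC-SHA256 over a nonce, the node ID, and the bit index) is an assumption made here. The sketch emits one MAC per bit, as in the illustrations, and BS checks only that each MAC is valid under the claimed node's key.

```python
import hashlib
import hmac
import random

T = 3  # hypothetical number of rightmost '1' bits that must carry a MAC


def mac_for_bit(key, node_id, bit, nonce):
    """MAC binding a node to one synopsis bit for the current query nonce."""
    msg = nonce + node_id.to_bytes(4, "big") + bit.to_bytes(2, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()


def rightmost_ones(bits, t=T):
    """Up to t highest-index '1' bits of a synopsis given as a set of indices."""
    return sorted(bits, reverse=True)[:t]


def aggregate(node_id, key, local_bits, child_msgs, nonce, t=T):
    """Fuse child synopses with the local one and pick one MAC per top bit."""
    fused = set(local_bits)
    pool = {}  # bit index -> list of candidate (node_id, tag) pairs
    for child_bits, child_macs in child_msgs:
        fused |= child_bits
        for bit, mac in child_macs.items():
            pool.setdefault(bit, []).append(mac)
    top = rightmost_ones(fused, t)
    for bit in top:
        if bit in local_bits:  # the node itself contributes to this bit
            tag = mac_for_bit(key, node_id, bit, nonce)
            pool.setdefault(bit, []).append((node_id, tag))
    chosen = {bit: random.choice(pool[bit]) for bit in top if bit in pool}
    return fused, chosen


def verify(final_bits, macs, keys, nonce, t=T):
    """BS check: every one of the t rightmost '1' bits needs a valid MAC."""
    for bit in rightmost_ones(final_bits, t):
        if bit not in macs:
            return False
        node_id, tag = macs[bit]
        key = keys.get(node_id)
        if key is None:
            return False
        if not hmac.compare_digest(tag, mac_for_bit(key, node_id, bit, nonce)):
            return False
    return True


keys = {1: b"k1", 2: b"k2", 3: b"k3"}  # hypothetical node-to-BS keys
nonce = b"query-7"
leaf1 = aggregate(1, keys[1], {1, 2, 4}, [], nonce)
leaf2 = aggregate(2, keys[2], {1, 3, 6}, [], nonce)
parent = aggregate(3, keys[3], {2, 5}, [leaf1, leaf2], nonce)
print(verify(*parent, keys, nonce))                         # True
print(verify(parent[0] | {10}, parent[1], keys, nonce))     # False
```

In the second call, a ‘1’ is injected at bit 10 without a matching MAC, so the check at BS fails.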
D. Protocol Analysis and Comparison
Here, we analyze the performance and security of our verification algorithm and compare them with those of other algorithms.
Outage Consideration in Real Application:
To the best of our knowledge, only three other verification algorithms have been proposed. To make a fair comparison, for one of these algorithms we consider only its verification phase. Table II compares these four algorithms as the first four entries. We note that a few researchers have proposed attack-resilient algorithms, which attempt to solve a more difficult problem than aggregate verification at the cost of more communication overhead and latency. We report the performance of these algorithms as the last two entries in Table II. However, in the rightmost column of the table, we clearly indicate that they are not verification algorithms by marking them ‘NA’ (not applicable). We now discuss the entries for each of the considered features.
E. Latency
Our protocol completes within one epoch, simultaneously with the original synopsis diffusion algorithm. Chan et al.’s algorithm takes two epochs, while Yang et al.’s and Garofalakis et al.’s algorithms take one epoch each. The worst-case latency of the attack-resilient algorithm depends on the upper bound of Sum and on the size of the sliding window used; if the upper bound of Sum is large, it can incur high latency. The number of epochs that the sampling-based protocol takes to complete depends on the network size.
F. Communication Overhead
In our protocol, each node has to forward only a bounded number of MACs for each synopsis. As defined in prior work, an epoch represents the amount of time a message takes to reach BS from the farthest node in the aggregation hierarchy.
IV. COMPARING OUR VERIFICATION ALGORITHM WITH OTHERS
A. Computation Overhead
During our protocol, a node has to compute at most one MAC (which is as hard as computing a hash function) for the whole set of synopses. However, to compute the synopses, a node has to evaluate a number of hash functions that depends on its sensed value. Considine et al. proposed some methods to reduce this overhead. Garofalakis et al.’s algorithm and our prior work have the same complexity as above. On the other hand, Chan et al.’s algorithm, Yang et al.’s algorithm, and the sampling-based protocol have different per-node hash computation costs.
B. Approximation Error
Our current verification algorithm, the algorithm in our prior work, and Garofalakis et al.’s algorithm produce an approximate estimate of the aggregate, where the error decreases as the number of synopses used increases. On the other hand, Chan et al.’s and Yang et al.’s algorithms return the exact value if no message is lost. The remaining algorithms also produce an approximate estimate.
Robustness to Message Loss:
Our algorithm and Garofalakis et al.’s algorithm are robust because they use multipath routing. In contrast, Chan et al.’s algorithm is very sensitive to communication loss: for the verification to succeed, BS has to receive the authentication message from every node. As the nodes construct an aggregation tree, communication loss over any edge may paralyze this algorithm. Yang et al.’s algorithm also routes messages over a tree-based topology and is therefore not robust either. The remaining algorithms are robust against loss because they use multipath routing schemes.
Theoretically, there is a chance that our algorithm does not detect the falsified subaggregate attack, but we can make that probability approximately 0 by a proper choice of the protocol parameter (Claim 5.5). Furthermore, if the attacker does succeed in stealthily injecting some ‘1’s into a synopsis, we have a further level of defense. While for ease of exposition we presented the protocol as computing just one synopsis, multiple synopses are computed in practice, and the values of these synopses are highly correlated. So, if the value of one synopsis appears to be an outlier compared to the others, that synopsis can be rejected. Chan et al.’s algorithm and Garofalakis et al.’s algorithm detect the falsified subaggregate attack deterministically, which is an advantage over our algorithm in absolute terms. On the other hand, Yang et al.’s algorithm achieves probabilistic detection.
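One simple way to act on this correlation is sketched below. It is only an illustration: the text does not say how an outlier is identified, so the median-based rule, the threshold `max_gap`, and the function name `reject_outliers` are assumptions introduced here. Each synopsis is summarised by a single number, for example the index of its lowest-order ‘0’ bit.

```python
import statistics


def reject_outliers(values, max_gap=3):
    """Keep per-synopsis values that lie within max_gap of the median.

    max_gap is a hypothetical tolerance, not a value taken from the paper.
    """
    med = statistics.median(values)
    return [v for v in values if abs(v - med) <= max_gap]


# Six synopses of the same aggregate; the fifth one has been inflated.
observed = [7, 8, 7, 6, 15, 7]
print(reject_outliers(observed))  # [7, 8, 7, 6, 7]
```

The median is used rather than the mean so that a single inflated synopsis cannot shift the reference point it is compared against.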
Garofalakis et al. proposed to also compute the complementary aggregate in order to limit the undetected error injected by a deflation attack. We can readily adapt their technique to bound this error in terms of the upper bound of Sum and the approximation error of the synopsis scheme. The upper bound of Sum follows from the upper bound on the number of nodes and the upper bound on any node’s sensed value. Say one run of the aggregation algorithm returns the Sum and the node Count, from which the average sensed value in this run follows. If the resulting relative error ratio is small in a specific application, this technique ensures that the damage done by a deflation attack is limited. Further note that in Section IV-C we already explained that this attack is very unlikely to occur in the first place in our problem setting.
Further, we can consider the number of keys stored in each node as another performance metric. For one of the compared protocols, each node has to store symmetric keys that are shared with the base station. The number of keys stored per node differs for the other protocols, including ours.