Abstract— A Mobile Ad hoc Network (MANET) is characterized by constant topology change, which produces both connected and disconnected MANETs. Because of these challenges, existing social-network file sharing schemes make file transfer ineffective for certain applications. Most file sharing takes place between smartphones, laptops and PCs, from one system to another. This paper explains how audio and text files can be shared efficiently over the network. The existing P2P content-based file sharing system for disconnected MANETs, Social network-based P2P cOntent-based file sharing in disconnected mObile ad hoc Networks (SPOON), has been widely used. For efficient file searching, SPOON groups nodes that share common interests and frequently meet each other into communities. It uses an interest extraction algorithm to derive a node's interests from its files for content-based file searching. The proposed work builds on social-network-based file sharing; the objective is to allow users to download media files such as music, movies and games, using Twitter dataset analysis. A clustering technique is also used to cluster large datasets. Ant Colony Optimization (ACO) is a technique for solving real-world optimization problems, in which ants are agents that move between the nodes of a graph to find the shortest path. The proposed approach improves accuracy, processing time, processing cost and file search efficiency.
Index Terms—MANETs, P2P content-based file sharing, social data Analysis, ACO technique, K-means Clustering
Department of Computer Science & Engineering,
In past years, far fewer people used mobile phones, PCs and laptops than do today. Statistics estimate that roughly 1.91 billion users are using smart devices. The number of users is expected to grow by another 12% in 2016 and to reach about 2.5 billion by 2019. People are interested in sharing files of many kinds over the network, beyond popular media files such as music, movies and games. To interact with different users, Peer-to-Peer (P2P) file sharing is used. The P2P file sharing model is used in large-scale networks to transfer files without a centralized server. It enables computer resources such as files to be shared by direct exchange between end users' computers. Online privacy statistics for P2P networks report a figure of 98.8% for files transferred to the corresponding peers. BitTorrent and uTorrent are some of the file sharing networks in use, with about 150 million monthly users. Because people carry mobile devices, file sharing has become pervasive and is tied to social relationships. We focus on P2P file sharing in disconnected Mobile Ad hoc Networks (MANETs) consisting of mobile users with social network properties, who exchange files such as audio, short videos and clips of different categories. Earlier P2P methods in MANETs are flooding based, advertisement based and social contact based. Flooding-based and advertisement-based methods incur high overhead because of their low file search efficiency. Social-contact-based methods provide high file search efficiency with less overhead.
Earlier P2P networks such as Napster and eDonkey belong to the first generation (1G) of P2P, which is based on a central server. These networks later introduced paid service, which is one of the limitations of 1G. The second generation, represented by Gnutella, works without a central server. Users can download all kinds of files, not only music and games; the BitTorrent and uTorrent networks are used to download as well as upload files on websites.
A. P2P Architecture
P2P file sharing is considered an efficient scheme for sharing all types of files. Each user acts as a server, retrieving and delivering its own content on the web. Here a peer represents a node/user that can retrieve files on the internet. The network is not centralized: any peer can access content at any time. Requesting peers receive content on a First Come First Serve (FCFS) basis. Hence, traffic in the network is reduced.
Fig 1.Peer to Peer Architecture
B. Components of SPOON
Social-network-based file sharing in P2P relies on the components of SPOON: interest extraction, exploiting node stability, community construction and interest-oriented routing. Files are extracted based on the interests of the user. The interests behind queries are derived from the content-based files and the components of SPOON.
Fig 2. Components of SPOON
Common interests among the nodes are derived using the community construction algorithm. The community is constructed according to the frequent queries of the user, and files are retrieved.
Comparative analysis of existing systems
S.no | Methodology used | Solution | Problem identified
1 | Interest extraction algorithm | Derives a node's interests from its files for content-based file searching | High overhead
2 | Hashline technique | Redundancy eliminated | Workload distribution and routing
3 | Latent Dirichlet allocation algorithm | Discovers lifestyles of users from user-centric sensor data | Lower accuracy
4 | SI-based file replication | Maintenance used; replicas are shared with SI | High cost for replicas
6 | Content synopses routing | Provides high probability of delivering contents | Reliability problem occurs
7 | 7DS, FIS | Querying used | Predicting data availability is the challenge
8 | Fuzzy Boolean transformation | Clustering technique used | Slow for large datasets
9 | Delay-tolerant broadcasting technique | Reduces expected time updates | No trust for incoming entries of users
10 | Social-oriented and reference policies | Performance improved | End-to-end connectivity is not proper
11 | P/S scheme with DTN | Social network properties are utilized dynamically | Content-based service for static networks costs more
II. PROPOSED WORK
The existing SPOON flow has the following modules, which the proposed work shares; the implementation can be improved using ACO optimization, as shown in Fig 3. This flow represents the peer activities in the social network.
• Interest Extraction
• Community Construction
• Interest-Oriented File Share
• File upload
This flow diagram represents the activity of a social network user across several steps: interest extraction derives the user's interests, and community construction groups the members that belong to the same interest category.
Fig 3.Flow Diagram
Files related to a user can be shared between different peers, which is represented by Interest-Oriented File Share. The interest-oriented files can then be uploaded and downloaded by different peers.
A login is used to authenticate the user's credentials. A new user has to register and then log in to the page.
Users in the network issue requests that reflect different interests. Files are retrieved in the network according to these interests. To derive its interests, a node infers keywords from each of its files using the document clustering technique.
Input: User query.
1. To derive a node's interests, the node infers keywords from each of its files.
2. The file vector of file i is calculated as (t_1, w_{it_1}; t_2, w_{it_2}; …; t_m, w_{it_m}), where 1 ≤ k ≤ m.
3. The weight of each keyword used for text retrieval is calculated from its number of occurrences.
4. If there are m keywords, normalize the weights.
5. Calculate the similarity of two file vectors.
Here, η_{t_k} is the number of occurrences of keyword t_k,
w_{it_k} is the weight of keyword t_k in file i,
w_{1k} and w_{2k} are the weights of the kth common keyword, and
v_1 and v_2 are the file vectors.
Fig 4.Interest Extraction
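The interest extraction steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the source does not preserve the weighting or similarity formulas, so the logarithmic weight 1 + log(η), the Euclidean normalization and the dot-product similarity used here are assumptions, and the names `file_vector` and `similarity` are hypothetical.

```python
import math
from collections import Counter

def file_vector(text, m=5):
    """Return {keyword: normalized weight} for the m most frequent words."""
    # Crude keyword filter (assumption): keep lower-cased words longer than 3 letters.
    words = [w.lower() for w in text.split() if len(w) > 3]
    counts = Counter(words).most_common(m)
    # Assumed weighting: weight grows with the log of the occurrence count.
    raw = {t: 1.0 + math.log(n) for t, n in counts}
    # Normalize so that the squared weights of the m keywords sum to 1.
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / norm for t, w in raw.items()} if norm else {}

def similarity(v1, v2):
    """Sum of weight products over the keywords common to both vectors."""
    return sum(v1[t] * v2[t] for t in v1.keys() & v2.keys())
```

With normalized vectors, the similarity of a file with itself is 1 and the similarity of two files with no common keyword is 0, which gives a convenient scale for matching a query against stored interests.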
The interests declared by a registered user are taken as that user's interests and matched against the database of existing users. Matching files are then retrieved by the user with reduced delay.
People are more likely to share information if they can benefit from the sharing or if they think the information is of interest to others. Community structures may therefore exist in which users who share information more often are grouped together. In the social network, people with the same interest are formed into a community.
Fig 5.Community Construction
Communities are constructed in a decentralized manner by collecting node interests and contact frequencies from all nodes. This ensures adaptivity in the social network. The files are clustered according to the weight of the keywords of the interest group.
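One way to picture community construction is the following sketch, which merges nodes that share at least one interest and meet often enough. It is not the paper's algorithm: it runs in one place for readability rather than in a decentralized manner, and the function name `build_communities`, the `min_contacts` threshold and the input layout are all assumptions made for illustration.

```python
def build_communities(interests, contacts, min_contacts=3):
    """Group nodes that share an interest and meet at least min_contacts times.

    interests: {node: set of interest keywords}
    contacts:  {(node_a, node_b): number of meetings}
    """
    parent = {n: n for n in interests}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), freq in contacts.items():
        # Merge only pairs that meet frequently AND have a common interest.
        if freq >= min_contacts and interests[a] & interests[b]:
            parent[find(a)] = find(b)

    groups = {}
    for n in interests:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())
```

For example, two nodes that both list "music" and have met five times end up in one community, while a node that shares an interest but has met the others only once stays on its own.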
Interest-Oriented File Share
In social networks, people usually have a few file interests and their file visit pattern generally follows a certain distribution. Also, people with the same interest tend to contact each other frequently. Thus, interests can be a good guidance for file searching.
Fig 6.Interest Oriented File Share
Intra community file searching:
1. The query is represented by a query vector.
2. Each query is associated with a hop count.
3. Vector similarity is calculated based on the formula.
V_Q is the query vector,
N_C is the community coordinator, and
λ is the counter value.
Inter community file searching:
Similar to intra community file searching, upon receiving a request the coordinator checks its index for the file. If the file is not found, the coordinator looks for an ambassador to forward the request. If the file exists, the data holder sends the file to the coordinator.
Users can upload their files according to the similarity of the data. If two users have the same metrics, such as interests, they can share files over the network.
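The two search phases can be sketched as a single loop, shown here under several assumptions. The data layout (each community holding an `interest` vector and a file `index`), the similarity threshold, and the rule that the ambassador targets the most similar unvisited community are all hypothetical; the source's own similarity formula is not preserved, so a dot product over common keywords stands in for it.

```python
def sim(v1, v2):
    """Dot product over the keywords two vectors have in common (assumed measure)."""
    return sum(v1[t] * v2[t] for t in v1.keys() & v2.keys())

def search(query, home, communities, hops=3, threshold=0.5):
    """Return (community, file) for the first match, or None when hops run out."""
    current, visited = home, set()
    while hops > 0 and current is not None:
        visited.add(current)
        index = communities[current]["index"]
        # Intra-community step: the coordinator checks its own file index.
        best = max(index, key=lambda f: sim(query, index[f]), default=None)
        if best is not None and sim(query, index[best]) >= threshold:
            return current, best
        # Inter-community step: an ambassador carries the query to the
        # unvisited community whose interest vector is closest to it.
        others = [c for c in communities if c not in visited]
        current = max(others,
                      key=lambda c: sim(query, communities[c]["interest"]),
                      default=None)
        hops -= 1  # each forwarding consumes one hop
    return None
```

The hop count bounds how far a query can travel, so a request that no community can satisfy is dropped instead of circulating indefinitely.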
Fig 6. File Upload
SQLYOG DATABASE:
SQLyog is a GUI tool for the MySQL relational database. It needs the root user name and password to start. A peer has to log in to the system and post its interests, which are then stored in the MySQL database managed through SQLyog.
Fig 7.MySQL Database
Ant Colony Optimization(ACO)
ACO is a probabilistic technique for finding better paths through graphs. Artificial 'ants' (simulation agents) locate optimal solutions by moving through a parameter space representing all possible solutions.
Similar to the movement of ants, real-world objects are grouped by picking up and dropping items/objects (i.e., queries). Decisions are taken based on the items in the neighborhood. If the probability of selection increases, the probability of picking the item is calculated. This algorithm is more reliable, robust and scalable than other conventional routing algorithms.
Fig 8.Algorithm for ACO
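The idea of ants finding a short path through a graph can be illustrated with a minimal ACO sketch. It is a generic textbook-style variant, not the algorithm of Fig 8: the parameter values, the pheromone update rule and the function name `aco_shortest_path` are assumptions, and edge costs are assumed to be positive.

```python
import random

def aco_shortest_path(graph, source, target, ants=20, iterations=30,
                      evaporation=0.5, seed=0):
    """graph: {node: {neighbor: positive edge cost}}. Returns (path, cost)."""
    rng = random.Random(seed)
    pheromone = {(u, v): 1.0 for u in graph for v in graph[u]}
    best_path, best_cost = None, float("inf")
    for _ in range(iterations):
        tours = []
        for _ in range(ants):
            node, path, cost = source, [source], 0.0
            while node != target:
                options = [v for v in graph[node] if v not in path]
                if not options:
                    break  # dead end: this ant gives up
                # Prefer edges with more pheromone and lower cost.
                weights = [pheromone[(node, v)] / graph[node][v] for v in options]
                nxt = rng.choices(options, weights=weights)[0]
                cost += graph[node][nxt]
                path.append(nxt)
                node = nxt
            if node == target:
                tours.append((path, cost))
                if cost < best_cost:
                    best_path, best_cost = path, cost
        # Evaporate old pheromone, then deposit more on cheaper tours.
        for edge in pheromone:
            pheromone[edge] *= (1.0 - evaporation)
        for path, cost in tours:
            for u, v in zip(path, path[1:]):
                pheromone[(u, v)] += 1.0 / cost
    return best_path, best_cost
```

Because cheaper tours deposit more pheromone per edge, later ants are increasingly drawn to the short route, while evaporation lets the colony forget poor early choices.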
Here, considering the relation among node movement patterns, individuals' common interests and their contact frequencies, file requests can be routed to file holders based on nodes' frequencies of meeting different interests. The interest-oriented file searching scheme then has two steps: intra-community and inter-community searching. A node first searches for files in its home community. If the coordinator finds that the home community cannot satisfy a request, it launches inter-community searching and forwards the request to an ambassador that will travel to the foreign community that matches the request's interest. This includes the modules listed below:
• View Users/Followers
• Reply Network
• Retweet Network
• Retweet Contents
• Content Virality
The flow is shown in Fig 9.
Fig 9. ACO Flow Diagram
III. TWITTER DATASET
Here, a Twitter dataset is taken for analyzing the social network. This dataset was built by monitoring the spreading processes on Twitter before, during and after the announcement, on 4th July 2012, of the discovery of a new particle with the features of the elusive Higgs boson. The messages posted on Twitter about this discovery between 1st and 7th July 2012 are considered. This dataset has been taken for further analysis.
The four directional networks made available here have been extracted from user activities in Twitter as:
1. re-tweeting (retweet network)
2. replying (reply network) to existing tweets
3. mentioning (mention network) other users
4. friends/followers social relationships among the users involved in the above activities
5. information about activity on Twitter during the discovery of Higgs boson
It is worth remarking that the user IDs have been anonymized, and the same user ID is used for all networks. This choice allows the Higgs dataset to be used in studies of large-scale interconnected multilayer networks, where one layer accounts for the social structure and three layers encode different types of user dynamics.
P2P Social (Twitter) Data Analysis:
Preprocessing is done to remove missing values.
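A minimal sketch of this preprocessing step is given below. The record layout and the field names `user_id` and `content` are hypothetical, chosen only to illustrate dropping records with missing values.

```python
def drop_incomplete(rows, required=("user_id", "content")):
    """Keep only rows in which every required field is present and non-empty."""
    return [r for r in rows
            if all(r.get(f) not in (None, "") for f in required)]
```

A record that lacks a user ID or has an empty content field is discarded, so later steps such as clustering only see complete records.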
Fig 10.Pre processing
The follower network lists the users, and the details of the users who replied within a particular network can be viewed.
Fig 11.Users/Followers Network
In the reply network, the users and the number of occurrences of their tweet contents are displayed.
Fig 12.Reply Network
The retweet network represents the number of users and the content that can be shared with one particular user and also shared by peers.
Fig 13.Retweet Network
Retweet content represents the tweet contents that can be shared by nearby users.
Fig 14.Retweet Content
K-means clustering is used to cluster the user IDs and their content. The dataset has 300,000 records, which could not be clustered in full, so a range of 1 to 1000 is taken at random and the largest and smallest values are calculated. The number of clusters is set to 2, and the user data are clustered.
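A two-cluster k-means on one-dimensional values can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: seeding the two centroids at the smallest and largest values is a guess at how the minimum and maximum mentioned above are used, and the function name `kmeans_1d` is hypothetical. The input list must not be empty.

```python
def kmeans_1d(values, max_iter=100):
    """Split values into two clusters; centroids start at the min and max."""
    low, high = float(min(values)), float(max(values))
    for _ in range(max_iter):
        # Assignment step: each value joins the nearer centroid (ties go to low).
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Update step: move each centroid to the mean of its cluster.
        new_low = sum(near_low) / len(near_low)
        new_high = sum(near_high) / len(near_high) if near_high else high
        if (new_low, new_high) == (low, high):
            break  # converged: centroids no longer move
        low, high = new_low, new_high
    return (low, near_low), (high, near_high)
```

Each returned pair holds a centroid and the values assigned to it, so the two groups can be inspected or plotted directly.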
Fig 15.Content Virality
Fig 16.Most Virality
Virality represents the tweet topic that is visited by the largest number of peers.
IV. PERFORMANCE GRAPH
The experimental results suggest that interest identification by ACO as a set of keywords works fairly well, using either of the investigated similarity measures. In the present experiment a recently proposed distribution of terms associated with a keyword clearly gives best results, but computation of the distribution is relatively expensive. The reason for this is the fact that co-occurrence of terms is (implicitly) taken into account.
Fig 17.Processing Time
Fig 17 shows the number of user/node requests, with the corresponding processing time highlighted in the graph.
Accuracy in Fig 18 represents the successful delivery of the content that the user viewed.
Fig 19. Processing Cost
Processing cost in Fig 19 refers to the cost of the task that the user is processing; this is also called user-generated data.
This study has shown that fairly simple techniques can achieve very high quality results, but that substantial work is needed to reduce the errors to manageable numbers. Fortunately, that the problem focuses on Broadcast News and not on arbitrary forms of information means that there is hope that more carefully crafted approaches can improve the tracking results substantially.
COMPARISON OF EXISTING PODNET WITH K-MEANS AND ACO
The test results are shown in TABLE 2, TABLE 3 and TABLE 4. The improvement strategy is compared with the information available in PodNet, which represents the registered users in the network. K-means and ACO provide the optimized results.
TABLE 2. Accuracy
x10^3        1000   1500   2000   2500   3000
PodNet       4      5      7.6    9.8    12
Kmeans+ACO   2.5    4      6      8      10
TABLE 3. Processing Time
x10^3        1000   1500   2000   2500   3000
PodNet       3      5      8      10     12
Kmeans+ACO   2      4      6      8      10
TABLE 4. Processing Cost
x10^3        1000   1500   2000   2500   3000
PodNet       3      5      8      10     12
Kmeans+ACO   2      4      6      8      10
The tables above represent the performance of the existing work and the proposed ACO algorithm in terms of accuracy, processing time and processing cost.
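As a reading aid, the relative reduction in processing time can be computed directly from the values in TABLE 3. The sketch below only restates the tabulated numbers; the units of the table are not given in the source, and the function name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which each proposed value is lower than its baseline."""
    return [round(100 * (b - p) / b, 1) for b, p in zip(baseline, proposed)]

# Values copied from TABLE 3 (columns 1000, 1500, 2000, 2500, 3000).
podnet_time = [3, 5, 8, 10, 12]
kmeans_aco_time = [2, 4, 6, 8, 10]
time_reduction = relative_reduction(podnet_time, kmeans_aco_time)
```

The same function applies unchanged to TABLE 4, whose entries are identical to those of TABLE 3.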
V. CONCLUSION & FUTURE WORK
SPOON is specially designed for disconnected MANETs to improve file search efficiency. It was compared against other popular mechanisms in different scenarios through NS-2 simulations. However, MANET file sharing fails to consider the social interests of the user and content-based file searching. The proposed work addresses these issues through the ACO algorithm, using a Twitter dataset, by creating the mobile nodes and forming clusters according to the user query information. The results showed that the network-based classifier performed significantly better than the text-based classifier on the Twitter dataset. Considering that tweets are not as grammatically structured as regular document texts, text-based classification using ACO optimization provides fair results, such as improved accuracy, processing cost and processing time, and can be leveraged in cases where the user may not be able to perform network-based analysis. In future work, the aim is to describe more finely the different types of events that are reflected in Twitter data. Given a robust classification of events, extending the work with the cuckoo search algorithm may improve prioritization, ranking and filtering of extracted content on Twitter and similar systems, as well as provide more targeted and specialized content visualization.