Privacy and security have come under great scrutiny over the last few years. The meaning of privacy has shifted dramatically over the last decade as the nature of computing has evolved. Computing has moved from disconnected machines such as mainframes to devices that are always connected to the internet. Instead of every task being processed on the user’s device, much of the computing takes place on a server, and only the results are sent to the device. Since the companies themselves control the computing, users lose control over their data. The core business model of software services has changed as well. Rather than charging users upfront for software and services, applications serve advertisements. Serving advertisements is more complicated than it looks and is nothing like the traditional advertising seen on television or in newspapers. Advertisements in the technology industry are heavily tailored to each individual user, which requires a deep understanding of who the end user is, what she/he likes, where she/he is located and so on. This results in massive data collection so that advertisers can better understand their end users, and it often leads to a violation of user privacy.
Data collection takes place in various ways. The most common technique is tracking the web browsing history of a user. This technique is often called clickstream analysis: the sequence of clicks is monitored as a user moves from one site to another. Once the sequence is formed, advertisers can infer the intent of the user. For example, suppose a user visits XYZ.com and browses the sports section for a couple of minutes every day. This suggests that the user is interested in a particular sport, so advertisers can target the user with advertisements for the latest sporting events and so on. This seems simple and harmless, but it’s just the tip of the iceberg. Advertisers go to enormous lengths to track users and create detailed user profiles, and companies like Facebook and Google offer platforms and tools built specifically for data collection. More often than not, these user profiles get used for purposes other than simple advertising. Political views can be influenced, unwanted spying can take place, and there is always the possibility of the information falling into the wrong hands.
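The XYZ.com example above can be sketched in a few lines. This is only an illustrative sketch, not how any real advertising platform works: the function name `infer_interest`, the `min_share` threshold and the sample URLs are all assumptions made for the example. It treats the first path segment of each visited URL as the site section and reports a section as an interest when it dominates the clickstream.

```python
from collections import Counter
from urllib.parse import urlparse

def infer_interest(clickstream, min_share=0.5):
    """Return the most-visited site section if it dominates the stream, else None.

    `min_share` is a hypothetical threshold chosen for illustration.
    """
    sections = []
    for url in clickstream:
        # Use the first non-empty path segment as the section, e.g. "sports".
        parts = [p for p in urlparse(url).path.split("/") if p]
        if parts:
            sections.append(parts[0])
    if not sections:
        return None
    section, hits = Counter(sections).most_common(1)[0]
    return section if hits / len(sections) >= min_share else None

# Hypothetical clickstream for one user on one site.
clicks = [
    "https://xyz.com/sports/football/scores",
    "https://xyz.com/sports/tennis",
    "https://xyz.com/news/world",
    "https://xyz.com/sports/football/fixtures",
]
interest = infer_interest(clicks)
```

Even this toy version shows why the technique is attractive to advertisers: a handful of page visits is enough to attach an interest label to a user.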
This is just a glimpse of how advertising works today. We will concentrate on how data collection takes place through the different hardware and software sensors present on smartphones. It is not as simple as clickstream analysis, and there is an industry that specializes in this domain.
1.2 Android & Privacy
The Android Operating System has grown by leaps and bounds over the last few years, improving in terms of features and market share with every release. Starting as a side project of Andy Rubin, the Operating System took off significantly after it was acquired by Google in 2005. What was meant to be an Operating System for embedded systems transformed into one of the most used smartphone Operating Systems in the world.
So, what does the Android Operating System have to do with privacy? As mentioned before, Facebook and Google are primarily in the advertisement business. Facebook has its social network and Instagram, while Google has the Android Operating System, Google Search and a few other services. If you look closely, Android was never designed with a focus on privacy. The operating system was always thought of as a vehicle for Google to collect data from end users. This is the same business model that we talked about earlier: offer a service for free, but charge the users by showing targeted advertisements. The Android Operating System is open-source and is offered for free for anyone to use. But in return, all of the Google applications like Maps, Contacts Manager, Gmail etc. have to be included in the operating system. There is also a requirement that every user sign in with her/his Google account before using the Operating System. All of this adds up in Google’s favor. Every user activity can be monitored easily by the company through its applications and can then be used to serve more ads to the user. More ads result in more money. To make things worse, even 3rd party applications can join in with their own tracking algorithms. There are Software Development Kits (SDKs) specifically designed for this purpose. The tracking algorithms are included in the SDK, and the application developer simply needs to work on the core product. This makes it easy for application developers to make money without having to create a business model of their own.
Consumers typically don’t know about data collection (since it takes place in the background) and are often left in the dark. Many consumers don’t even understand the difference between privacy and security. This confusion stems from the fact that privacy is often viewed as a gray area, so some are okay with data collection while others are not. Data collection is also not illegal as long as users agree to the terms, nor does it break any security protocols. But it is highly unethical, and the data may get used for unintended purposes. Privacy means different things to different people. Unlike security, privacy is valued differently in different countries. In developing countries especially, privacy has a very low value. People don’t care whether their personal data is collected, partly because they are not aware of the repercussions. They also don’t understand the capability and value of their data, and hence typically don’t bother. People are instantly attracted to a service if it is offered for free in return for some advertisements. The governing bodies in developing countries are also incapable of acting as gatekeepers and protecting users’ data. To make matters worse, privacy laws are often non-existent in developing regions such as India, Brazil and parts of Africa. Hence, companies like Facebook often target these regions and collect data ruthlessly without the fear of facing any fines or lawsuits.
Things are different in European countries and North America, where laws are stricter and consumers are much more aware of technology and data collection. GDPR [6] has made things much better across Europe, and similar laws are being planned in North America. But none of this is foolproof, and it won’t stop data collection, since the entire business model of services today is based on data collection and advertisement.
Figure 1: Android Growth Chart
1.3 Current Tools/Research
There have been open-source projects offering tools to detect malware [1]. There are a few privacy protection tools in the Google Play Store too [3]. Though some were successful initially, most of them have stopped working since the release of Android 8.0 due to incompatible APIs. Many of these tools require the device to be rooted, which makes the process a lot more tedious for regular users. The ones that do still work [4] are cumbersome to use: they must be side loaded onto Android and don’t work directly from the Google Play Store. The current privacy protection tools use APIs and system services that have been dropped from development by Google. Some APIs have even been blocked explicitly, making most of these tools useless. To begin with, all of these tools were research projects posted on GitHub for experimentation. None of them were designed for real world use, due to the lack of compatible APIs.
Apart from 3rd party tools, there are a few mechanisms within Android itself. There is a menu hidden under the Operating System’s settings where one can see which permissions every application has been granted. A user can go further and even revoke some of these permissions if she/he is not comfortable with them. But this menu is not obvious to a typical user, since multiple steps are involved in reaching it. The permissions listed there are also limited and do not cover all the sensors. Device specific sensors are not included either, since this menu is baked into Android and not tailored to every individual device. Hence, permissions for sensors such as a blood oxygen monitor or an infrared sensor are not included here.
There have been a few research projects that demonstrate how easy it is to access sensors in Android, with very little resistance from the Operating System. The research carried out in [5] gives a good glimpse of how sensors get abused on devices from Chinese OEMs. Unlike in most other countries, OEMs in China don’t use the Google version of Android but the open-source version known as AOSP (Android Open Source Project). There are very few restrictions on AOSP, and hence OEMs can get away with a lot of data collection. There is no gatekeeper in AOSP either: any application developer or OEM can develop applications or modify the Operating System and sell it directly to the customer. The research in [5] specifically works on this topic by tracing data paths between on-device sensors and numerous applications used in the Chinese market. The findings of this research are very helpful for us, since they further support our assertion that on-device sensors are indeed abused in Android.
CHAPTER 2
Proposal & Goal
2.1. Proposed Solution
There are various ways of tracking user activity and collecting user data. One such methodology is to use the on-device sensors and gather data from them. When users sign up for a service, they typically grant access to all of these sensors. For example, when someone starts using Instagram, she/he grants permission for location, camera, microphone and so on. For other sensors like the accelerometer, Wi-Fi and gyroscope, there is no requirement for explicit permission. The application is given access to these by default. Once the application has permission to access these sensors, it can use them whenever it wants. Even if the application is not opened explicitly, it can access a sensor in the background without the user’s knowledge. For instance, an application XYZ with access to the microphone can record in the background and send the raw recorded data to the company’s server. This can be used for spying purposes or for serving better-targeted advertisements to the customer.
Currently there are no tools that monitor device sensors specifically, and for good reason. Until a few years ago, the number of sensors packed into a device was limited. The sensors that came on smartphones were also limited in their capabilities and were mostly restricted from 3rd party applications. For instance, microphone access was limited to built-in applications in the early days of Android smartphones. But over the years, the capability of these sensors has increased immensely, and so has their number. Earlier smartphones typically had one microphone, which was used just for making phone calls. Smartphones now typically have three microphones, which are capable of detecting sounds from across the room. Naturally, applications can do a lot more when so many sensors are freely available on every smartphone. As it currently stands, there is no way to track how the sensors get accessed by applications, nor is there a gatekeeper to prevent abuse of these sensors in the first place. Credit where it’s due, Google has made a setting available that lists which sensors can be accessed by an application. But it does not list which system processes can access the sensors.
2.2. Goal
Overall, there is a clear need for a dashboard that shows which sensors are accessed by which application and at what time. Applications that constantly access sensors in the background need to be flagged quickly, so that the user can take appropriate action. This would also bring more awareness and clarity to the consumer. The goal of this dashboard is not to collect precise metrics, but just to flag applications that abuse sensors in the background. The dashboard would have two parts. The first section would provide real-time statistics, updated regularly to show which sensors are currently being accessed. The second section would provide a historical view, showing every individual sensor present on the device and which applications accessed it in the background. The historical view would cover 30 days, which should be enough to reveal patterns and provide further insights. All the metrics gathered by the dashboard would be stored on the device itself, without any link to a device ID or user ID, thus keeping things anonymous.
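The two views described above can be sketched as a small data model. This is a minimal sketch under stated assumptions, not the dashboard’s actual design: the `SensorAccess` record, the function names, the 10-second real-time window and the flagging threshold of three background accesses are all hypothetical choices made for illustration, and the package names are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class SensorAccess:
    app: str             # package name of the accessing application
    sensor: str          # e.g. "microphone", "accelerometer"
    when: datetime       # time of the access
    in_background: bool  # True if the app was not in the foreground

def current_accesses(events, now, window=timedelta(seconds=10)):
    """Real-time view: (app, sensor) pairs seen within the last `window`."""
    return {(e.app, e.sensor) for e in events
            if timedelta(0) <= now - e.when <= window}

def background_history(events, now, days=30):
    """Historical view: sensor -> {app: background access count} over `days`."""
    cutoff = now - timedelta(days=days)
    history = defaultdict(lambda: defaultdict(int))
    for e in events:
        if e.in_background and cutoff <= e.when <= now:
            history[e.sensor][e.app] += 1
    return {sensor: dict(apps) for sensor, apps in history.items()}

def flagged_apps(events, now, threshold=3, days=30):
    """Apps whose background accesses to any one sensor reach `threshold`."""
    history = background_history(events, now, days)
    return sorted({app for apps in history.values()
                   for app, count in apps.items() if count >= threshold})

# Hypothetical events for illustration.
now = datetime(2024, 1, 31, 12, 0, 0)
events = [
    SensorAccess("com.example.chat", "microphone", now - timedelta(seconds=5), True),
    SensorAccess("com.example.chat", "microphone", now - timedelta(days=1), True),
    SensorAccess("com.example.chat", "microphone", now - timedelta(days=2), True),
    SensorAccess("com.example.maps", "location", now - timedelta(days=3), False),
    SensorAccess("com.example.game", "accelerometer", now - timedelta(days=40), True),
]
```

With these sample events, only the chat application would be flagged: the maps access happened in the foreground, and the game access falls outside the 30-day window. Note that the record carries no device ID or user ID, in line with the anonymity goal.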
The installation process could be handled either through the Google Play Store or by simply side loading the dashboard APK. To use the dashboard, the user needs to be a root user; otherwise, the capabilities of the dashboard would be severely limited. To create the historical view, the metrics will be stored on the device in the form of a database. To protect the metrics themselves, they will be destroyed after 30 days and will be hashed after every update. Overall, the dashboard would be simple enough for anyone to install and use, providing useful insights into how the device’s sensors are being used.
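The storage scheme can be sketched with an on-device SQLite database. This is one possible reading of the proposal rather than a definitive design: it assumes that "destroyed after 30 days" means a rolling deletion of rows older than 30 days, and that "hashed after every update" means recomputing a SHA-256 digest over the stored rows so that later tampering can be detected. The table layout and the function names `open_store`, `record` and `digest` are hypothetical.

```python
import hashlib
import sqlite3
import time

RETENTION_SECONDS = 30 * 24 * 3600  # 30-day window from the proposal

def open_store(path=":memory:"):
    """Open (or create) the metrics database."""
    db = sqlite3.connect(path)
    # No device ID or user ID column, so rows cannot be tied to a person.
    db.execute(
        "CREATE TABLE IF NOT EXISTS access (app TEXT, sensor TEXT, ts INTEGER)"
    )
    return db

def digest(db):
    """SHA-256 over all rows in a fixed order; changes whenever the data changes."""
    h = hashlib.sha256()
    query = "SELECT app, sensor, ts FROM access ORDER BY ts, app, sensor"
    for row in db.execute(query):
        h.update(repr(row).encode("utf-8"))
    return h.hexdigest()

def record(db, app, sensor, ts=None):
    """Insert one access, drop expired rows, and return the new digest."""
    ts = int(time.time()) if ts is None else ts
    db.execute("INSERT INTO access VALUES (?, ?, ?)", (app, sensor, ts))
    # Assumption: expiry is measured against the newest timestamp written.
    db.execute("DELETE FROM access WHERE ts < ?", (ts - RETENTION_SECONDS,))
    db.commit()
    return digest(db)
```

A caller would keep the digest returned by `record` and compare it with `digest(db)` before rendering the historical view; a mismatch would indicate that the stored metrics changed outside the dashboard.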
2.3. Challenges
There are challenges from top to bottom when it comes to creating such a dashboard on Android. Let’s begin with the challenges posed by the Android Operating System and the Linux kernel. To understand which applications are accessing the on-device sensors, one first has to know which applications are currently running. Beyond that, the dashboard will have to keep tabs on applications that are running, in standby and asleep. Finding the application state is an easy task, since there are numerous places within the Android Operating System stack where it can be found. But constantly accessing these resources consumes a lot of power and results in battery drain. The challenge is to have minimal impact on system resources as the dashboard updates its metrics. The target is a CPU impact of no more than 3-5% while the dashboard is running. Another challenge is to make this work without modifying the kernel in any way. Normally, such applications require either modifying the kernel itself or attaching a module on top of the existing Linux kernel (often called a Loadable Kernel Module or LKM). The whole idea of the project is to bring clarity about sensor usage to consumers in a simple format. Hence the dashboard must be simple enough for anyone to install without requiring any modification to the existing device state.
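One way to stay within the CPU target is to adapt the polling interval to the measured cost of each update. The sketch below is only an illustration of that idea, not the project’s chosen mechanism: the function names, the 3% default budget (the lower end of the 3-5% target) and the 1 to 60 second bounds are assumptions. It assumes the work per poll is roughly constant, so doubling the interval roughly halves the CPU share.

```python
import time

def measure(poll):
    """Run one polling step and return the CPU seconds it consumed."""
    start = time.process_time()
    poll()
    return time.process_time() - start

def next_interval(interval_s, cpu_s, wall_s, budget=0.03, min_s=1.0, max_s=60.0):
    """Pick the next polling interval so the CPU share stays near `budget`.

    cpu_s:  CPU seconds the dashboard used during the last cycle
    wall_s: wall-clock seconds the last cycle lasted (work plus sleep)
    """
    if wall_s <= 0:
        return max_s  # no usable measurement, so poll as rarely as allowed
    share = cpu_s / wall_s
    # Over budget: lengthen the interval. Under budget: shorten it.
    scaled = interval_s * (share / budget)
    return max(min_s, min(max_s, scaled))
```

For example, a cycle that used 0.6 CPU seconds over 10 wall-clock seconds has a 6% share, so a 10-second interval would be stretched to about 20 seconds; a cycle at 1.5% would let the interval shrink to about 5 seconds.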
Along with the above hurdles, there are the usual problems that come with any Android app development, such as device compatibility, operating system compatibility and API consistency. There is a wide variety of hardware and software combinations among Android devices, and making this dashboard work on every hardware configuration and software/API level is going to be a big challenge. Most Android devices are also not up to date: only 8% of the current Android user base is on the latest version of the operating system. With each version, the APIs and the sensor stack have changed dramatically. The goal is to make this project work on as many devices as possible, from the oldest feasible Android version to the latest one.
Lastly, testing this dashboard will also be difficult. Simulating real world usage of applications is not easy. The behavior of applications also changes slightly depending on the region in which the device is being used, since applications try to follow the privacy laws of that region. Considerable effort would therefore go into testing these differences.