Accurate mobile malware detection and classification in the cloud

As the dominator of the Smartphone operating system market, consequently android has attracted the attention of s malware authors and researcher alike. The number of types of android malware is increasing rapidly regardless of the considerable number of proposed malware analysis systems. In this paper, by taking advantages of low false-positive rate of misuse detection and the ability of anomaly detection to detect zero-day malware, we propose a novel hybrid detection system based on a new open-source framework CuckooDroid, which enables the use of Cuckoo Sandbox’s features to analyze Android malware through dynamic and static analysis. Our proposed system mainly consists of two parts: anomaly detection engine performing abnormal apps detection through dynamic analysis; signature detection engine performing known malware detection and classification with the combination of static and dynamic analysis. We evaluate our system using 5560 malware samples and 6000 benign samples. Experiments show that our anomaly detection engine with dynamic analysis is capable of detecting zero-day malware with a low false negative rate (1.16 %) and acceptable false positive rate (1.30 %); it is worth noting that our signature detection engine with hybrid analysis can accurately classify malware samples with an average positive rate 98.94 %. Considering the intensive computing resources required by the static and dynamic analysis, our proposed detection system should be deployed off-device, such as in the Cloud. The app store markets and the ordinary users can access our detection system for malware detection through cloud service.

sightings for Android (Protalinski 2012), the most prevalent infection vector is still userbased installation.
Several security measures have been proposed by the Android platform providers to prevent the installation of malware, most notable of which is the Android permission system. Each application has to explicitly request some permission from the user during the installation to perform certain tasks on the device, such as sending SMS message, etc. (Arp et al. 2014). However, many users tend to blindly grant permission to unknown applications and thereby undermine the purpose of the permission system. To help users, some information sources (Lindorfer et al. 2015) are provided for them to decide whether or not to install an app, such as trustworthiness of the app's origin, app reviews by other users, results from antivirus (AV) scanners, results from Google's app verification service, etc. However, as introduced in Lindorfer et al. (2015), all of these sources have major shortcomings and cannot prevent the installation of malware efficiently.
To solve this problem, many research methods have been proposed for analyzing and detecting Android malware prior to the installation. These methods are mainly categorized into two generic approaches, namely static analysis and dynamic analysis. For example, TaintDroid (Enck et al. 2010), DroidRanger  and DroidScope (Yan and Yin 2012) are dynamic analysis methods that can monitor the behavior of applications during runtime. Although very effective in identifying malicious activity, run-time monitoring suffers from significant overhead and cannot be directly applied on mobile devices. In addition, pure dynamic analysis systems are prone to analysis evasion. By contrast, static analysis methods, such as Drebin (Arp et al. 2014), RiskRanker (Grace et al. 2012), introduce only small run-time overhead, but struggle with increasingly popular obfuscation and dynamic code loading techniques.
In this paper, we proposed a hybrid mobile malware detection and classification system by extending a new open source analysis framework CuckooDroid (CuckooDroid 2015) to detect and classify malware accurately before installation. Our proposed system is designed for both app markets and ordinary users. For app markets, our system can perform a large-scale detection and classification aided by an automated and comprehensive analysis with CuckooDroid. For ordinary users, this detection and classification system can be provided as a service through mobile cloud service (MCS). In addition, a detailed report that is easy to grab and understand is provided, which is generated by CuckooDroid. Our proposed detection system mainly consists of two parts: anomaly detection engine and signature detection engine. Firstly, by using dynamic analysis results, anomaly detection engine can detect new zero-day and unknown malware, as done in Sahs and Khan (2012). During the dynamic analysis, some vital dynamic features of an app in runtime are tested in runtime during dynamic analysis, such as SMS, Phone, dynamic code loading, etc. The anomaly detection engine is built on one-class support vector machine classifiers. Secondly, the signature detection engine which is built based on linearSVC classifier is responsible for detecting and classifying known malware or new variants using static and dynamic analysis results. During the static analysis, many features from the source code and manifest are extracted as possible, as did in Arp et al. (2014). Aided by the static and dynamic analysis results, signature detection engine can efficiently detect new variants and identify their corresponding families through classification.
Note that the features collected during static and dynamic analysis are organized in sets of strings (such as permissions, receivers, hardware) and embedded in a joint vector space. Then each application is represented with a feature vector which can be fed to a certain machine learning technique. Due to the intensive computing resources required by the static and dynamic analysis, both anomaly detection engine and signature detection engine should be deployed off-device, such as in the Cloud. Using a classifier that is trained on a large set of known malicious apps (malware) and benign apps (goodware), our proposed system can detect whether a new app is abnormal or not firstly. Once a new app is detected as abnormal by anomaly detection engine, it is malware sample with high probability in our system. Therefore, a further comprehensive analysis by signature detection engine is started to analyze which family this malware belongs to. In case a new app is detected as normal by anomaly detection engine, we assume it is benign in this paper.
The assumption is reasonable according to the high true detection rate of anomaly detection engine in experiments, which we will discuss it in detail in evaluation part.
In summary, our contributions are as follows: 1. Effective malware detection and classification Based on two phase detection by static analysis and dynamic analysis respectively, our proposed system is capable of detecting and classifying malware with high accuracy and few false alarms. 2. Zero-day malware and new variants detection Our proposed hybrid detection system consists of two phase: anomaly detection engine and signature detection engine. Anomaly detection engine is coarse-grained and can detect new malware which is anomalous from a large number of benign apps. Signature detection engine is a finegrained, which can detect known malware or new variants of a known family. Experiment results show that the two detection engines both achieve high true positive accuracy and low false negative. 3. Integrating anomaly detection and misuse detection Considering the fact that the purely anomaly detection has a relative high false positive rate and the purely misuse detection has a relative high false negative rate, we integrate them to achieve high true positive and low false negative. As we know, we are the first to do this in mobile malware detection. 4. Detailed analysis reports Our proposed system generates a detailed analysis report that is easy to understand during the detection, which includes the extracted static and dynamic information. 5. System implementation We implemented our proposed detection system using CuckooDroid. Based on this implementation, many experiments are executed to evaluate the performance of this system.
The rest of this paper is organized as follows: related work is introduced in "Related work". Architecture overview is presented in "Architecture overview". Our proposed system implementation and evaluation are discussed in detail in "Implementation" and "Evaluation", respectively. "Discussion" concludes the paper.

Related work
In the last years, mobile malware detection has been a hot area of research, especially android malware detection. To counter the growing amount and sophistication of this malware, a large number of concepts and techniques have been proposed and are mainly categorized to: (1) static analysis; (2) dynamic analysis. A detailed and comprehensive review of the current mobile malware detection is provided in the studies of   (Suarez-Tangil et al. 2013;Sufatrio et al. 2015;Faruki et al. 2015). And since that we use the machine learning in our detection system, the related work of machine learning based detection is introduced.

Detection using static analysis and limitation
The first approaches for detecting Android malware have been inspired by concepts from static program analysis. A static analyzer inspects an app by just disassembly, de-compilation without actually running it, hence does not infect the device. Since it analyzes an app's whole source or recovered code, the analyzer can achieve high code coverage.
A large number of methods that inspect applications and disassemble their code have been proposed (e.g. Arp et al. 2014;Lindorfer et al. 2015;Grace et al. 2012;Aafer et al. 2013;Chakranomaly et al. 2013;Chin et al. 2011;Zhu et al. 2014. RiskRanker (Grace et al. 2012) detects high and medium risk apps according to several predetermined features, such as the presence of native code, the use of functionality that can cost the user money without her interaction, the dynamic loading of code that is stored encrypted in the app, etc. Comdroid (Chin et al. 2011) analyze the vulnerability in inter-app communication in Android apps and find a number of exploitable vulnerabilities. DroidAPI-Miner (Aafer et al. 2013) and Drebin (Arp et al. 2014) classify apps based on features learned from a number of benign and malicious apps during static analysis. An app recommender system is proposed in Zhu et al. (2014) to rank apps based on their popularity as well as their security risk, considering requested permissions only. FlowDroid (Arzt et al. 2014) performs a flow-, context-, object-, and field-sensitive static taint analysis on Android apps. It models Android app's lifecycle states and handles taint propagation due to callbacks and UI objects. As the most closely related to our signature detection engine module, some static features such as permissions, intent filters, and the presence of native code are also extracted in MAST (Chakranomaly et al. 2013) to perform market-scale triage and to select potentially malicious samples for further analysis.

The limitation of static analysis
Static analysis lacks the actual execution path and relevant execution context. Moreover, there exist challenges in the presence of code obfuscation as well as dynamic code loading (Poeplau et al. 2014). All those approaches lack the ability to analyze code that is obfuscated or loaded dynamically at runtime, a prevalent feature of apps as evidenced by a recent large scale study (Lindorfer et al. 2014), unless they are complemented by some form of dynamic analysis, as recently proposed in StaDynA (Zhauniarovich et al. 2015).

Our solution to the limitation of static analysis
In contrast, our proposed system does not suffer from those limitations, since our anomaly detection engine performs abnormal detection firstly through dynamic analysis.

Detection using dynamic analysis and limitation
Static analysis and detection approaches are quick, they fail against the encrypted, polymorphic and code transformed malware. In order to overcome the shortcomings of static analysis, some dynamic analysis based methods (Zhang et al. 2013b;Yan and Yin 2012;Enck et al. 2010;Burguera et al. 2011;Wu and Hung 2014;Gilbert et al. 2011;Rastogi et al. 2013) are proposed. Dynamic analysis is conducted by executing an app, on either a real or virtual execution environment such as the Android Virtual Device (AVD), and observing the app during its execution.
The analysis system TaintDroid (Enck et al. 2010) and DroidScope (Yan and Yin 2012) are the most notably, which enable dynamically monitoring applications in a protected environment. TaintDroid focuses on taint analysis and DroidScope make introspection at different layers of the platform. Although both systems provide detailed information about the behavior of apps, they require too many resources to deploy on Smartphones directly.
A first step towards the use of dynamic analysis results for Android malware detection is anomaly detection engine by CrowDroid (Burguera et al. 2011), which performs k-means cluster based on system-call counts. The number of invocations of API and system calls is selected as coarse-grained features to train various classifiers to analyze apps. However, their monitoring approach relies on modifying the app under analysis, which can be easily detected by malware. Another related approach combining static with dynamic analysis is DroidDolphin (Wu and Hung 2014). Again, the approach relies on repackaging and injecting an app with monitoring code. Although the authors observed that the accuracy increased with the size of the training set, DroidDolphin (Wu and Hung 2014) achieves an accuracy of only 86.1 % in the best case. At the meantime, these dynamic analysis methods are all prone to analysis evasion due to the increasing use of emulator detection technology in malware. VetDroid is a dynamic analysis platform for reconstructing sensitive behaviors in Android apps from a permissions use perspective (Zhang et al. 2013b). Zhang et al. points out that traditional system call analysis is not appropriate for characterizing the behaviors of Android apps, as it misses high-level Android-specific semantics and fails at reconstructing IPC and RPC interactions. Afonso et al. (2014) dynamically analyze Android apps using the number of invocations of API and system calls as coarse-grained features to train various classifiers. However, their monitoring approach relies on modifying the app under analysis, which is easily detectable by malware. AppsPlayground (Rastogi et al. 2013) performs a TaintDroid-based dynamic taint tracing, API monitoring, and kernel-level monitoring. Event triggering and intelligent execution techniques are adopted to realize comprehensive execution coverage and achieve code coverage of 33 %.

The limitation of dynamic analysis
Although dynamic analysis surpasses the static analysis in many aspects, dynamic analysis also has some drawbacks. Firstly, dynamic analysis requires too many resources relative to static analysis, which hinders it from being deploying on resource constraint smartphone. Secondly, dynamic analysis is subject to low code coverage. Sasnauskas and Regehr (2014) mentioned that producing highly structured inputs that get high code coverage is an open research challenge. Thirdly, recently malware attempts to detect the emulator and other dynamic analysis systems (Vidas and Christin 2014; Petsas et al. 2014;Jing et al. 2014), avoiding launching their payloads. Thus, some dynamic analysis systems are prone to analysis evasion.

Our solution to the limitation of dynamic analysis
On contrast to the above mentioned methods, anomaly detection engine in our proposed detection system performs dynamic analysis through Dalvik Hooking based on Xposed Framework. Therefore, our analysis module is difficult to be detected by avoiding repackaging and injecting monitoring code. As we know, most of dynamic analysis methods don not integrate the anti-emulator tools and thus are prone to analysis evasion. To solve this problem, some emulator anti-detection tools (such as Content Generator, etc.) are integrated to make a more transparent dynamic analysis environment, which can avoid emulator detection at a certain extent and extract more valuable dynamic information. As for code coverage, we adopt MonkeyRunner (Android Developers 2015) to stimulate the inputs during app execution.

Detection using machine learning and limitation
The difficulty of manually crafting and updating detection patterns for Android malware has motivated the application of machine learning. Several methods have been proposed to detect and analyze applications automatically using machine learning methods (e. g. Arp et al. 2014;Lindorfer et al. 2015;Grace et al. 2012;Aafer et al. 2013;Afonso et al. 2014;Spreitzenbarth et al. 2013;Amos et al. 2013). For example, the method proposed in Arp et al. (2014) applies linearSVC learning methods to the static features of applications for detecting malware. Similarly, the methods RiskRanker (Grace et al. 2012) and DroidAPIMiner (Aafer et al. 2013) use machine learning techniques to detect malware with features statically extracted from Android applications. In contrast, the method proposed in Afonso et al. (2014) detects malware with a machine learning technique and dynamically extract features. A framework is proposed in Amos et al. (2013) to evaluate mobile malware classifiers based on the same features as Andromaly with an equally limited testing set of only 50 applications. Additionally, the tested classifiers achieve substantial false positive rates ranging from 14.55 % up to 44.36 %, rendering them completely impractical. Closest to our work are MARVIN (Lindorfer et al. 2015) and MobileSandbox (Spreitzenbarth et al. 2013), which use the static and dynamic features by machine learning and achieve high accuracy.

The limitation of machine learning based detection
Overall, previous work focuses on detecting malware using machine learning techniques, which are either misuse-based detection or anomaly-based detection. Misuse based detector tries to detect malware based on signatures of known malware. Misuse detector is specifically designed to detect known malware, leading to low number of false alarms. However, misuse detector could not detect zero-day malware. Anomaly detector refers to identifying malware that is anomalous with respect to the normal apps. Despite their capability in detecting zero-day malware, anomaly detector suffers from high false positive rate. The misuse and anomaly detector are complementary.

Our solution to the limitation of machine learning based detection
Hence, by taking advantages of low false-positive rate of misuse detector and the ability of anomaly detector to detect zero-day malware, a hybrid malware detection method is proposed in this paper, which is the novelty in this paper.

Architecture overview
As described in Fig. 1, our proposed detection system mainly consists of two engines: anomaly detection engine and signature detection engine. Anomaly detection engine is responsible for performing zero-day malware detection through dynamic analysis. And signature detection engine is responsible for performing new variant detection by combining static analysis results with dynamic analysis results. Signature detection engine is trained on known malware and benign apps.
Considering the resource-consumption process of detection and the constraint computing resource on mobile devices, both anomaly detection engine and signature detection engine should be deployed off-device at somewhere with rich resources, such as in the Cloud.
The process is outlined as follows:

Static and dynamic analysis
Firstly, all the train datasets and test datasets are processed statically and dynamically. As done in Drebin (Arp et al. 2014), we extract the static features from the manifest file and the disassemble dex code. In order to extract the dynamic features of apps during runtime, CuckooDroid is used to run the apps in an emulator environment. As shown in Fig. 2, CuckooDroid is composed of one manage node, and a number of slave nodes, which can be either Android emulators or linux-based virtual machines in the Cloud. Contrast to other dynamic analysis, CuckooDroid has integrated a collection of known emulator anti-detection techniques for hiding the Android emulator and providing a transparent analysis environment. At the meantime, a Dalvik API hooking based on Xposed framework is adopted to capture the dynamic API calls and information. Also the analysis results of submitted app are stored in a database in our proposed method.

Fig. 1 System overview
Through this way, when a submitted app has been analyzed before, its analysis results will be returned directly.

Anomaly detection
Anomaly detection engine is responsible for detecting normal and abnormal apps through dynamic analysis and providing a preliminary analysis results. In contrast to signature detection, dynamic features will be used in anomaly detection and be embedded into vector space. Also a Variance treshold-based feature selection method is applied to these feature vectors. In order to detect abnormal apps, a One-Class SVM classifier model is built on benign apps. A new app will be labeled as either zero-day malware or benign app by this trained classifier. When an unknown app is submitted, its feature vector will be fed to the classifier and a decision about whether is malware or not is made.
If an unknown app is categorized as abnormal and it is not known malware, further signature detection will be triggered to classify this malware and determine which family it belongs to.
In order to maintain the detection accuracy of the two detection engines, all the new variants, zero-day malware and benign apps will be stored to update the training dataset at a specific period.

Signature detection
At first, the extracted static and dynamic string features will be embedded into vector space, generating feature vectors. Then, a Chi 2 -based feature selection method is applied to these feature vectors.
During signature detection, the feature vectors of malicious and benign apps will be generated first, as stated above. Then a linearSVC classifier model is trained based on these train feature vectors which consist of known malware and benign apps.
The detected abnormal app during anomaly detection will be further classified using a multi-family classifier. When the classification ends, the detected abnormal app will be classified into a certain malware family. Since the detected malware is unknown, it will be a new variant of a family with a high probability.

Feature extraction
Feature extraction is an essential part of both anomaly detection engine and signature detection engine. Both the static analysis and dynamic analysis are performed before anomaly detection and signature detection. CuckooDroid is used to extract the dynamic features of each app. At the meantime, some static features are extracted.

Static analysis features
For android apps, static analysis can provide a rich feature set about the app, such as requested permissions, registered activities, etc. In this paper, our static analysis mainly focus on the manifest and the disassembled dex code of the app, which both can be obtained by a linear sweep over the app's source code and files. We adopt the Android Asset Packaging Tool to extract the static features, as Drebin (Arp et al. 2014). Additionally, several aspects of the app's code are statically determined in case they might not be triggered during the dynamic analysis phase, as done in (Lindorfer et al. 2015), such as the use of reflection API, the dynamic loading of code, the use of cryptographic API. Specially, the static feature extracted mainly includes two parts: the static features from manifest and disassembled code.
Every application developed for Android must include a manifest file, which provides data supporting the installation and later execution of the application. As did in Arp et al. (2014), we extract the information stored in this file. The specific static features extracted include: Hardware components, Activities, Intents-filters, etc. Also, some static signatures about an app are generated according to the extracted static information, such as "Application request dangerous permission", "Application uses native code", etc. Compared to Arp et al. (2014), we do not extract the network address features and restricted API calls features. The detailed extracted static features are shown in Table 1.

Dynamic analysis features
As research on ×86 malware detection, purely static analysis techniques are prone to evasion by some anti-detection techniques, such as code obfuscation, etc. In order to prevent attackers from evading the learning method, e.g. with mimicry attacks (Šrndić and Laskov 2014), features should inherently represent the malicious behavior to be detected. Thus the corresponding dynamic analysis features capturing the harmful behavior should be extracted. In order to extract the dynamic analysis features, we extend the open-source and automated dynamic analysis framework CuckooDroid proposed in CuckooDroid (2015). CuckooDroid performs dynamic analysis at Dalvik-level through a Dalvik API monitoring based on Xposed framework. In addition, a new stimulation tool is integrated to trigger program behavior and increase code coverage Robotium (Robotium 2014), which is used to enhance original CuckooDroid by simulating user's interactions with the mobile apps and can automate the testing process. During the dynamic analysis, we monitor the following events: "File access and operations", "Register receivers", "Executed commands", "Content resolver queries", "Telephony Manager listen", "Find resource", "Dynamic suspicious calls", "SMS", "Phone Events", "Data leaks", "Network operations", etc. Compared to (Lindorfer et al. 2015), we also extract the crypto keys that apps use, then the encrypted traffic can be transformed to plaintext traffic.

Embedding into vector space
The extracted static analysis features and dynamic analysis features are expressed as strings, which cannot be fed to machine learning directly. For example, a malware sample sending premium SMS messages may contain the requested permissions "SEND_SMS", and the hardware components "android.hardware.telephony". During our evaluation on dataset of 11,560 benign and malware samples, we extracted 190,367 different static and dynamic features, as shown in Table 1.
As most machine learning methods operate on numerical vectors, we need to map the extracted feature sets into a vector space first. Thus, we need to represent an app as an appropriate vector in order to machine learning algorithm. To this end, we use a simple bag of words representation. Firstly, we defines a features set S, which comprises all observable extracted string features. Secondly, a |S|-dimensional vector space can be defined using the feature set S, where each dimension is either 0 or 1. Then an app X can be mapped to this space by constructing a vector ϕ(X), for each feature s extracted from X the corresponding dimension is set to 1 and all other dimensions are set to 0.
where the indicator function I(x, s) is simply defined as

Choosing a classifier
When got feature vectors of apps, our proposed hybrid system uses different classifiers to perform anomaly detection and signature classification. In this paper, 1-class SVM algorithms and linear classifier are used to build the anomaly detection models and signature classification models, as introduced below.

Anomaly detection
Consider a data set of n observations from the same distribution described by p features. Consider now that we add one more observation to that data set. Is the new observation so different from the others that we can doubt it is regular? (i.e. does it come from the same distribution?) Or on the contrary, is it so similar to the other that we cannot distinguish it from the original observation? This is the question addressed by the novelty detection tools and methods. In general, it is about to learn a rough, close frontier delimiting the contour of the initial observations distribution, plotted in embedding p-dimensional space. Then, if further observations lay within the frontier-delimited subspace, they are considered as coming from the same population than the initial observations. Otherwise, if they lay outside the frontier, we can say that they are abnormal with a given confidence in our assessment.
A one-class support vector machine (1-class SVM) is a popular anomaly detection algorithm in various applications. In our proposed detection system, we use the novelty detection provided by 1-class SVM in Scikit-learn to detect anomaly malware.
The One-Class SVM has been introduced by Schölkopf et al. for that purpose and implemented in the Support Vector Machines module in the svm. OneClassSVM (http://www.scikitlearn.org/stable/modules/generated/sklearn.svm.OneClassSVM. html#sklearn.svm.OneClassSVM) object. It requires the choice of a kernel and a scalar parameter to define a frontier. The RBF kernel is usually chosen, although there exists no exact formula or algorithm to set its bandwidth parameter. This is the default in the scikit-learn implementation, we also choose the RBF kernel. The 'nu' parameter (we set nu = 0.01), also known as the margin of the One-Class SVM, corresponds to the probability of finding a new, but regular, observation outside the frontier. The 'gamma' parameter is set to 0.01, too.

Signature detection
The signature detection is responsible for classifying abnormal malware to its family. Due to that we classify the abnormal malware by the static and dynamic features, thus, the chosen classifier should adapt to high-dimensional feature space, as we extract 190,367 different static and dynamic features. Also the chosen classifier should adapt to sparse data, considering the single app only exhibit a small subset of the possible features.

LinearSVC classifier
Compared to SVM, linear SVC implements "one-vs-the-rest" multi-class strategy. Given a feature vector ϕ (x↽, a linear classifier computes the scalar product with a weight vector � w : y = i x i w i . The outcome, y, is the margin of the classification. Similar to SVC with parameter kernel='linear' , linear SVC implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better (to large numbers of samples). Furthermore, linear SVC supports both dense and sparse input and the multiclass support is handled according to a one-vs-the-rest scheme.
As suggested in (Lindorfer et al. 2015), linear SVC with L1 regularization are superior to linear SVC with L2 regularization when dealing with many irrelevant features, while linear SVC with L2 regularization is extremely sensitive to the presence of irrelevant features. We show in our evaluation in Sect. 5 that both methods lead to similar results during classification, while linear SVC with L2 classifier performs slight better.

LinearSVM classifier
In principle, an SVM works the same way as a linear classifier. However, it does address one problem of linear classifiers: as the name already suggests, the latter classifies samples only accurately if the problem is linearly separable. To overcome this limitation, SVMs use the "kernel trick", implicitly mapping the input into an even higher-dimensional space, where the problem is more easily separable.
However, the LinearSVM implementation is based on libsvm. The fit time complexity is more than quadratic with the number of samples which makes it hard to scale to dataset with more than a couple of 10,000 samples.
Furthermore, as further detailed in "Evaluation", pure linear classification performs better than LinearSVM and has high classification accuracy. Thus, our signature classification chooses purely linear classifier.

Feature selection
The features extracted during feature extracted include a large number of static and dynamic features, as we introduced in Table 1. However, this not means that they are all useful for anomaly detection and signature classification. To improve the performance and accuracy of our hybrid detection system, we apply different feature selection methods to anomaly detection and signature classification.
The classes in the sklearn.feature_selection (http://www.scikitlearn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html#sklearn.svm.OneClassSVM) module can be used for feature selection/dimensionality reduction on sample sets, either to improve estimators' accuracy scores or to boost their performance on very high-dimensional datasets. So we choose this class to perform our feature selection.

For anomaly detection
VarianceThreshold is a simple baseline approach to feature selection. It removes all features whose variance doesn't meet some threshold. By default, it removes all zero-variance features, i.e. features that have the same value in all samples. In our system, the parameter of VarianceThreshold is set "threshold = (0.12 × (1 − 0.12))".

For signature classification
Univariate feature selection works by selecting the best features based on univariate statistical tests. It can be seen as a preprocessing step to an estimator. Scikit-learn exposes feature selection routines as objects that implement mutiple transform methods, includes: SelectKBest(), SelectPercentile(), Generic UnivariateSelect(). In our system, we implement the SelectKBest method to removes all but the k highest scoring features. The parameter of SelectKBest is set "score_func = chi2, k = 2000".

Evaluation
After presenting our hybrid detection system in detail, we now proceed to an empirical evaluation of its efficacy.

Dataset
For all experiments, we consider a dataset of real Android applications and real malware. To acquire benign apps, we design our crawler and craw a large number of apps in China app stores, such as http://www.appchina.com, http://www.as.baidu.com, http:// www.mm.10086.cn, etc. To collect benign apps, we submit the crawled apps to VirusTotal, we label an app as benign if it does not trigger a response from 55 AV scanners used by VirusTotal. The malware samples we used in experiment are acquired from Drebin (Arp et al. 2014), The malware samples have been collected in the period of August 2010 to October 2012 and were anomaly detection engine available to us by the MobileSandbox project (Spreitzenbarth et al. 2013). The final dataset contains 6000 benign apps and 5560 malware samples. As clamed in Arp et al. (2014), this is one of the largest malware datasets that has been used to evaluate a malware detection method on Android.
An overview of the top malware families in our dataset is provides in Table 2, which contains several families that are currently actively distributed in app markets.

Anomaly detection performance
In this experiment, we evaluate the detection performance of our anomaly detection engine and compared it with other related detection approaches, some common AV scanners.

Detection performance of anomaly detection engine
Firstly, a one-class SVM model is trained using 4000 benign samples. Then we test our 5560 malware samples on this model. The false negative rate is 1.16 %, which indicates that 65 malware samples are not detected. Next, we used the remaining 2000 benign samples as test samples to evaluate the false positive rate of our anomaly detector. The result shows that 1.30 % of benign apps are mistakenly recognized as abnormal apps during anomaly detection. This means, if our anomaly detector is applied to Google Play, a world's android app market, among the approximately 1200 new apps per day, around 15 apps will be mislabeled as abnormal. This anomaly detection accuracy has surpassed the proposed methods in Zhang et al. (2014).
Here we use a confusion matrix to characterize the performance of anomaly detection engine, as shown in Fig. 3. From the confusion matrix, it is obvious that our anomaly detection engine can accurately detect abnormal malware.

Compared it with other related detection approaches
As discussed above, our anomaly detection engine anomaly detection engine can detect malware samples with true positive rate of 98.84 %, false negative rate of 1.16 %. Also anomaly detection engine can correctly label benign apps with a true positive rate of 98.7 %, a false negative rate of 1.3 %. To evaluate the performance accurately, a 10-fold cross validation is further performed, which is shown as ROC curve in Fig. 4.
As a comparison, we use the ROC figures in (Arp et al. 2014), as shown in Fig. 5. It is obvious that our anomaly detection engine outperforms other related detection methods.
Due to that our anomaly detection engine uses benign apps to train model, thus, no malware sample of certain families are required in advance. Therefore, whether our anomaly detection engine can detect unknown families are worth studying. To evaluate the detection unknown families malware performance of our anomaly detection engine, we use the top 20 malware families as test dataset. The detection rate of each family based on our anomaly detection engine is shown in Table 3, with an amazing result. All families except for FakeInstaller can be detected with a true rate of 100 %. Two instances of FakeInstaller are falsely labeled as benign apps.

Compared it with other AV scanners
Although our anomaly detection engine anomaly detection engine shows a better performance compared to related approaches, in the end it has to compete with common anti-virus products in practice. Consequently, we also compare our anomaly detection engine against the nineteen selected common AV scanners on our malware datasets. The detection rate of each scanner is computed. The nineteen AV scanners include CAT-QuickHeal, Alibaba, Symantec, ESET-NOD32, TrendMicro-HouseCall, Kaspersky, Tencent, Fortinet, Microsoft, Qihoo-360, Ikarus, Baidu, etc. These ninetees AV scanners are represented as A1-19. The detection performance of AV scanners and ours are shown in Fig. 6.

Signature detection performance
In this experiment, we evaluate the signature classification performance of our signature detection engine based on Linear SVC-L1, Linear SVC-L2 and LinearSVM classifier, respectively.
We use a multi-label classification to identify the malware family of the unrecognized malicious samples. To compare the classification performance of three signature detection methods, a 10-fold validation is performed. In each fold, we split the dataset into train dataset (70 %), test dataset (30 %) respectively. The confusion matrix of classification results using three methods are shown in Figs. 7,8,9. The classification rate of three methods is shown in Fig. 10. Above all, it is obvious that the linear SVC with L2 surpasses the other classification methods. Therefore, we use the linear SVC with L2 in our hybrid detection system. In addition to high accuracy during misuse detection and anomaly detection, all the extracted features will be exhibited to users as a detailed analysis report. The analysis report is shown in Fig. 11. From the analysis report, an ordinary uses or expert can grasp and understand more information about the detected app.

Discussion
In this section, we discuss the limitation of our proposed system, potential evasion techniques for our system and future work.  In our proposed system, the static analysis and dynamic analysis both work on Dalvik level. In general, our proposed system cannot handle native code or HTML5-based applications. This is because both the ARM binary running on the underlying Linux and the JavaScript code executed in WebView are not visible from a Dalvik bytecode  perspective. Therefore, a future work is needed to defeat malware hidden from the Dalvik bytecode.

Limitation 2
Although some anti-emulator detection techniques are adopted to make our dynamic analysis environment more real, more new emulator detection methods are proposed (Vidas and Christin 2014; Petsas et al. 2014;Jing et al. 2014) and can detect our emulator Fig. 11 The detailed analysis report analysis easily. As a part of our future work, we consider adopting the dynamic hooking methods proposed in Hu and Xiao (2014) to prevent emulator evading.

Poential evasion
As we know, learning-based detection is subject to poisoning attacks (Zhang et al. 2014). An adversary may deliberately poison the benign dataset through introducing clean apps with malicious features to confuse a training system. For example, he can inject harmless code intensively to make sensitive API calls that are rarely observed in clean apps. Once such samples are accepted as the benign samples, these APIs are therefore no longer the distinctive features to detect related malware instances.
However, our proposed system is slightly different from prior works. Firstly, the static and dynamic features are extracted to construct feature vector. Therefore, it is much harder for an attacker to make confusing samples at the behavioral-level during dynamic execution. Secondly, our anomaly detection engine serves as a sanitizer for new unknown samples. Owing to the high true positive rate of anomaly detection, any abnormal samples will be detected and further signature classification will be triggered.

Future work
As we stated above, some future works are need to improve our proposed system.

Future work 1
In order to solve limitation 1, we will adopt virtual machine inspection technique (Tam et al. 2015) to include system-level events as part of the behavioral aspects (dynamic features) of an app. Incorporating system calls into our feature space can improve the behavioral models, leading to more accurate results for detecting malware using native code. Furthermore, malware utilizing root exploits can be detected and characterized more precisely by using system-level events.
To improve the code coverage in dynamic analysis, we also explore more intelligent user interactions which surpasses the MonkeyRunner (Android Developers 2015) we currently used.

Future work 2
As the limitation 2 stated above, in order to prevent malware from detecting emualtor and evading analysis, we will adopt new dynamic hooking mehtods (Hu and Xiao 2014) to construct a more real dynamic analysis environment.

Future work 3
Although we mentioned that our proposed system can be deployed in the Cloud, how to deploy it is not clear. As a important part of our future work, we will refer some recent moile cloud computing researches (Titze 2012; Chun and Maniatis 2009;Kosta et al. 2013;Borcea et al. 2015;Barakat et al. 2014) to provide a security, efficient and convient service for both app markets and ordinary users.