Realizing IoT service’s policy privacy over publish/subscribe-based middleware

The publish/subscribe paradigm makes IoT service collaborations more scalable and flexible, due to the space, time and control decoupling of event producers and consumers. Thus, the paradigm can be used to establish large-scale IoT service communication infrastructures such as Supervisory Control and Data Acquisition (SCADA) systems. However, preserving an IoT service’s policy privacy is difficult in this paradigm, because a classical publisher has little control over its events once they are published, and a subscriber has to accept all events of a subscribed event type. Few existing publish/subscribe middleware systems have built-in mechanisms to address these issues. In this paper, we present a novel access control framework that is capable of preserving IoT services’ policy privacy. In particular, we adopt the publish/subscribe paradigm as the IoT service communication infrastructure to facilitate the protection of IoT services’ policy privacy. The key idea in our policy-privacy solution is to use a two-layer cooperating method to match bi-directional privacy control requirements: (a) a data layer for protecting IoT events; and (b) an application layer for preserving the privacy of service policy. Furthermore, the anonymous-set-based principle is adopted to realize the functionalities of the framework, including policy embedding, policy encoding and policy matching. Our security analysis shows that the policy privacy framework is Chosen-Plaintext Attack secure. We extend the open source Apache ActiveMQ broker by building in a policy-based authorization mechanism to enforce the privacy policy. The performance evaluation results indicate that our approach is scalable with reasonable overheads.

, which can be used to preserve the anonymity property. Embedding the policies and attributes into data and services makes services interact using a data-centric methodology, which can be used to preserve the multicast property. The main challenge is how to comprehensively preserve the policy privacy of data and services using policy matching. Bi-directional privacy policy matching means that any published data can only be sent to authorized users who are interested in it. In other words, a broker needs to check whether the published data's attributes satisfy the subscription policy provided by the subscribers, i.e., whether subscribers are interested in the data. In the meantime, the broker needs to check whether the attributes of the subscriber satisfy the access policy related to the published data, i.e., whether the data can be received by the subscriber. Direct matching would leak private information; an attribute blinding approach can be used to address this problem. In previous work, there have been some policy privacy approaches that allow the brokers to check whether the attributes of the consumers satisfy the access policy. However, to the best of our knowledge, few existing approaches support comprehensive protection of both data policy privacy and service policy privacy.
In this paper, we adopt the publish/subscribe paradigm as an IoT service communication infrastructure, whose underlying network capabilities can be integrated to facilitate policy-aware messaging between IoT services. To preserve policy privacy, we present a novel policy privacy model, namely a two-layer access control framework. The key point in our policy-privacy solution is to use a two-layer cooperating method to meet the bidirectional privacy control requirements, which supports two-layer policy privacy: (1) the bottom layer is the data layer for protecting data or events; (2) the upper layer is the application layer for protecting services. The framework addresses the issues of preserving IoT services' policy privacy using a data-centric methodology. Furthermore, the policy embedding, encoding and blinding functions are realized by applying the anonymous-set-based principle to preserve policy privacy. The encoded and blinded attributes are Chosen-Plaintext Attack (CPA) secure: the same attribute under two different encoding and blinding operations yields two different encoded and blinded attributes. We then choose JMS, one of the publish/subscribe service standards, to implement our access control framework. Apache ActiveMQ is used as the JMS broker and extended to perform policy evaluation. The main contributions of this paper are as follows:
1. A publish/subscribe-based IoT service communication infrastructure is modelled.
2. A two-layer access control framework for IoT services is proposed to allow publishers and subscribers to control the messaging data by matching between protection requirements and entities' capabilities.
3. Two key components are designed to act as the cornerstones of the framework: (1) the policy embedding component, where the policy and attributes can be dynamically generated and embedded; and (2) the blind encoding component for policies and attributes of IoT events, which realizes policy privacy. The anonymous-set-based principle is adopted to help realize their functions.
The remainder of the paper is organized as follows. In Sect. "Related work", we review the existing work related to our work. Section "Preliminaries" introduces the basic publish/subscribe-based IoT service (i.e., SCADA) communication infrastructure and the generic concepts used in our approach. Section "Access control framework for SCADA systems" presents our access control framework for SCADA systems. Section "Policy embedding scheme" provides an embedding scheme for realizing the matching function in our access control framework. Section "Policy encoding and matching" goes into detail about policy encoding and matching to enforce access control policy. Section "Policy privacy" presents the security analysis and proof so as to ensure the correctness of our approaches. Section "Storage cost and performance evaluation" presents the storage cost and the performance evaluation of latency. Section "Conclusions and future research" provides conclusions and outlines future research.

Related work
There has been considerable work on policy privacy of secure service interactions over publish/subscribe-based systems. In this section, we will discuss related work in the following aspects:

Privacy preserving technique
Cryptographic encryption is a common privacy-preserving technique used in distributed systems (Goyal et al. 2006; Waters 2011; Masaud-Wahaishi and Gaouda 2011; Nishide and Yoneyama 2009; Cheung and Newport 2007). Goyal et al. (2006) provided a Key-Policy ABE scheme, which allowed the policies (attached to keys) to be expressed by any monotonic formula over encrypted attributes (ciphertext). Waters (2011) proposed the Ciphertext-Policy Attribute-Based Encryption (CP-ABE) scheme, where any encryptor was allowed to specify access control in terms of any access formula over the attributes in the system. However, in these approaches, the CP-ABE scheme embeds authorization policies into ciphertexts. In publish/subscribe systems, such schemes require that a participant hold many keys, because each publisher gives the participant a key. They do not allow notification brokers to be used to reduce the key management burden of the participant, and do not preserve the decoupling between service providers and consumers, so they cannot assure the expressive power of broker-integrable policies. Other studies (Yu et al. 2008; Li et al. 2012; Doshi and Jinwala 2011; Müller and Katzenbeisser 2011) also proposed policy-privacy attribute-based encryption schemes, where authorization policies were hidden within the ciphertexts and the size of the ciphertexts was reduced. These works focused on hiding policies in ciphertexts, similar to policy encryption, but did not consider a policy anonymity approach based on anonymous sets and did not support flexible policy management. In SCADA scenarios, the authorization policies for long-lived event types may be modified. Updating authorization policies without re-encrypting the data is a desirable feature of an access control service. Homomorphic encryption (Gentry et al. 2009) is a novel approach for privacy preservation in publish/subscribe systems; it supports complex computation conducted on the broker, but is not practical. Compared to the above works, in this paper, the policy anonymity approach based on anonymous sets is applied to realize policy privacy. Blinding and encoding operations on event type and policy are carried out to optimize the performance of matching and storage. Our solution considers delegation capabilities and flexible authorization management to both be requisite for access control.

Privacy preserving degree
Privacy issues in publish/subscribe systems have been studied for a long time (Shikfa et al. 2009; Pal et al. 2012). However, most prior research on data confidentiality in publish/subscribe systems focuses on the privacy of either the subscription or the publisher; there has been little work to support comprehensive privacy protection of both the published event (metadata) and the subscribed event types (Onica et al. 2016). Choi et al. (2010) adopted the encrypted matching approach and Wun and Jacobsen (2007) adopted the policy management approach to protect the privacy of the published data and the subscribed data. Rao Bacon (2008) and Rao et al. (2013) investigated preserving subscription privacy in publish/subscribe systems, but are limited in supporting fine-grained access control for the published data. Opyrchal et al. (2007) focused on addressing issues of publication privacy in publish/subscribe systems by providing access control on publication. Ion et al. (2010, 2012) and Pal et al. (2012) presented privacy-preserving schemes that preserve subscription privacy and the confidentiality of publications. Our work is similar to Ion et al. (2012) and Pal et al. (2012); however, these works adopt cryptographic encryption to achieve their privacy-preserving objectives, which limits the efficiency of the privacy-preserving scheme.
The basic security requirements of a wide-area SCADA system over a publish/subscribe-based infrastructure, and the solution to meet the requirements, are presented in Zhang and Chen (2012). However, that paper did not discuss how to address the policy privacy issue in a two-layer protection way or how to embed authorization policies into events separately. In addition, policy privacy was not considered, and the key focus was how to adopt an appropriate encryption scheme to provide a distributed security framework. This paper is a continuation of the work presented in Zhang and Chen (2012), where a complete security framework is given, and the policy attaching issue and policy privacy are thoroughly addressed. Our access control framework is an extension of Zhang and Chen (2013), adding the description of embedding policy and preserving policy.

Preliminaries
In this section, a publish/subscribe-based IoT communication infrastructure is modeled. The formal definitions for attribute-based authorization policy are provided. Furthermore, we give background information on the Bloom Filter, which is used to encode attributes and policies.

Publish/subscribe-based IoT communication infrastructure
A publish/subscribe-based IoT communication infrastructure (generally referred to as a Distributed Event-based System) is composed of a set of notification broker (NB) nodes distributed over a network. These NB nodes construct an overlay network, which is a logical network built on top of the physical network as shown in Fig. 1. The nodes of the overlay network are brokers, and their links are paths in the physical network.
Formally, the distributed event-driven IoT service communication infrastructure can be modeled as a 5-tuple $CF = \langle B, L, P, S, T \rangle$, where: $B = (NB_1, NB_2, \ldots)$ is the set of notification broker nodes; $L = (L_1, L_2, \ldots)$ is the set of connections between broker nodes; $P = (P_1, P_2, \ldots)$ is the set of publishers that may be some IoT services; $S = (S_1, S_2, \ldots)$ is the set of subscribers that may be other IoT services; and $T = (t_1, t_2, \ldots)$ is the set of event types.
Each publisher (e.g., P 1 ) or each subscriber is connected to only one of the brokers (e.g., NB1) in Fig. 1. The notification broker (e.g., NB2) that is connected to a subscriber (e.g., S 1 ) (or publisher) is called the access broker from a network view, and is also called the home broker with respect to that subscriber or publisher. The notification brokers that route events between brokers are called event routers or inner brokers (e.g., NB4). Each publisher publishes events to its home broker. Each subscriber receives events from its access broker. Clients can be a publisher, or a subscriber, or both.

Attribute-based authorization policy
In this paper, we adopt the attribute-based access control model (Hu et al. 2015).

Definition 1 (Attribute Tuple)
The attribute of a subject S is denoted by $s_k = (\mathit{s\_attr}_k, op_k, value_k)$ and the attribute of an object O is denoted by $o_n = (\mathit{o\_attr}_n, op_n, value_n)$, where $\mathit{s\_attr}_k$ and $\mathit{o\_attr}_n$ are the attribute names, $op$ is the attribute operation such that $op \in \{=, <, >, \le, \ge, in\}$, and $value$ is the attribute value. The action attribute can be one of the object's attributes. The attribute tuple is $\langle s_1, s_2, \ldots \rangle$. In our paper, $op$ is simplified to $\{=\}$ by describing numeric attributes with carefully chosen intervals. Then S can be written as $(w_{1,1} \wedge \cdots \wedge w_{1,K_1}) \vee \cdots \vee (w_{l,1} \wedge \cdots \wedge w_{l,K_l})$, where $w_{i,j}$ := "$\mathit{s\_attr}_{i,j} = value_{i,j}$", $1 \le i \le l$, $1 \le j \le K_i$. O can be written as $(w_{1,1} \wedge \cdots \wedge w_{1,N_1}) \vee \cdots \vee (w_{n,1} \wedge \cdots \wedge w_{n,N_n})$, where $w_{i,j}$ := "$\mathit{o\_attr}_{i,j} = value_{i,j}$".
Definition 2 (Authorization Rule) An attribute-based authorization rule is $rule = (\langle s_1, s_2, \ldots, s_K \rangle, \langle o_1, o_2, \ldots, o_E \rangle)$. The j-th subject attribute in rule is written as $rule.s_j$, and the j-th object attribute in rule is written as $rule.o_j$.
Definition 3 (Authorization Policy) An authorization policy $AP_i$ is a set of authorization rules, which can be represented as $AP_i = \bigcup_{j=1}^{L} rule_{i,j}$, where $rule_{i,j}$ is the j-th element in the rule set $AP_i$.
For example, a company called JingFang manages the provision of heating for citizens in the winter. The heat consumption data is classified into classes A and B. The data of class A is the detailed record of heat consumption of each residential home. The data of class B is the record of statistical information on heat consumption. JingFang publishes these data in the SCADA system. Two types of clients, C1 and C2, access the data. The clients of type C1 are individuals who can access their home consumption data A. The client of type C2 is a data mining company serving JingFang, which can access the data of class B.
The attributes of these data and clients are as follows:
1. A: $\langle (class, =, individual), (consumer, =, X) \rangle$, where X is the detailed identifier of a consumer who consumes the heat and produces the data. For the data of class A from different homes, the identifiers are different.
2. B: $\langle (class, =, statistics), (period, =, X1) \rangle$, which indicates that the data are the list of statistical information for heat consumption. That is to say, the data have the following attributes: the class is statistics, and the statistics period is X1.
3. C1: $\langle (type, =, individual), (consumer, =, Y) \rangle$, where Y is the detailed identifier of the consumer. That is to say, the subject has the following attributes: its type is individual, and its consumer identifier is Y.
4. C2: $\langle (type, =, company), (service, =, datamining) \rangle$. That is to say, the subject has the following attributes: its type is company, and its service is datamining.
Let Γ be an expression representing the subject attributes of rules in the authorization policy required to access some data, which uses logic operators to associate the attributes; Γ is also called the authorization policy if there is no confusion. According to the definition of the authorization policy AP, Γ can be represented as $\Gamma = (w_{1,1} \wedge \cdots \wedge w_{1,K_1}) \vee \cdots \vee (w_{l,1} \wedge \cdots \wedge w_{l,K_l})$, where $w_{i,j}$ ::= "$\mathit{s\_attr}_{i,j} = value_{i,j}$", $1 \le i \le l$, $1 \le j \le K_i$. According to the authorization policy $AP_B$ for data B, i.e., $AP_B = \{(\langle (type, =, company), (service, =, datamining) \rangle, \langle (class, =, statistics) \rangle)\}$, the expression for data B is $\Gamma_B$ = "type = company" ∧ "service = datamining". If a customer has attributes that match Γ, he/she can access the data B. That is to say, a conjunction of the client's attributes includes a conjunction in Γ of the data. γ often denotes a customer's set of attribute conjunctions as the authorization policy. For the negation of $w_{i,j}$, we can introduce another attribute $w'_{i,j}$ to represent it. The subject can be written as "type = individual" ∧ "consumer = X".
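The matching rule in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: attributes are modelled as plain "name=value" strings, a disjunctive normal form is modelled as a list of frozensets (one per conjunction), and the function name `satisfies` is hypothetical. It works on clear attributes, so it shows the access decision only, not the privacy-preserving matching developed later.

```python
# A DNF expression is a list of conjunctions; each conjunction is a
# frozenset of "name=value" strings (an assumed, simplified encoding).

def satisfies(customer_dnf, policy_dnf):
    """Return True if some customer conjunction includes some policy conjunction."""
    return any(policy_conj <= customer_conj
               for customer_conj in customer_dnf
               for policy_conj in policy_dnf)

# Gamma_B from the example: "type = company" AND "service = datamining".
gamma_b = [frozenset({"type=company", "service=datamining"})]

# Client C2 (data mining company) and client C1 (individual consumer Y).
c2 = [frozenset({"type=company", "service=datamining"})]
c1 = [frozenset({"type=individual", "consumer=Y"})]

print(satisfies(c2, gamma_b))  # C2 may access data B
print(satisfies(c1, gamma_b))  # C1 may not
```

Extra attributes on the customer side do not prevent a match, because only inclusion of a policy conjunction is required.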

Bloom filter
A Bloom Filter is a simple, space-efficient randomized data structure for representing a set of strings compactly for efficient membership querying (Bonomi et al. 2006). A Bloom Filter for representing a set X = {x 1 , x 2 , ..., x n } of n elements is described by an array of m bits, initially all set to 0. A Bloom Filter uses k independent hash functions {h 1 , h 2 , ..., h k } with the range {1, 2, ..., m} . For each member x belonging to X, the bits h i (x) are set to 1 for 1 ≤ i ≤ k. The bits can be set to 1 multiple times, but only the first change has an effect. After repeating this procedure for all members of the set, the programming of the filter is completed.
The query process is similar to programming. To check if an item y is in X, we check whether all h i (y) are set to 1.
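The programming and query procedures described above can be illustrated with a minimal Bloom Filter sketch. The class name, the use of salted SHA-256 digests to stand in for the k independent hash functions, and the zero-based bit positions (0 to m − 1 rather than 1 to m) are assumptions made for this example and are not taken from the paper.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom Filter: m bits, k salted hash functions (illustrative)."""

    def __init__(self, m, k):
        self.m = m          # number of bits in the array
        self.k = k          # number of hash functions
        self.bits = 0       # bit array stored as a Python integer, all zeros

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # Programming: set bit h_i(x) to 1 for every hash function.
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # Query: the item may be in the set only if all k bits are 1.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

bf = BloomFilter(m=256, k=3)
for attribute in ("type=company", "service=datamining"):
    bf.add(attribute)

print("type=company" in bf)  # True: inserted members are always found
```

A query for an element that was never inserted usually returns False, but may return True (a false positive) when all of its k positions happen to have been set by other elements.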

Access control framework for SCADA systems
In this section, we present our access control framework for SCADA systems. Our access control framework has two layers: the bottom layer is responsible for the matching between the protection requirements of the SCADA events and the SCADA applications' capabilities, and the upper layer is responsible for the matching between the capabilities of the SCADA events and the SCADA applications' requirements. The matching function is carried out based on meta-data such as authorized attributes acting as capabilities and embedded policies acting as requirements. In order to improve the performance of access control schemes, the relation between meta-data and event names is first defined as in Fig. 2. In Fig. 2, JingFang is a company that provides heat service for residents in winter. It has a heat provision system that produces and consumes events named Telemetry, Telesignalling, Remote Control, and so on. The Telemetry name has child names such as Water Temperature, Water Pressure, and so on. Each name in the name tree has its own attributes Attr, but an access control policy AP is made for one sub-tree such as Telemetry. It is worth pointing out that one event name has many name instances, which seems to contradict the assumption of the publish/subscribe paradigm. In our SCADA system, however, if the child names of a name do not have different attributes and authorization policies, these child names are only used to tag different data packets and we can regard them as instances of the name. Such a method will obviously reduce the size of the name tree. For example, a sensor continuously measures the temperature of water and publishes a temperature data event every second. Different temperature data differ only in timestamp. We can regard all temperature data with different timestamps as different name instances of the same name: Water Temperature. This does not contradict the assumption of SCADA systems (i.e., that each data packet has a unique name) because different data can be further identified by timestamp. That is to say, we can use an instance identifier to further name a data packet, even if the parent name is common. Therefore, we use the concept of type to handle this scenario. This means that different data packets with the same type may have a common parent name with the same attributes. Multiple types may have a common access control policy. The relation between event names and access control policies is as follows:
1. An event name may have many instances that have the same attributes. That is to say, these instances have the same type. A type is defined by attributes, i.e., a subject attribute expression. It is possible that two event names have the same type. In practice, a type is often unique.
2. Access control policies are often made for sub-trees. Multiple types may have the same access control policy.
Fig. 2 The relation between event names and access control policies
A two-layer framework of access control for SCADA systems is illustrated in Fig. 3. The main component in the framework is an access control engine, a new network entity deployed in home brokers, which lies in the middle column. The engine stores names' types and policies, as well as services' types and policies. When a service message arrives at the home node, the engine finds the access control policy and type by event name. It then checks for matching between name type & policy and service type & policy if the consumers subscribe to events with the name in the received event. If the matching results are not empty, the engine will enforce policies in the data layer for valid consumers, where the privilege value in the event is the embedded part of the access control policy. The embedded privilege not only binds the access control policy and type to the event, but also provides authentication to indicate that only the event publisher can embed such a value. The access control in the application layer may be carried out by the service itself. The service can also delegate some responsibilities of access control in the application layer to the engine in SCADA systems.
The engine in the access control framework assumes three functions, which are illustrated in Fig. 3: (a) finding a name type & policy by name, (b) matching between requirements and capabilities, and (c) enforcing policies. In order to realize these functions, two building blocks have to be provided. One is to embed authorization policies and types into service messages, and to support dynamically generating and embedding session attributes. The embedding scheme should provide authentication support because the bidirectional matching should be finally verified to have been carried out based on actual attributes. It is desirable that the scheme itself assumes this authentication task for performance optimization. The other is to encode attributes and policies for rapid matching and keeping privacy (Bonomi et al. 2006).
Fig. 3 Two-layer framework of access control for SCADA systems
Figure 4 illustrates an authorization procedure before the publisher publishes the data (or service messages) in SCADA systems. The detailed steps are as follows:
1. The publisher attaches the name type and access control policy to the data prefix announcement. The access control engine stores the received name type and access control policies in its storage, called Name Type & Policy.
2. A subscriber publishes its authorization request for the name by its type.
3. After receiving the authorization request, the publisher translates the name policy into a service policy part, called the privilege value, and a network policy part, called authorization credentials. The publisher publishes the network policy part to the access control engine, which means that the SCADA systems cannot disclose some sensitive information, even if the authorization credential is stored in the engine.
4. The publisher embeds the service policy part into data, which will bind the type and policy to the published data.
The authorization procedure is not our focus in this paper; see Zhang and Chen (2013) for further details.
The data consumers trust their home nodes and assume that these home nodes are honest. The data producers assume that the home nodes are honest but curious. That is to say, the home nodes will follow predefined protocols, but will try to find out as much secret information as possible. Home nodes might collude with malicious users. Adversaries control all communication channels, and can eavesdrop, forge, delay and discard messages as well as dynamically corrupt any participants in the system.

Policy embedding scheme
The policy embedding function and the blind encoding function are the cornerstones of the access control framework. In this section, we give the basic embedding scheme. In the basic scheme, each access control policy is expressed by an access expression Γ of the form given in Sect. "Attribute-based authorization policy", where Γ is a propositional formula in disjunctive normal form, $(w_{1,1} \wedge \cdots \wedge w_{1,n})$ is a conjunctive clause, and $w_{i,j}$ is a basic proposition such as $attr_{i,j} = value_{i,j}$, i.e., an atomic formula. A type is expressed by a subject attribute expression γ, where the subject attributes and object attributes are both represented by type, i.e., subject and object are relative.
The goal of embedding type and policy is to compress the variable length of attribute names and values so that the performance of matching, communication and storage can be optimized. The core idea is to adopt the one-way set hash method to encode the attributes in a conjunctive clause of a disjunctive normal form, i.e., a set of attributes, into a hash value. In addition, privacy can be considered in embedding. During evaluation of a customer's subscription for some sensitive event data, directly matching the customer's clear attributes against authorization policies will disclose some critical information about the customer or the data owner. Thus, we adopt the policy anonymity approach, where the attribute-based access control model is used. Each customer has her/his own attributes, which are disjunctive normal forms of attribute conjunctions such as $(w_{1,1} \wedge \cdots \wedge w_{1,K_1}) \vee \cdots \vee (w_{l,1} \wedge \cdots \wedge w_{l,K_l})$. Like the customer, each data event also has its attributes, but we pay attention to the subject attributes in the authorization policy for the data event, which is identified by the data's attributes. Authorization policies made by the data owner state what attributes a customer should have in order to access the data event. The home broker makes a decision about the customer's subscription by matching the customer's attributes against the data's authorization policy, i.e., checking whether there is an attribute conjunction of the customer that includes one attribute conjunction of the authorization policy.
In order to clarify the idea of policy anonymity, we give an abstraction of an anonymous set according to our requirements. We then use the abstraction as a clear and formal basis to design our policy-attaching and policy-privacy scheme. For the abstraction of our anonymous set, one-way random functionality and compression functionality, called set hash, play a key role in encoding the attributes in a conjunctive clause of a disjunctive normal form, i.e., a set of attributes, into a hash value. The abstraction of the anonymous set is defined as follows:
Definition 4 (Random Oracle $O_{set}$ for Set) Given a set of string elements, we obtain a random bit string, which is called Random Oracle for Set, if the conditions below are satisfied.

5. For two sets, their intersection cannot be computed if there exists no inclusion relation;
6. No elements can be computed from the set hash value (i.e., the random bit string) if the set is not publicly known.
According to this definition, a set of sensitive attributes is encoded into a one-way string code, and member elements cannot be directly recovered from the code. A Bloom Filter can be used to realize such an oracle $O_{set}$, but it has the following privacy deficiencies:
1. When a clear authorization policy is encoded into a Bloom Filter, some sensitive information can be guessed during the evaluation of customers' requests by testing membership of clear subject attributes. An attribute-blinding method should be adopted to address this issue.
2. After attributes are blinded, a membership-checking function is often used in many scenarios, which is carried out upon an explicitly given blinded attribute. When the blinded attribute is explicitly given during the membership checking, it is also a clue to link different Bloom Filters for different attribute sets, to link authorization transactions, and to guess the corresponding clear attributes, because the membership-checking result indicates whether two attribute sets include the same attribute. Therefore, the blinded attribute should be kept unknown to adversaries.
3. Membership checking is a basic function of a set. We should propose an alternative way, where, instead of the membership-checking function, the anonymous set-inclusion-checking function is used to answer the membership query, i.e., using two Bloom Filters to complete anonymous membership querying. To the best of our knowledge, there are no existing algorithms that use a set-inclusion-checking function to complete the anonymous checking function of a set member.
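To make the idea of checking set inclusion with two Bloom Filters concrete, the following sketch relies on a standard property: if two filters use the same size and the same hash functions, and set A is included in set B, then every bit set in the filter of A is also set in the filter of B. The function names, the salted SHA-256 hashing and the parameters m and k are assumptions for illustration; the sketch does not include the blinding step and is not the paper's full anonymous scheme.

```python
import hashlib

def bloom(items, m=256, k=3):
    """Encode a set of strings into an m-bit Bloom Filter stored as an int."""
    bits = 0
    for item in items:
        for i in range(k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            bits |= 1 << (int.from_bytes(digest[:8], "big") % m)
    return bits

def maybe_included(filter_a, filter_b):
    """True if every bit of filter_a is also set in filter_b.

    Always True when the set behind filter_a is a subset of the set behind
    filter_b; it can also be True by coincidence (a false positive).
    """
    return filter_a & filter_b == filter_a

policy_conjunction = {"type=company", "service=datamining"}
customer_conjunction = {"type=company", "service=datamining", "region=north"}

print(maybe_included(bloom(policy_conjunction), bloom(customer_conjunction)))
```

The check never inspects an individual attribute, only the two bit strings, which is why it fits the requirement that no explicit (blinded) attribute be presented during matching.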
Therefore, the policy embedding scheme should be designed based on a Bloom Filter, where the membership-checking function is a key factor of the scheme. When we talk about using the set-inclusion-checking function to take over the membership-checking function, we mean that, for a customer's attribute conjunction, the attributes of the conjunction that are included in a given authorization conjunction can be found by inclusion queries without explicitly knowing these attributes. That is to say, each attribute in the conjunction is ordered with an index, and we try to find a method to obtain the indices whose corresponding attributes satisfy the authorization conjunction. The same index value in different authorization conjunctions may correspond to different attributes. When these indices are found, customers' attributes and attributes in the policy are neither known nor disclosed. These indices are often passed into other functions or used as an indicator of whether there is a match.
The key idea for realizing the alternative to the membership-checking function is to sort each attribute conjunction, predefine a series of auxiliary sets for each attribute conjunction of the customer, and then judge which auxiliary sets include one of the attribute conjunctions in the authorization policy. When these auxiliary sets are identified, attribute indices are computed according to the indices of these auxiliary sets. These are described in more detail below:
Definition 5 (Auxiliary Sets and Attribute Indices) Assume the number of a customer's attribute conjunctions is x, the number of attributes in a conjunction is y, and the size of the Bloom Filter is m. We define a series of auxiliary sets for the attributes $w_1, w_2, \ldots, w_y$ in a conjunction: $Set_1 = \{w_1, w_2, \ldots, w_{y-1}\}$, $Set_2 = \{w_1, w_2, \ldots, w_{y-2}, w_y\}$, ..., $Set_y = \{w_2, \ldots, w_y\}$, together with the full set $Set_{y+1} = \{w_1, \ldots, w_y\}$. If there is a set AWset that is only included in one $Set_i$ ($1 \le i \le y$) and not included in the other sets $Set_j$ ($1 \le j \le y$), then AWset includes the attributes in $Set_i$ and the included attribute indices are $1, \ldots, y-i, y-i+2, \ldots, y$. If the set AWset is only included in two sets $Set_i$ ($1 \le i \le y$) and $Set_j$ ($1 \le j \le y$), and not included in the other sets $Set_k$ ($1 \le k \le y$), then AWset includes the attributes in $Set_i \cap Set_j$. Assume $j > i$: when $j > i + 1$, the attribute indices are $1, \ldots, y-j, y-j+2, \ldots, y-i, y-i+2, \ldots, y$; when $j = i + 1$, the attribute indices are $1, \ldots, y-j, y-j+2, \ldots, y$. The remaining cases can be handled in the same manner. If the set AWset is only included in the set $Set_{y+1}$, and not included in the other sets $Set_i$ ($1 \le i \le y$), then AWset includes all the attributes in $Set_{y+1}$, and the attribute indices are $1, \ldots, y$.
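The index computation in Definition 5 can be traced with a small sketch that uses ordinary Python sets in place of Bloom Filters, so that only the logic of the auxiliary sets is shown. The function names are hypothetical, indices are 1-based as in the definition, and the sketch assumes AWset is drawn from the attributes of the conjunction; the anonymity provided by blinded, encoded sets is deliberately left out.

```python
def auxiliary_sets(conjunction):
    """Build Set_1..Set_y, where Set_i omits the attribute w_{y-i+1}."""
    y = len(conjunction)
    return [frozenset(conjunction[:y - i] + conjunction[y - i + 1:])
            for i in range(1, y + 1)]

def matched_indices(conjunction, awset):
    """Return the 1-based indices of the attributes that AWset includes.

    Only inclusion queries (awset <= Set_i) are used; each auxiliary set
    that includes AWset reveals one attribute index that AWset lacks.
    """
    y = len(conjunction)
    awset = frozenset(awset)
    if not awset <= frozenset(conjunction):   # not even included in Set_{y+1}
        return None
    missing = {y - i + 1
               for i, aux in enumerate(auxiliary_sets(conjunction), start=1)
               if awset <= aux}
    return [index for index in range(1, y + 1) if index not in missing]

conjunction = ["w1", "w2", "w3", "w4"]            # sorted attribute conjunction
print(matched_indices(conjunction, {"w1", "w3"}))  # included in Set_1 and Set_3
print(matched_indices(conjunction, {"w1", "w2", "w3", "w4"}))  # only in Set_5
```

For y = 4, the set {w1, w3} is included in Set_1 and Set_3 (i = 1, j = 3), which gives the indices 1 and 3, in agreement with the case j > i + 1 of the definition.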

Policy encoding and matching
Our policy embedding scheme is based on a policy anonymity approach. In our approach, there are three steps to realize policy privacy in the access control service: blinding attributes, encoding the blinded attributes into anonymous sets, and matching the customer's anonymous attribute set against an anonymous authorization policy set.

Blinding attributes
The first step is to blind attributes, which mainly consists of blinding the data's attributes, the customer's attributes, and the authorization policies. The procedure is as follows:
1. Given the set of attributes $W = \{w_1, w_2, \ldots, w_n\}$ drawn from all attribute conjunctions of all customers, a data owner defines authorization policies over it. The elements of the set $W$ are subject attributes. Data attributes can be treated in the same way as subject attributes and are not discussed further here.
2. For each $w_i \in W$ ($1 \le i \le n$), a random string $\bar{w}_i$ is chosen as an alias of $w_i$, and $w_i$ is replaced with $\bar{w}_i$. The original $w_i$ is kept secret, so that the elements of $W$ remain unknown to the home brokers, clients and adversaries.
3. For each $w_i \in W$, $w_i$ is replaced by the triple $(w_i, \bar{w}_i, x_i)$, where $x_i$ is chosen as a random string with probability $p$ and as an empty string with probability $1 - p$. Thus, given an attribute conjunction of length $length$ as input, the length of the output conjunction varies.

Through these steps, $W$ becomes the blinded set $\bar{W}$.
We assume that the number of attributes in an attribute conjunction averages $length_{sae}$, and that this length is extended to the anonymity length $length_a$, giving each attribute conjunction an anonymity space of $length_a - length_{sae}$. Algorithm 1 depicts the process of blinding attributes. From Algorithm 1, we know that the set of attributes used in the access control service is extended to $((length_a - length_{sae})/length_a + 1)$ times the original one by appending the non-empty attributes $x_i$ ($i = 1, 2, \ldots$) to $W$. For each attribute $w_i$ in the attribute set $W$, we define its alias $\bar{w}_i$, which is a random string.
In Algorithm 2, the authorization policy is blinded: if an element $w_i$ of the authorization policy has a triple $(w_i, \bar{w}_i, x_i)$ in the blinded attribute set $\bar{W}$ and $x_i$ is not empty, $x_i$ is inserted into the authorization policy. The element $w_i$ is replaced by its alias $\bar{w}_i$ in the expression. The aliases and the added $x_i$ are not published and are known only to the data owner.
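Algorithm 1 and Algorithm 2 are only described in prose here, so the following is an illustrative sketch of the two blinding steps rather than the authors' listings. The names `blind_attributes` and `blind_policy`, the 16-character hexadecimal aliases, and the use of Python's `secrets` module are assumptions made for this example.

```python
import random
import secrets
from typing import Dict, List, Sequence, Tuple

# Maps each clear attribute w_i to (alias, padding x_i); x_i may be "".
Blinded = Dict[str, Tuple[str, str]]


def blind_attributes(attrs: Sequence[str], p: float, rng: random.Random) -> Blinded:
    """Blind every attribute: a random alias, plus a random padding
    string x_i with probability p (an empty string otherwise)."""
    table: Blinded = {}
    for w in attrs:
        alias = secrets.token_hex(8)  # random alias (assumed format)
        pad = secrets.token_hex(8) if rng.random() < p else ""
        table[w] = (alias, pad)
    return table


def blind_policy(conjunction: Sequence[str], table: Blinded) -> List[str]:
    """Blind one authorization conjunction: replace each attribute by its
    alias and insert the padding x_i whenever it is not empty."""
    out: List[str] = []
    for w in conjunction:
        alias, pad = table[w]
        out.append(alias)
        if pad:
            out.append(pad)
    return out
```

Because each padding string is added only with probability $p$, two conjunctions with the same number of clear attributes can have different blinded lengths, which is the length-hiding effect the text describes.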

Policy encoding
Once attributes and policies are blinded, the second step is to encode the blinded attribute conjunctions from the authorization policies and the customer into anonymous sets. The Bloom Filter is used to encode the blinded attributes. The final step is to compute set membership, set inclusion and intersection on the two anonymous sets of the data and the customer. The alternation scheme is designed to use the set-inclusion-checking function to answer membership queries over two anonymous sets. If such a scheme is available, our anonymous-set-based idea can be used to realize policy privacy.
The Encoding Procedure is defined to describe how to obtain predefined auxiliary sets without disclosing clear attributes. The Matching Procedure is defined to describe how to identify these auxiliary sets, including the authorization conjunction, and to compute attribute indices without disclosing clear attributes.
Definition 6 (Encoding Procedure) The encoding procedure includes two parts: encoding of the attribute conjunctions of customers, and encoding of the attribute conjunctions of authorization policies.

Encoding for customers' attributes
We expand each attribute conjunction so that the number of attributes in the conjunction is $n$, where random attributes have been inserted into the conjunction to hide its length (the attributes and attribute conjunctions are also blinded by using Algorithm 1 and Algorithm 2, which are discussed in the next section). The result is shown in Table 1, where the whole Bloom Filter $BF_t$ represents the attribute conjunction, Bloom Filter $BF_1$ represents the first auxiliary attribute set $Set_1$, Bloom Filter $BF_2$ represents the second auxiliary attribute set $Set_2$, and so on.
The attributes in the conjunction are distributed over the Bloom Filters as in Table 2. Each row of the table represents a Bloom Filter, and each column represents an attribute. For example, the $i$-th row represents $BF_i$, and the $j$-th column represents $w_j$. If the element $(i, j)$ of the table is "1", then $w_j$ ($1 \le j \le n$) is encoded into $BF_i$ ($1 \le i \le n$), i.e., $w_j$ belongs to the $i$-th auxiliary attribute set $Set_i$. The bottom row, i.e., the $(n+1)$-th row, represents $BF_t$, into which all attributes in the conjunction are encoded. The rightmost column, enclosed by a dashed line, indicates that each row is itself a bit string, denoted by $b_i$ ($1 \le i \le n$). The Bloom Filter $BF_i$ ($1 \le i \le n$) is computed as follows:
1. $BF_i$ is initialized to zero;
2. In the $i$-th row of Table 2, all attributes with "1" in their position form a set $Set_i$;
3. A random string is chosen and put into $Set_i$;
4. $Set_i$ is encoded into a Bloom Filter, which is assigned to $BF_i$.

The Bloom Filter $BF_t$ is computed as follows:
1. $BF_t$ is initialized to zero;
2. All attributes in the conjunction form a set $Set_t$;
3. A random string is chosen and put into $Set_t$ if no random string was inserted into the conjunction during expansion;
4. $Set_t$ is encoded into a Bloom Filter, which is assigned to $BF_t$.

Encoding for the attribute conjunction in authorization policies
The Bloom Filter $BF_a$ for the attribute conjunction in an access expression and the mask Bloom Filter $BF_{a-m}$ are computed as follows:
1. $BF_a$ and $BF_{a-m}$ are initialized to zero;
2. All attributes in the conjunction form a set $Set_a$;
3. Some random strings are chosen and put into $Set_a$; they also form a mask set $Set_{a-m}$;
4. $Set_a$ is encoded into a Bloom Filter, which is assigned to $BF_a$;
5. $Set_{a-m}$ is encoded into a Bloom Filter, which is assigned to $BF_{a-m}$.

From the definition of the encoding procedure, we know that each $BF_i$ ($1 \le i \le n$) is encoded from $Set_i = \{w_1, \ldots, w_{n-i}, w_{n-i+2}, \ldots, w_n\}$ and a random string. The random string serves as a blinding mask for $BF_i$; it does not affect checking whether an attribute is a member of $Set_i$ or whether an attribute set is included in $Set_i$.
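The encoding procedure can be illustrated with a small sketch that represents each Bloom Filter as a Python integer used as a bit mask. Several choices here are assumptions made for the example and are not taken from the paper: the SHA-256-based hash positions, the parameters `M` and `K`, and the function names `bloom`, `encode_customer` and `encode_policy`. For simplicity the sketch always adds a random string to $Set_t$, whereas the text adds one only if none was inserted during expansion.

```python
import hashlib
import secrets
from typing import Iterable, List, Sequence, Tuple

M = 1500  # filter size in bits (illustrative value)
K = 4     # number of hash positions per element (illustrative value)


def bloom(items: Iterable[str], m: int = M, k: int = K) -> int:
    """Encode a set of strings into an m-bit Bloom Filter held in an int."""
    bits = 0
    for item in items:
        for seed in range(k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            bits |= 1 << (int.from_bytes(digest, "big") % m)
    return bits


def encode_customer(attrs: Sequence[str]) -> Tuple[List[int], int, List[List[int]]]:
    """Return (BF_1..BF_n, BF_t, rows b_1..b_n) for one blinded conjunction.

    Row i has "0" only in column n - i + 1, the attribute left out of Set_i.
    """
    n = len(attrs)
    filters: List[int] = []
    rows: List[List[int]] = []
    for i in range(1, n + 1):
        row = [0 if j == n - i + 1 else 1 for j in range(1, n + 1)]
        members = [a for a, bit in zip(attrs, row) if bit]
        # A fresh random string blinds each auxiliary filter.
        filters.append(bloom(members + [secrets.token_hex(8)]))
        rows.append(row)
    bf_t = bloom(list(attrs) + [secrets.token_hex(8)])
    return filters, bf_t, rows


def encode_policy(conjunction: Sequence[str], n_masks: int = 2) -> Tuple[int, int]:
    """Return (BF_a, BF_{a-m}): the masked policy filter and its mask filter."""
    masks = [secrets.token_hex(8) for _ in range(n_masks)]
    return bloom(list(conjunction) + masks), bloom(masks)
```

Adding a random string only sets extra bits, so every genuine member of $Set_i$ still tests positive in $BF_i$, which is the property the preceding paragraph relies on.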

Policy matching
For the attribute set $Set_a$ of an access conjunction, it is impossible to check directly whether it is included in the attribute set $Set_t$ of the subject conjunction when its Bloom Filter $BF_a$ is blinded. To address this issue, we encode the random strings used as the blinding mask into an independent Bloom Filter $BF_{a-m}$. Because the Bloom Filter is one-way, it is impossible to remove the blinding mask strings even if $BF_a$ and $BF_{a-m}$ are given. Using the bitwise "OR" operation, $BF_{a-m}$ can be added into $BF_i$, i.e., the blinding mask strings are encoded into $BF_i$. The inclusion relationship is then checked by the equation $BF_a \wedge (BF_{a-m} \vee BF_i) = BF_a$, i.e., whether the attribute set $Set_a$ of the authorization conjunction is included in the attribute set $Set_t$ of the customer's attribute conjunction (the whole procedure is shown in Fig. 5).
Definition 7 (Matching Procedure) Given the Bloom Filters for an authorization policy, $BF_a$ and $BF_{a-m}$, the matching scheme is as follows, where the "0, 1" bit string of the $i$-th row in Table 2 is represented by $b_i$ ($1 \le i \le n$), '∧' is bitwise "AND", and '∨' is bitwise "OR".
1. Choose an all-"1" bit string of size $n$: $b$.
2. If $BF_a \wedge (BF_{a-m} \vee BF_i) \ne BF_a$, the Bloom Filters for the authorization policy and the customer's attributes are not matched and the computation halts; otherwise, continue to the next step. (3) and (2) Table 2.
The matching procedure is correct because:
1. When $BF_a \wedge (BF_{a-m} \vee BF_i) = BF_a$, the attribute set denoted by the $i$-th row of Table 2 includes the attribute set of the authorization conjunction denoted by $BF_a$. The attribute set denoted by the $i$-th row of Table 2 is written as $b_i$.
2. When $BF_a \wedge (BF_{a-m} \vee BF_j) = BF_a$, the attribute set denoted by the $j$-th row of Table 2 includes the attribute set of the authorization conjunction denoted by $BF_a$. The attribute set denoted by the $j$-th row of Table 2 is written as $b_j$.
3. From (1) and (2), we know that the attribute set of the authorization conjunction denoted by $BF_a$ is included in both $b_i$ and $b_j$, that is, in the intersection of $b_i$ and $b_j$. Therefore, we compute $b_i \wedge b_j$ to obtain the subset that includes the attribute set of the authorization conjunction.

Table 2 can be used to compute all subsets of attributes in the customer's attribute conjunction. The more filters $BF_x$ that $BF_a$ matches, the fewer attribute elements the set denoted by $BF_a$ contains.
We give an example to illustrate the correctness of the matching scheme. Assume $Set_a = \{w_1, rw_1, rw_2\}$ and $Set_{a-m} = \{rw_1, rw_2\}$; then $BF_1, BF_2, \ldots, BF_{n-1}$ satisfy $BF_a \wedge (BF_{a-m} \vee BF_i) = BF_a$. We compute $b = b_1 \wedge \cdots \wedge b_{n-1} = 10\ldots00$ ($n$ bits). From $b$, we know that only the position of $w_1$ has "1", and we conclude that the attribute with index 1 (with $w_1$ itself unknown) is a member of $Set_a$. Assume $Set_a = \{w_1, w_2, rw_1, rw_2\}$ and $Set_{a-m} = \{rw_1, rw_2\}$; then $BF_1, BF_2, \ldots, BF_{n-2}$ satisfy $BF_a \wedge (BF_{a-m} \vee BF_i) = BF_a$. We compute $b = b_1 \wedge \cdots \wedge b_{n-2} = 110\ldots000$ ($n$ bits). From $b$, we know that only the positions of $w_1$ and $w_2$ have "1" (with $w_1$ and $w_2$ not exposed) and that $w_1$ and $w_2$ are members of $Set_a$.
The matching function is efficient, because only a simple bit operation is carried out. If the matching function returns False, the customer's subscription is rejected. If the matching function returns True, the re-encryption component may be invoked with the matched results from the matching function as an input to indicate what re-encryption keys should be used by the indices.
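The inclusion test and the index computation can be sketched with integer bit masks as follows. The step list of Definition 7 is incomplete in this text, so the loop below is an assumption consistent with the correctness argument and the worked example: it ANDs together the rows $b_i$ of every filter that passes $BF_a \wedge (BF_{a-m} \vee BF_i) = BF_a$. The hash construction, the fixed `salt` strings (used instead of random strings so the example is reproducible) and all function names are hypothetical.

```python
import hashlib
from typing import Iterable, List, Optional, Sequence, Tuple


def bloom(items: Iterable[str], m: int = 4096, k: int = 4) -> int:
    """Encode strings into an m-bit Bloom Filter stored as an int."""
    bits = 0
    for item in items:
        for seed in range(k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            bits |= 1 << (int.from_bytes(digest, "big") % m)
    return bits


def build_customer(attrs: Sequence[str]) -> Tuple[List[int], List[List[int]]]:
    """Filters and rows for Set_1..Set_n, followed by the whole set."""
    n = len(attrs)
    rows = [[0 if j == n - 1 - i else 1 for j in range(n)] for i in range(n)]
    rows.append([1] * n)  # last row stands for BF_t
    filters = [
        bloom([a for a, bit in zip(attrs, row) if bit] + [f"salt{idx}"])
        for idx, row in enumerate(rows)
    ]
    return filters, rows


def match(bf_a: int, bf_am: int, filters: Sequence[int],
          rows: Sequence[Sequence[int]]) -> Optional[List[int]]:
    """Return the index bit string b, or None when no filter matches."""
    b = [1] * len(rows[0])  # start from the all-"1" bit string
    matched = False
    for bf_i, b_i in zip(filters, rows):
        # Add the mask to BF_i, then test inclusion of the policy filter.
        if bf_a & (bf_am | bf_i) == bf_a:
            b = [x & y for x, y in zip(b, b_i)]
            matched = True
    return b if matched else None
```

Barring Bloom Filter false positives, a policy over $w_1$ alone yields a bit string with "1" only in the first position for a four-attribute conjunction, mirroring the example above; only integer AND and OR operations are involved.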

Policy privacy
A subscriber can successfully access the requested event only if its attributes match the publisher's authorization policy, and the subscriber accepts the subscribed event from the published event type only if the event attributes match the subscriber's authorization policy. Thus, our access control solution is correct. In this section, we show that our scheme preserves privacy regardless of the form the adversaries' attacks take.

Policy privacy analysis
We show that the two-layer access control framework preserves privacy by defining the concept of policy privacy and giving a privacy proof. Home brokers are assumed to be semi-honest. This means that they follow the predefined protocols while trying to find out as much secret information as possible. Home brokers do not collude with malicious users, but they may send arbitrary information to users. Given this privacy assumption, we first introduce the definition of the policy evaluation scheme $\Pi_{PE}$, and then define the policy-privacy model for $\Pi_{PE}$.
Definition 8 (Policy Evaluation Scheme $\Pi_{PE}$) $\Pi_{PE}$ consists of four algorithms as follows:
1. Init: On input the attribute set $W$ of a customer and an authorization policy $\Gamma$, the blinding attribute algorithm and the blinding policy algorithm generate the blinded attribute set $\bar{W}$ and the blinded policy $\Gamma'$, respectively.

A policy evaluation scheme $\Pi_{PE}$ in the access control system achieves Chosen-Plaintext Attack (CPA) policy privacy if adversaries cannot win the game defined below with a non-negligible advantage.
Definition 9 (Non-intersection CPA for $\Pi_{PE}$) For the policy evaluation scheme $\Pi_{PE}$ and a probabilistic polynomial-time adversary $Adv$ running in two phases, the scheme achieves policy privacy if $Adv$'s advantage is negligible in the following game:
Setup: The challenger invokes the Init algorithm of $\Pi_{PE}$.
Training Phase 1: The adversary is allowed to issue queries to the following oracles:
1. Queries to the $O_{Encode}$ oracle for EncodeforAttributes and EncodeforPolicy of $\Pi_{PE}$: the adversary chooses one subject attribute conjunction $A_1$ and one attribute conjunction in an authorization policy $\Gamma_1$, and the oracle outputs the encoded attributes $BF_{A_1}$ and the encoded policy $BF_{P_1}$.

Queries
Challenge Phase: The adversary $Adv$ submits two random attribute conjunctions in two authorization policies $\Gamma_0$, $\Gamma_1$ and a subject attribute conjunction $A$. The challenger flips a random coin $\delta \in \{0, 1\}$ and outputs a randomized code $BF_{P_\delta}$ to the adversary. Neither of the attribute conjunctions $\Gamma_0$, $\Gamma_1$ may have appeared in the previous queries.
Training Phase 2: Training Phase 1 is repeated exactly, except that the adversary may not query MatchinginPEP for $BF_{P_\delta}$ and may not query the oracles with any element of $\Gamma_0$, $\Gamma_1$.
Guess: Finally, the adversary outputs a guess $\delta' \in \{0, 1\}$ and wins the game if $\delta' = \delta$. The probability is taken over the random bits used by the challenger and the adversary, where $Adv$ makes at most polynomially many queries to the oracles. This definition implies that: 1. For two attribute conjunctions, the adversary cannot distinguish their encodings, i.e., the adversary is unable to link a Bloom Filter to a specific attribute conjunction. 2. The non-intersection condition requires that no element of the challenge sets $\Gamma_0$ and $\Gamma_1$ has appeared or will appear in other queries. This indicates that our scheme $\Pi_{PE}$ has weaker security than standard CPA security.
Definition 10 (PRF CPA Assumption) Given a pseudo-random function $PRF(seed, key, input)$ whose $seed$ and $key$ are set secretly, and two attribute conjunctions, suppose $PRF(seed, key, input)$ is evaluated on one of the two attribute conjunctions and returns one random number. Then it is hard to determine from the returned random number which attribute conjunction was chosen without knowing $seed$ and $key$.
Definition 11 (PRF_BF Scheme) A Bloom Filter $BF$ is initialized to zero, and a key and $n$ seeds are secretly generated. Given an attribute set $eSET$, the scheme invokes $PRF(seed, key, input)$ for each attribute $e \in eSET$ as input with the $n$ different seeds to obtain $n$ random numbers in $(0, m]$, i.e., greater than 0 and less than $m + 1$. A position in $BF$ is set to 1 if one of the $n$ random numbers points to it. After all attributes in $eSET$ have been processed, $BF$ is output.
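Definition 11 can be made concrete with a short sketch. The paper does not name a particular pseudo-random function, so HMAC-SHA256 is used here purely as an assumed stand-in, and the names `prf` and `prf_bf` are hypothetical. The filter is again held in an integer, with bit `pos - 1` standing for position `pos` in $(0, m]$.

```python
import hashlib
import hmac
from typing import Iterable, Sequence


def prf(seed: bytes, key: bytes, data: str, m: int) -> int:
    """Keyed pseudo-random value in (0, m], i.e. one of 1..m.

    HMAC-SHA256 over seed and input is an assumed instantiation.
    """
    mac = hmac.new(key, seed + b"|" + data.encode(), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % m + 1


def prf_bf(attrs: Iterable[str], key: bytes, seeds: Sequence[bytes], m: int) -> int:
    """Encode an attribute set into an m-bit Bloom Filter using the PRF."""
    bf = 0  # the filter is initialized to zero
    for e in attrs:
        for seed in seeds:
            pos = prf(seed, key, e, m)  # one position per seed
            bf |= 1 << (pos - 1)
    return bf
```

Without the key and seeds, the positions set for an attribute cannot be recomputed, which is the property the PRF CPA assumption expresses for this construction.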

Lemma 1
The PRF_BF scheme is CPA-secure if no element in the challenge set is queried.
The conclusion is straightforward. In the security proof, the multiple random numbers for one element of the challenge set can be seen as multiple oracle queries for that element during a CPA-security game, where the oracle answers each query by attaching fixed, distinct numbers to the queried element as different inputs. The random numbers for multiple elements in the challenge set can be seen as multiple oracle queries for different elements. The premise that no element in the challenge set is queried indicates that, during the challenge of PRF_BF, no queried elements are challenged. It is natural to require that no element in the challenge set be queried after the challenge.

Theorem 1 PES Π PE is non-intersection CPA policy-privacy.
Proof Suppose algorithm $B$ is given a private key and also generates a series of seeds for random number generation. $B$ initializes the PRF_BF scheme with the key and seeds.
Init: Given a set of attributes $W = \{w_1, w_2, \ldots, w_n\}$, $B$ generates a random string $\bar{w}_i$ for each attribute $w_i \in W$, and randomly generates $w'_i$ according to the probability $p$. Replacing $w_i$ with $(w_i, \bar{w}_i, w'_i)$, we obtain a new blinded set of attributes $\bar{W} = \{(w_1, \bar{w}_1, w'_1), (w_2, \bar{w}_2, w'_2), \ldots, (w_n, \bar{w}_n, w'_n)\}$.
Setup: $B$ maintains a hash list $H_{list}$ for sets, which is initially empty, and responds to the random oracle queries of $Adv$ as described below.
1. Random oracle for a set, $H(w_1, \ldots, w_n)$: If this query already appears in $H_{list}$, the predefined value is returned. Otherwise, the PRF_BF scheme is invoked with the set $\{w_1, \ldots, w_n\}$ to obtain a Bloom Filter $bf$, and $H(w_1, \ldots, w_n) = bf$ is defined. Finally, the tuple $(\{w_1, \ldots, w_n\}, bf)$ is added to $H_{list}$ and $H(w_1, \ldots, w_n)$ is returned.
2. $O_{\in}(BF, w)$: If $BF$ can be found in $H_{list}$ with $BF = bf$ in $(\{w_1, \ldots, w_n\}, bf)$ and $w \in \{w_1, \ldots, w_n\}$, the oracle returns true; otherwise it returns false.
whether BF a is included is p vector = p y BF + p y−1 where if x >= n, then y = 1, otherwise y = n + 1.
For example, suppose the average number of attributes in one conjunction is 30, the average number of conjunctions for a customer is 50, and the false positive probability, given by $0.6185^{m/n}$, must be below $10^{-10}$. Then the bit size for each conjunction is 1500, since $0.6185^{1500/30} = 3.69 \times 10^{-11}$; the byte size for a matrix is $1500/8 \times 32 = 6000$ bytes ≈ 6 KB; and the byte size for a customer is $50 \times 6$ KB = 300 KB. That is to say, the home broker should provide 300 KB of storage for the attribute information of one customer. As for the publisher's attributes, the storage needed for each rule for a data event is 0.187 KB = 187 bytes, and that for the whole policy for the data event is 9 KB. If the number of attributes in a conjunction is smaller, the storage cost is significantly reduced.
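The arithmetic in this example can be reproduced with a few lines of Python. The values (1500 bits, 30 attributes, 32 rows per matrix, 50 conjunctions) are taken from the text; the function names `false_positive` and `storage_estimate` are only illustrative.

```python
def false_positive(m_bits: int, n_attrs: int) -> float:
    """False positive probability 0.6185^(m/n) used in the text."""
    return 0.6185 ** (m_bits / n_attrs)


def storage_estimate(m_bits: int = 1500, rows: int = 32, conjunctions: int = 50) -> dict:
    """Storage figures in bytes for one rule, one matrix and one customer."""
    rule_bytes = m_bits / 8              # one Bloom Filter per rule
    matrix_bytes = rule_bytes * rows     # one matrix per conjunction
    customer_bytes = matrix_bytes * conjunctions
    return {
        "rule_bytes": rule_bytes,
        "matrix_bytes": matrix_bytes,
        "customer_bytes": customer_bytes,
    }
```

With these inputs, `false_positive(1500, 30)` evaluates to roughly $3.69 \times 10^{-11}$, one rule needs 187.5 bytes, one matrix needs 6000 bytes, and 50 conjunctions need 300,000 bytes, in line with the 300 KB stated for a customer.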

Performance evaluation
Access control policy enforcement may introduce overheads into the overall communication performance of a publish/subscribe system. In this section, we focus on evaluating (1) the overhead of data event communication from publishers to subscribers; (2) the policy matching efficiency at the broker; (3) the scalability of the SCADA system with our access control framework, which is implemented on a message-oriented Java Message Service (JMS) broker; and (4) the impact on overall performance.
Evaluation Metrics In order to evaluate the communication performance, scalability and policy matching efficiency of the SCADA system with our access control framework, latency and throughput are used as the performance metrics. Two kinds of latency are considered: pub-to-sub latency and broker latency. To avoid ambiguity, we define these metrics as follows:
1. Pub-to-sub latency is the total time taken by a data event to travel from its publisher to its subscriber, including the time taken for broker matching.
2. Broker latency is the time spent by a broker in receiving the published event, performing matching operations against all the requesting subscribers and forwarding the data event to the matching subscribers.
3. Throughput is the average number of published data events per second.

Fig. 6 Testing design
Test Design We extended Apache ActiveMQ, a JMS broker, by building in a two-layer access control framework that preserves policy privacy for the publish/subscribe system. The implementation framework is shown in Fig. 6. The broker connected to the publishers provides the subscribe filters by building the policy-based access control (AC) scheme of the published event. The broker connected to the subscribers provides the publish filters, which are the authorization policies of the subscribe services. Such a broker is called a secure pub/sub broker; it conducts matching operations between the encoded attributes and the encoded authorization policy for each data event. In our test, we used a data event without an authorization policy as the baseline, which means that the access control (AC) framework is not applied on the broker. Such a publish/subscribe system without a secure broker is called the publish/subscribe system with plain: a publisher publishes events to his/her broker, subscribers subscribe to events (by event type) through their broker, and the broker sends each data event whose event type matches the subscribed event to the subscriber. Following the latency measurement method of Chen and Greenfield (2004), three partial times are measured: the time from publishing a data event to the broker, the broker matching time, and the time for the subscriber to receive the event from its broker. The detailed procedure for measuring latency, shown in Fig. 5, is as follows: the publisher obtains a timestamp $T_1$ and attaches it to the published data event as soon as he/she sends the event to the broker. When the broker connected to the publisher receives the event, it obtains the timestamp $t_1$. After the broker carries out the matching operations, the timestamp $t_2$ is attached to its outgoing data event. When the subscriber receives the data event from his/her broker, the subscriber obtains the timestamp $T_2$.
Pub-to-sub latency can be calculated as pub-to-sub latency $= T_2 - T_1$, and broker latency (i.e., matching latency) can be calculated as broker latency $= t_2 - t_1$. For simplicity, we assume that the time spent in sending an event from a publisher to the broker is the same as that spent in sending the event from the broker to the subscriber. Therefore, we obtain

$$\text{pub-to-broker latency} = \text{broker-to-sub latency} = \frac{\text{pub-to-sub latency} - \text{broker latency}}{2}$$

Test Cases For the purpose of evaluating the performance of the publish/subscribe system with the two-layer access control framework (PS-ACF), we measure these latency metrics in PS-ACF and in the baseline (i.e., the publish/subscribe system without access control). The test cases are specified as follows: 1. Evaluating latency with an access control policy and latency with plain; 2. Evaluating the latency metrics while the data event size increases;

percentage of broker latency is also low. The Pub-to-Broker time is the same as the Broker-to-Sub time, and the latencies increase by 6 % when we add one access control policy to the broker.

Test Results (2) Test case (2) was carried out by increasing the data event size and by adding one access control policy to the broker; the results are shown in Fig. 9a, b, where the horizontal axis is logarithmic (base 10). We compare the pub-to-sub latency with plain and with access control, as well as the broker latency with plain and with access control. For small data event sizes, the pub-to-sub latency and broker latency are low; for example, for the 1 KB data event size, the whole event messaging latency is less than 20 ms (Fig. 9a) and the policy matching latency on the broker is 5 ms (Fig. 9b). As the data event size becomes larger, the latency follows a continuous curve. PS-ACF shows the same behaviour as the baseline. For both the pub-to-sub latency and the broker latency, the data event size is one of the factors in the overhead.
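The timestamp arithmetic behind these latency metrics is simple enough to state as code. This is a minimal sketch assuming the four timestamps $T_1$, $t_1$, $t_2$, $T_2$ are expressed in milliseconds on comparable clocks; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventTimestamps:
    T1: float  # publisher sends the event
    t1: float  # broker receives the event
    t2: float  # broker forwards the event after matching
    T2: float  # subscriber receives the event


def decompose(ts: EventTimestamps) -> dict:
    """Split the end-to-end latency into its three parts.

    The two network legs are assumed equal, as in the text.
    """
    pub_to_sub = ts.T2 - ts.T1
    broker = ts.t2 - ts.t1
    leg = (pub_to_sub - broker) / 2
    return {
        "pub_to_sub": pub_to_sub,
        "broker": broker,
        "pub_to_broker": leg,
        "broker_to_sub": leg,
    }
```

For instance, `decompose(EventTimestamps(T1=0.0, t1=7.5, t2=12.5, T2=20.0))` reports a pub-to-sub latency of 20 ms, a broker latency of 5 ms, and 7.5 ms for each network leg.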
Test Results (3) The latencies for a small number of rules (i.e., fewer than 16) are shown in Fig. 10, with the number of policy rules on the horizontal axis. Both pub-to-sub latency and broker latency increase slowly as the number of policy rules increases. For larger numbers of rules, the data event messaging time dominates the broker matching time. For 16 rules in a policy, the whole event messaging latency is less than 25 ms and the policy matching latency on the broker is 40 ms. However, the broker latency increases slowly with the number of rules, which indicates that our two-layer access control framework in the publish/subscribe system is highly scalable and supports matching operations over more policy rules.
Analysis Results The collected latency metrics consist of maximum latency, minimum latency, average latency and latency distribution. We present the statistical results for event latency based on our measurement metrics in Table 4. The results show that tests run at smaller data event sizes or with fewer policy rules tend to have lower pub-to-sub latencies and lower broker latencies; furthermore, the latencies are compactly distributed.
The latency distribution test results for a data event size of 1 KB are presented in Fig. 11. As shown in Fig. 11a, b, for a 1 KB data event, about 70 % of the pub-to-sub latencies with plain are compactly distributed in the range of 25 ∼ 30 ms, and about 55 % of the pub-to-sub latencies with one policy are compactly distributed in the range of 30 ∼ 35 ms. These latency distributions show that the publish/subscribe system with the access control framework presented in our paper has higher throughput and shorter latencies. As shown in Fig. 11c, d, for a 1 KB data event, about 95 % of the broker latencies with plain are compactly distributed in the range of 0 ∼ 5 ms, and about 80 % of the broker latencies with one policy are compactly distributed in the range of 5 ∼ 10 ms. During the tests of all the cases, the CPU utilization was between 15 % and 50 %. According to Little's Law, we can derive the throughput in Events Per Second (EPS) as Throughput = 1/Latency. The pub-to-sub throughput results are presented based on the average pub-to-sub latencies with or without access control. Figure 12 shows the average sustainable throughput in processing events per second using different event sizes; the horizontal axis is given in base-10 logarithms. As with pub-to-sub latencies, the data event size is the main factor in the baseline. As the data event size increases, the pub-to-sub throughput decreases; that is to say, fewer data events per second can be sent from the publisher to the subscriber. From the above security analysis and latency evaluation results, the overhead that grows with the number of policies used to protect the publish/subscribe system is easy to observe, but this overhead is reasonable and acceptable. The overall latency comparison shows that our access control framework has high policy matching efficiency and high scalability.

Fig. 12 Throughput for different data event size in KB

Conclusions and future research
In SCADA systems, named, signed and potentially encrypted content forms a solid foundation for routing and application security. The access control mechanism for SCADA systems should include independent data and application layers, and the two layers should be opaque to network entities as well as suitable for SCADA communication features such as event naming and caching. We therefore propose a two-layer access control framework for SCADA systems in which, by integrating network capabilities, the data layer takes on the protection of SCADA events and the application layer takes on the protection of services. The anonymous-set-based principle is adopted to design our policy embedding scheme, which is presented as the foundation of an access control service with policy privacy. In our scheme, the alternation method plays a key role: it uses the anonymous set-inclusion-checking function to take over the basic function of the anonymous set, i.e., the anonymous set-membership-checking function. We also extended the open source Apache ActiveMQ broker by adding authorization policies to help realize policy privacy. The latency evaluation results indicate that our approach is highly scalable and flexible. The security analysis and the latency evaluation show that a SCADA application with our two-layer access control scheme authorizes as flexibly as traditional access control systems, and that home brokers can securely and efficiently execute the delegated policy enforcement function without re-encrypting data after the authorization policies are updated, since policies are encoded with a blinding mask and matched anonymously to realize policy privacy. Future research will aim to make our policy embedding scheme resistant to more powerful privacy attacks from adversaries.