Privacy compliance in European healthgrid domains: An ontology-based approach

The integration of different European medical systems by means of grid technologies will continue to be challenging if technology does not intervene to enhance interoperability between national regulatory frameworks on data protection. Achieving compliance in European healthgrid domains is crucial but challenging because of the diversity and complexity of Member State legislation across Europe. Lack of automation and inconsistency of processes across healthcare organizations increase the complexity of the compliance task, and in the absence of automation the task entails human intervention. In this paper we present an approach to automate privacy requirements for the sharing of patient data between Member States across Europe in a healthgrid [1] domain and to ensure their enforcement internally and within external domains where the data might travel. This approach is based on the semantic modelling of privacy obligations that are of a legal, ethical or cultural nature. Our model reflects both similarities and conflicts, if any, between the different Member States. This will allow us to reason on the safeguards a data controller should demand from an organization belonging to another Member State before disclosing medical data to it. The system will also generate the relevant set of policies to be enforced at the process level of the grid to ensure privacy compliance before allowing access to the data.


Introduction
When sharing medical data between different healthcare and biomedical research organisations in Europe, it is important that the parties involved handle the data in the way required by the legislation of the Member State where the data was originally collected, as the requirements might differ from one state to another. Privacy requirements, such as patient consent, may be subject to conflicting conditions between different national frameworks as well as between different legal and ethical frameworks of Member States. While most EU Member States are now governed by similar personal data protection rules, harmonization remains more apparent than real. This is due, first, to the fact that, subject to the provision of suitable safeguards, the directive leaves some room for Member States to lay down simplifications of and exemptions from some of the obligations it dictates [1], e.g. the obligation to notify the data subject of the processing of their data. Also, for reasons of substantial public interest, Member States may lay down exemptions from the ban on processing of sensitive personal data in addition to those laid down in the directive, either by national law or by decision of the supervisory authority [2]. Second, as noted by some studies [3], the definitions used do not lead to a uniform understanding of the key concepts underpinning the directive. Many Member States find the concept of "Personal Data" in particular difficult to interpret. The UK found that in some cases data is not easily classified as personal or non-personal, and that this classification can depend on the circumstances. Some data that is normally considered non-personal, such as shoe size, details of death, business data and encrypted data, has been shown to be capable of being personal data in special circumstances. Overlaps in the interpretation of "Personal Data" have also resulted in different ways of governing anonymized and pseudonymized data [3] [6].
Consequently, the frameworks in some Member States such as the UK [4] tend to be less favorable to the processing of personal data for medical research than the Italian data protection framework. They impose more constraints on medical researchers, specifically in requirements such as the necessity and specificity of patient consent. Despite complaints from researchers, no simplification was provided, unlike, e.g., in Italy. This was found to be an obstacle to the participation of the UK in some European and international research projects. In contrast, the Italian data protection law seems to grant more privileges to medical researchers by allowing consent for the processing of medical data across different healthcare organizations to be given in a single, one-off statement [5]. This raises a potential ethical objection: informed consent should only be given for specific research tasks that are known to the data controller at the time the consent is collected, and should be required again for any other processing that may be performed in future [6]. These issues explain the diversity, complexity and dynamicity of the rules governing privacy protection. Privacy requirements cannot be generalized to cover all cases of sharing. Rather, they depend on different aspects of the data and the context, including the type of data and its level of sensitivity, the entities sharing the data and the purpose of sharing. Many issues of privacy compliance in the healthcare domain are due to the gap between legislation and its technical implementation within healthcare and medical research organizations. We believe modelling could simplify and abstract the complexity of rules from the real world to allow their automation and enforcement at the organizations' process level. Figure 1 illustrates our vision of the smooth shift from the complexity of the real world to operational controls for privacy protection.
The paper is structured as follows. In section one, we present our technical solution to the modelling and automation of privacy requirements. Section two demonstrates the usability of the model for building decision support applications that help the grid's medical users to share medical data while complying with privacy obligations. In section three we extend our ontology to allow the specification of privacy policies and the mapping of this specification to a standard privacy policy language such as the Extensible Access Control Markup Language (XACML). Finally, we conclude and point to future tasks concerning the automatic generation of enforceable privacy policies.

[Figure 1. Diagram labels: Enforceable Policies Generator; Privacy Knowledge Base (OWL-Full)]

Modeling Privacy requirements: OWL and SWRL
The diversity, complexity and dynamicity of the rules governing privacy protection in Europe explain the need for a modelling approach that is able to abstract this complexity and facilitate its automation and enforcement at the process level. By privacy requirements we mean all the obligations that must be fulfilled, in order to preserve patient privacy, by all parties involved in the sharing and processing of sensitive patient data for medical purposes, including healthcare and medical research. This includes patient consent, data anonymity, and the rights of the data subject, including their right to dissent and to be notified [6]. Our model should reflect any conflicts between the EU Member States in the specification and the provision of these requirements. In the following paragraphs we present our attempt to model and automate privacy requirements in the context of medical data disclosure in Europe.
Our approach uses the W3C Web Ontology Language (OWL) [7] to represent privacy obligations in medical data disclosure. OWL allowed us to model the conceptual domain of "data sharing" or "data disclosure" and its components as a hierarchy of classes, with a hierarchy of properties to represent the relationships between them. Privacy requirements such as consent requirements can be modelled as OWL classes and assigned to the "dataSharing" resource as object properties.
Moreover, OWL allows overlapping models of a concept to be merged, even when different names have been used for the same resource; e.g., Explicit Consent might be named Express Consent in another model, but both concepts have the same meaning.
With complex legal domains, we need to model relationships that cannot be expressed in OWL because the logic for describing properties is not rich enough. Legal rules are usually expressed in the form of if-then rules. For example, we may want to model a rule stating that if the data belongs to the UK then patient consent is necessary for the processing. Expressing this kind of rule requires a semantic web rule language, which allows sets of rules to be built in terms of the concepts of the sharing process already described in the ontology and their properties. This will allow us to reason on the relevant set of rules and ontology classes in order to infer privacy requirements for different instances of sharing contexts.
As a rule language we rely on the Semantic Web Rule Language (SWRL) [8], a promising approach based on the Rule Markup Language (RuleML).
The following example is a SWRL representation of the rule stating that patient consent is necessary for the sharing of a UK medical data item that is anonymized:
    dataSharing(?x) ∧ hasSender(?x, ?s) ∧ hasReceiver(?x, any) ∧ locatedIn(?s, UK) ∧ hasStatus(?d, Anonymised) → hasConsentNecessity(?x, Necessary)
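To make the if-then reading of such a rule concrete, the following is a minimal Python sketch, not the authors' implementation. The `DataSharing` class and the function name are hypothetical; only the predicate values (UK, Anonymised, Necessary) are taken from the rule above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSharing:
    """Hypothetical, simplified stand-in for an instance of the dataSharing class."""
    sender_country: str  # location of the sender (?s), e.g. "UK"
    data_status: str     # status of the data item (?d), e.g. "Anonymised"


def infer_consent_necessity(sharing):
    """Return "Necessary" when the rule's antecedent holds, else None (nothing inferred)."""
    if sharing.sender_country == "UK" and sharing.data_status == "Anonymised":
        return "Necessary"
    return None


uk_case = DataSharing(sender_country="UK", data_status="Anonymised")
print(infer_consent_necessity(uk_case))
```

A hand-written function like this covers one rule only; a rule engine is needed once many rules over shared concepts have to be combined.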

Decision support for clinicians and medical technicians to enhance compliance with privacy regulations
Our system should reason on the model described in the previous section to generate protocols that guide medical users through the different processing tasks. For this purpose we developed a semantic web application that allows users to specify details of the different entities that constitute a sharing process and invokes a Jess rule engine [9] to fire the relevant SWRL rules from our model. The result is a set of newly inferred axioms that are added to the model as attributes of the instance of the "Data Sharing" class in question. These are returned to the user as the set of privacy requirements that must be met to allow the sharing of the data. (See Figure 2.)
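The create-model, infer, return-axioms cycle described above can be illustrated with a small forward-chaining sketch in Python. This is an assumption-laden toy, not the Jess or SWRL machinery used in the paper: facts are (predicate, subject, object) triples, the class atom dataSharing(?x) is encoded as an "isa" triple, and `FRENCH_UPLOAD_RULE` transcribes the French upload rule from the case study.

```python
def match(pattern, fact, bindings):
    """Unify one triple pattern with one fact; terms starting with '?' are variables."""
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if new.setdefault(p, f) != f:
                return None  # variable already bound to a different value
        elif p != f:
            return None      # constant mismatch
    return new


def solve(body, facts, bindings):
    """Yield every variable binding that satisfies all atoms in the rule body."""
    if not body:
        yield bindings
        return
    for fact in facts:
        extended = match(body[0], fact, bindings)
        if extended is not None:
            yield from solve(body[1:], facts, extended)


def infer(facts, rules):
    """Fire rules until nothing new appears; return only the inferred facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, heads in rules:
            for b in list(solve(body, known, {})):
                for head in heads:
                    fact = tuple(b.get(term, term) for term in head)
                    if fact not in known:
                        known.add(fact)
                        changed = True
    return known - set(facts)


# Hypothetical triple encoding of the French upload rule.
FRENCH_UPLOAD_RULE = (
    [("isa", "?x", "dataSharing"), ("hasSender", "?x", "?s"),
     ("locatedIn", "?s", "France"), ("concerns", "?x", "?d"),
     ("belongsTo", "?d", "France")],
    [("consentNecessity", "?x", "Necessary"),
     ("con_Explicitness", "?x", "Explicit"),
     ("consentSpecificity", "?x", "SpecificConsent")],
)
```

Given facts describing one sharing instance, `infer(facts, [FRENCH_UPLOAD_RULE])` returns the consent requirements attached to that instance, which is the role the inferred axioms play in the application.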

[Figure 2: Architecture of the privacy decision support application. Diagram labels: Create Model; Infer; Inferred Axioms]
In this section we explain how privacy requirements are integrated within some real-world workflows of medical data sharing. We focus on a real-world grid scenario, the MammoGrid project [11], which was an EU-funded collaboration between hospitals and research centres from the UK, Italy and Switzerland. The project aimed to standardize scanned mammograms for use in epidemiological studies, quality control for breast cancer screening, comparative diagnosis and validation of computer-aided detection algorithms for mammographic images. For this case study we focus mainly on the requirement of patient consent in two critical phases of the data lifecycle: (1) uploading the data from local resources to the grid and (2) sharing the data on the grid. To highlight more possible conflicts between the different Member States, we suppose that France is a grid party as well.
The data subject to processing in this project are patient breast mammograms along with other data revealing the age and some body metrics of the patient. Data anonymization was not a preferred option for protecting patient identity, as the data that would have to be hidden forms important clinical variables for comparative diagnosis. The justification for the processing of patient data was patient consent and/or ethical approval. For the UK, patient consent is considered necessary even when ethical approval has been granted. For Italy, however, ethical approval is a sufficient condition.
When a technician at one of the grid nodes tries to upload some local data to the grid's shared database, our system will automatically generate a set of privacy guidelines to assist them through the uploading task. For example, the following rule will be fired in order to indicate to a radiologist at a French hospital that express and specific patient consent is required in order to share data on the grid:
    dataSharing(?x) ∧ hasSender(?x, ?s) ∧ locatedIn(?s, France) ∧ concerns(?x, ?d) ∧ belongsTo(?d, France) → consentNecessity(?x, Necessary) ∧ con_Explicitness(?x, Explicit) ∧ consentSpecificity(?x, SpecificConsent)
Similarly, indicating to an Italian technician that patient consent is not necessary for uploading medical data to the grid will be based on firing the following rule:
    dataSharing(?x) ∧ hasSender(?x, ?s) ∧ locatedIn(?s, Italy) ∧ concerns(?x, ?d) ∧ belongsTo(?d, UK) → consentNecessity(?x, Necessary) ∧ con_Explicitness(?x, Any) ∧ consentSpecificity(?x, SpecificConsent)
The data is now uploaded to the grid and ready for processing for the specific medical purposes the MammoGrid project was aiming to achieve. It is very likely that patients' mammograms would be shared with clinicians across European borders. In many cases researchers will require the data to be downloaded to their personal storage devices. At this stage we are more concerned with the usage of data for future purposes. A British organization might insist that, when its data is to be processed by an Italian grid user, either the new processing purpose should be compatible with the purpose the patient has consented to, or patient consent must be collected for the new purpose. The following rules determine who can contact the patient in order to collect consent, first for the UK:
    dataSharing(?x) ∧ concerns(?x, ?data) ∧ belongsTo(?data, UK) ∧ about(?data, ?patient) ∧ hasPurpose(?x, ?p) ∧ isa(?p, SecondaryPurpose) ∧ generalPractitioner(?gp, ?patient) → consentPointofContact(?gp)
and for Italy:
    dataSharing(?x) ∧ concerns(?x, ?data) ∧ belongsTo(?data, Italy) ∧ about(?data, ?patient) ∧ hasPurpose(?x, ?p) ∧ isa(?p, SecondaryPurpose) ∧ hasRequestor(?x, ?r) → consentPointofContact(?r)
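The two point-of-contact rules differ only in which party is selected, which the following Python sketch makes explicit. The function and parameter names are hypothetical; the country and purpose values come from the rules above, and returning None for other countries is an assumption of the sketch, since the source gives no rule for them.

```python
def consent_point_of_contact(data_country, purpose_kind, general_practitioner, requestor):
    """Pick who may contact the patient to collect consent for a secondary purpose.

    Mirrors the two rules: the patient's general practitioner for UK data,
    the requestor of the sharing for Italian data.
    """
    if purpose_kind != "SecondaryPurpose":
        return None  # the rules only cover secondary purposes
    if data_country == "UK":
        return general_practitioner
    if data_country == "Italy":
        return requestor
    return None  # assumption: no rule modelled for other Member States


print(consent_point_of_contact("UK", "SecondaryPurpose", "gp_smith", "researcher_rossi"))
print(consent_point_of_contact("Italy", "SecondaryPurpose", "gp_smith", "researcher_rossi"))
```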

Extending the ontology to enable the specification of enforceable privacy policies to ensure compliance
For better governance of European integrated health systems, legal and ethical requirements for privacy must be enforced at the operational level as formal privacy policies. Through the use of OWL [7] we were able to represent the concept of a "Policy" and edit policy instances in a seamless way.
Privacy policies [12] are divided into two main categories according to when they are enforced relative to the access control policy associated with them. The first category comprises policies that might affect a system's access control decision. For example, if patient consent is required for a specific context of sharing, the system will check for the availability of patient consent before allowing the user to access the data. The second category comprises Privacy Obligations [12], which do not affect access control decisions but are dealt with after a decision is made in order to control usage of the data at a later stage, e.g. usage of data for a secondary purpose, disclosure of data to third parties, data deletion and retention.
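The difference between the two categories can be sketched in Python as follows. This is an illustrative assumption about how an enforcement point might order the checks, not a description of the authors' system; `Decision`, `decide` and the obligation strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    permit: bool
    # Obligations are carried with a permit and handled after the decision.
    obligations: list = field(default_factory=list)


def decide(consent_required, consent_available, obligations):
    """First-category policies gate access; second-category ones ride along with a permit."""
    if consent_required and not consent_available:
        return Decision(permit=False)  # a pre-access condition failed
    return Decision(permit=True, obligations=list(obligations))


result = decide(
    consent_required=True,
    consent_available=True,
    obligations=["no disclosure to third parties", "delete after retention period"],
)
print(result.permit, result.obligations)
```

Keeping obligations out of the permit/deny test reflects the text above: they constrain later usage rather than the access decision itself.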
In order to be easily enforced at the system level, we suggest that our policies should be specified in a way that conforms to a widely adopted policy language or standard that has proven effective in the enforcement of privacy policies. Our choice was the Extensible Access Control Markup Language (XACML) [10]. Privacy policies in XACML are specified using standard Extensible Markup Language (XML) elements, including Policy, Target and a list of Rules, where the Target refers to the resource we are controlling access to. The rules attached to the policy are described in terms of other standard XACML elements, including Rule Effect (permit, deny…), Rule Target (Subject or Requester, Resource, Requested Action) and Rule Conditions [10]. The conditions attached to each rule are specific constraints on the subject or requestor, the resource, and other elements, depending on the context. XACML also allows users to add user-defined components or elements to the standard vocabulary [13].
Our model captures all the components that constitute the XACML privacy policy specification and extends the Rule Target component to allow constraints to be set according to the purpose of data processing and the Member State to which the data belongs.

Example: Purpose Compatibility Rule:
In this example we show how we modelled the policy stating that "A user may access a patient mammogram if the patient has provided informed consent for a specific purpose of processing and the processing purpose is compatible with the purpose consented for". First we rewrote the policy as a SWRL rule using the OWL classes and properties specified in the ontology:
    provided(?patient, InformedConsent) ∧ isa(?consentPurpose, SpecificPurpose) ∧ compatibleWith(?purpose, ?consentPurpose) → allow(?requester, Access)
For easy mapping to an XACML rule, the SWRL rule needs to be specified in terms of attributes of only the generic entities that constitute an XACML Rule Target (described above) and of other elements used to specify the general policy to which the rule belongs, i.e. the purpose of processing. The OWL property "provided(?patient, InformedConsent)" is a property of the patient whose data is to be shared, and it indicates that the patient has provided informed consent. The patient, or data subject, is not one of the XACML "Rule Target" components; we therefore decided to express the same condition as a property of the class "Subject" (the requestor of an action on a resource or object). The result is as follows:
    obtained(?subject, InformedConsent) ∧ hasCollectionPurpose(?object, ?collnPurpose) ∧ compatibleWith(?collnPurpose, ?consentPurpose) → allow(?subject, Access)
This rule can be mapped to the following XACML rule:
<Rule RuleId="1" Effect="Permit">
  <Target>
    <Subjects> <AnySubject/> </Subjects>
    <Resources> <AnyResource/> </Resources>
    <Actions> <AnyAction/> </Actions>
  </Target>
  <Condition FunctionId="string-equal">
    <Apply FunctionId="string-one-and-only">
      <PurposeAttributeDesignator AttributeId="Disclosure-Purpose" DataType="string"/>
    </Apply>
    <Apply FunctionId="string-one-and-only">
      <ResourceAttributeDesignator AttributeId="Consent-Purpose" DataType="string"/>
    </Apply>
  </Condition>
</Rule>
<Purpose>
  <Attribute AttributeId="purpose-id" DataType="String">
    <AttributeValue>Disclosure-Purpose-1</AttributeValue>
  </Attribute>
  <Attribute AttributeId="compatibleWith" DataType="String">
    <AttributeValue>Consent-Purpose-1</AttributeValue>
  </Attribute>
</Purpose>
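As a rough illustration of how the purpose description could drive the compatibility check, the Python sketch below parses a `<Purpose>` element with the standard library and tests a request against it. The element and attribute names follow the example above; `read_purpose` and `is_compatible` are hypothetical helpers, and real enforcement would be done by an XACML policy decision point rather than by code like this.

```python
import xml.etree.ElementTree as ET

PURPOSE_XML = """
<Purpose>
  <Attribute AttributeId="purpose-id" DataType="String">
    <AttributeValue>Disclosure-Purpose-1</AttributeValue>
  </Attribute>
  <Attribute AttributeId="compatibleWith" DataType="String">
    <AttributeValue>Consent-Purpose-1</AttributeValue>
  </Attribute>
</Purpose>
"""


def read_purpose(xml_text):
    """Map each AttributeId in a <Purpose> element to its (trimmed) value."""
    root = ET.fromstring(xml_text)
    return {
        attr.get("AttributeId"): attr.findtext("AttributeValue").strip()
        for attr in root.findall("Attribute")
    }


def is_compatible(disclosure_purpose, consent_purpose, purpose):
    """True when the requested purpose is declared compatible with the consented one."""
    return (
        purpose["purpose-id"] == disclosure_purpose
        and purpose["compatibleWith"] == consent_purpose
    )


purpose = read_purpose(PURPOSE_XML)
print(is_compatible("Disclosure-Purpose-1", "Consent-Purpose-1", purpose))
```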