Strategies for health data exchange for secondary, cross-institutional clinical research

https://doi.org/10.1016/j.cmpb.2009.12.001

Abstract

Secondary use of health data has a vital role in improving and advancing medical knowledge. While digital health records offer scope for facilitating the flow of data to secondary uses, it remains essential that steps are taken to respect the wishes of the patient regarding secondary usage, and to ensure the privacy of the patient during secondary use scenarios. Consent, together with depersonalisation and its related concepts of anonymisation, pseudonymisation, and data minimisation, are key methods used to provide this protection. This paper gives an overview of technical, practical, legal, and ethical aspects of secondary data use and discusses their implementation in the multi-institutional @neurIST research project.

Introduction

The on-going relationship between a patient and their medical practitioners gives rise to a health record containing detailed information of medical relevance that (a) may be distributed across clinical centres, (b) involves a range of specialities and informatics systems within individual centres, (c) expands over time, and (d) is of a personal nature to the patient. Personal health data contains in the first instance information relating to the current and historical health, medical conditions, and medical tests of its subject. Secondary uses for such information arise when, for example, the records are summarized across patients in a department or institution for performance audits, or on a larger scale for epidemiological study of referrals for a given condition [1]. The low prevalence of most health conditions calls for the integration of records from multiple clinical centres in order to ensure adequate numbers of patient records are available to allow statistically relevant results to be obtained. Making more extensive use of clinical records opens the possibility to study patterns of symptoms, treatment, and outcomes, but often requires the data be accessible as individual rather than aggregate values. Our aim is to provide a cohesive overview of the issues encountered in creating a system that facilitates secondary use of data originating from multiple clinical centres by a consortium of research users to serve as a point of reference for other workers in the field.

Traditionally, clinical information resides within the originating structure, with extracts transmitted as necessary between clinicians in the course of patient care. In general, however, unified centralized repositories of the various health record components within large health networks, or even means to access them, do not exist. This situation is evolving as patient information is increasingly stored and communicated as computerized records referred to as Electronic Health Records (EHRs). Within large-scale health networks, in particular national health systems, clinical need has motivated increasing integration of health information systems and interconnection between branches of the health services. Despite this progress in the clinical domain, data entry for research generally remains parallel to that of the clinical record, requiring labour-intensive transcription of clinical notes prior to centralized data entry and compilation into a monolithic database. As evidenced in epidemiological studies based on the PHARMO database, the same infrastructure for use of EHRs could facilitate the availability of large quantities of clinical data for biomedical research, though the obstacles to such use still need to be overcome.

In the primary use of an EHR for the treatment of a single patient, the patient's identity is necessary and protected by medical secrecy. While the patient's identity is not normally relevant for secondary uses, the data remains personal due to its specificity to the individual, and therefore the patient's wishes regarding the use of the data must be respected. An appropriate procedure of informed consent is therefore a key requirement in the current research environment. The passage of data out of its medical setting for use in the collaborative efforts of multiple institutions, both medical and non-medical, in turn raises an issue of privacy protection. The response to this issue necessitates a data protection strategy that can be clearly stated and put into practice. Moreover, not only the handling of the medical data, but also the many activities that may form a research study must be performed in a manner consistent with the policies of privacy and data protection. Unfortunately, considerable mixing of terminology and concepts has taken place in the many documents intended to guide policy in this area. This has created a confusing environment for the researcher interested in establishing or participating in multi-centre research studies making use of primary healthcare data.

In the course of the @neurIST project (described below), a novel approach has been advanced for the federation of distributed medical records that is able to follow their evolution over time as a platform for secondary research. The project also involves an active research program that makes use of the data federation system as well as practical activities such as blood sample collection and analysis, image processing and database research. The process of implementing the systems necessary for this project and for supporting the secondary research activities has required us to face the above issues of consent and data privacy, as well as a number of subsidiary and ancillary issues relating to the secondary use of clinical data in the research domain. We describe the context of ethical and legal requirements applicable to secondary research using clinical data, some of the strategies adopted for ensuring these requirements are met, and the specific solutions adopted in the course of the @neurIST project.

The @neurIST project is an EU-funded project that has proposed a strategy of federating data sources in clinical institutions for use in research and in advancing clinical practice [2]. The project involves seven clinical centres in five countries (England, Hungary, Spain, Switzerland, and The Netherlands), along with a further 25 institutions contributing to the technical development of the infrastructure for data federation, software systems for using the data, or making use of the data and samples collected (full list of partners available from http://www.aneurist.org/).

The specific clinical domain in which the @neurIST data strategy is being developed and demonstrated is that of intracranial aneurysms. This condition, in which segments of one or more arteries of the brain become abnormally dilated, affects between 1% and 6% of the European population, and in a given year, about one percent of the affected patients can expect to experience a rupture of this altered vessel wall, leading to bleeding into the brain [3], [4], [5], [6]. Although treatment and care are typically concentrated in larger hospitals, no one hospital can capture sufficient numbers of cases to carry out studies having statistical significance. Further, steady progress in treatment options makes it important and yet more difficult to quickly identify differences in treatment outcome while minimising the costs and overheads of the investigations. Lastly, diagnosis, treatment and post-treatment care are often managed by clinicians in different institutions, each holding a part of the patient's clinical record that may or may not be transmitted in full to the others. This necessitates repeated data entry not only for research, but also for routine clinical practice. Thus the benefits to this clinical community of cooperation at national and trans-national scales that can link across data sources are strong, and can be translated to many other diseases.

Accompanying the development effort, and acting as a test-bed to demonstrate the data federation system is the @neurIST study, centring on a genetics study involving over 800 patients and 400 controls, along with investigations in transcriptomics, computational fluid dynamics, and data-mining. For the @neurIST study, data is maintained in distinct repositories, be they the original clinical data stores or mirrors holding extracts of the data in a DMZ (De-Militarized Zone), each held by the originating institution. The data are accessible through an innovative IT infrastructure [7] based on Grid and SOA (Service-Oriented Architectures) technologies developed for this purpose. Inter-relating computational tools make use of this federated data for clinical decision support, and to support the @neurIST research activities centred on cerebral aneurysms [2], [8].

One of the clinical centres has fully integrated digital records covering diagnostic services, clinical history and imaging, as well as treatment records. To the pre-existing data items captured by this system have been added fields for management of participants within the study (e.g. to record the consent, and whether the required blood sample has been taken). As the clinical data of interest to the study is in large part that routinely captured in the course of clinical care, the effort required for data collection in this setting is kept to a minimum. An important further variation from most prior efforts is that here, the institution's information systems themselves interact with queries from users of the @neurIST system that arrive via a connector program in the hospital's Internet De-Militarized Zone. This approach is the most complete manifestation of the direct federation of data envisaged by the @neurIST architecture.

The other clinical centres all have Picture Archiving and Communication Systems (PACS), a booking system and lab test systems as separate data domains, together with hand-written notes covering the patient's condition and care. In these centres, a traditional approach is taken in which a research nurse enters the required data to a dedicated interface from the case notes or in parallel to the routine clinical care. The interface allows other records, specifically Digital Imaging and Communications in Medicine (DICOM) images [9], to be linked in, and can then prepare the data for transfer to a local database in the hospital's DMZ where the connector program can execute the queries it receives. This architecture reflects the very restrictive attitude of most clinical centres to external connectivity and data security.
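
The query path described above can be illustrated with a minimal sketch. It is not the @neurIST implementation: it assumes one in-memory SQLite database per centre standing in for the local database in each hospital's DMZ, and the `Connector` class, the `cases` table with its columns, and the consent filter are all hypothetical names introduced for illustration.

```python
import sqlite3


class Connector:
    """Stands in for the connector program in one centre's DMZ (hypothetical API)."""

    def __init__(self, centre, rows):
        self.centre = centre
        # Each centre keeps its own database; nothing is copied to a central store.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE cases (pseudonym TEXT, ruptured INTEGER, consent INTEGER)"
        )
        self.db.executemany("INSERT INTO cases VALUES (?, ?, ?)", rows)

    def execute(self, ruptured):
        """Run a fixed, parameterised query; only consented records are returned."""
        cursor = self.db.execute(
            "SELECT pseudonym, ruptured FROM cases "
            "WHERE consent = 1 AND ruptured = ? ORDER BY pseudonym",
            (ruptured,),
        )
        return cursor.fetchall()


def federated_query(connectors, ruptured):
    """Send the same query to every connector and tag each row with its centre."""
    results = []
    for connector in connectors:
        for pseudonym, flag in connector.execute(ruptured):
            results.append(
                {"centre": connector.centre, "pseudonym": pseudonym, "ruptured": flag}
            )
    return results


if __name__ == "__main__":
    centre_a = Connector("centre_a", [("a1", 1, 1), ("a2", 0, 1), ("a3", 1, 0)])
    centre_b = Connector("centre_b", [("b1", 1, 1)])
    for row in federated_query([centre_a, centre_b], ruptured=1):
        print(row)
```

In this sketch the researcher never touches a centre's database directly. Each connector decides what leaves its institution, which mirrors the restrictive stance on external connectivity noted above.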

The development of the @neurIST system and its use by an active research study created a need for coordination of the development efforts with the real-world needs and constraints for healthcare research, which we wish to reflect in this document. To provide a suitable background, we start by establishing the terms on which we will rely in the course of this document.

It is important to have a common understanding of how to prepare data for use in medical research — which makes it surprising that a wide vocabulary of often overlapping terms and contradictory guidelines has developed in this area, even within Europe. This leads to ambiguous interpretations and makes it difficult to have a common understanding, causing problems from legal, ethical, and technical standpoints. This section aims to summarize some of these laws and guidelines, and the terms they use, before clarifying the terminology used in this paper.

The EC Data Protection Directive 95/46/EC [10] and the associated guidance of the Article 29 Working Party [11] address the concepts of personal, identified and identifiable data. According to the Directive,

‘personal data’ shall mean any information relating to an identified or identifiable natural person (‘data subject’); an identifiable person is one who can be identified, directly or indirectly (Article 2a [10]).

The Article 29 Working Party [11] states that for data to be personal it must both “relate to” a natural person, who must be “identified or identifiable”. Further, “particular pieces of information … which hold a particularly privileged and close relationship with the particular individual” are termed as “identifiers” [11]. Identifiers can be separated into those likely to lead to identification of an individual, and those which act more as indirect clues to the identity of an individual. All of these concepts are explored further in Section 3.1.

The EC Data Protection Directive does not apply to data when the individual is not identified or identifiable. The authors of the Directive, foreseeing that if no possible means for de-identification exists the law could be unworkable, placed limits on the identification process in Recital 26, which states that “to determine whether a person is identifiable [directly or indirectly], account should be taken of all the means likely reasonably to be used either by the controller or by any other person to identify the said person” [10]. The Article 29 Working Party opinion gives “anonymous data” an approachable definition: “[a]ny information relating to a natural person where the person cannot be identified, whether by the data controller or by any other person, taking account of all the means likely reasonably to be used either by the controller or by any other person to identify that individual” [11]. The terms anonymous and anonymised data are often used in relevant guidance synonymously with a spectrum of concepts such as de-identified, non-identifiable, irretrievably unlinked, irreversibly de-identified, unlinked-anonymised, or irreversibly anonymised (see Ref. [12]). Emerging standardisation activities in the privacy enhancing technologies field are attempting to clarify the underlying concepts and associated terminologies [13], [14]. The unified definitions of the terms anonymisation (and pseudonymisation) produced by these efforts tend, however, to be very formal and exact, resulting in complex and hard-to-understand constructions and wordings, and so do not lend themselves to use in communicating with patients.

The EC Data Protection Directive is often interpreted as stating that data is personal if anyone, anywhere, can directly or indirectly identify the individual — ignoring the principle of reasonable means. In this interpretation, the difficulty in establishing that data has been completely de-identified causes the resulting data to be seen as personal. The authors of a 2003 World Health Organisation (WHO) document introduce the concept of “proportional or reasonable anonymity” [15] as being useful in the context of genetic databases. According to these WHO guidelines, “proportional or reasonable anonymity exists when no reasonable means of identification of specific individuals is available”. Many ethical and legal documents require some level of de-identification when using data for the purposes of research, and in cases where the research subject must be re-contacted, or it is useful administratively, “coding” of information is advised, rather than anonymisation. The idea is to remove identifiers and place a code or pseudonym on data passed to researchers. Coded data is also referred to in relevant documents as pseudonymised, linked, linked-anonymised or reversibly anonymised (see Ref. [12]). Some guidelines [16] use the term “coded”, “reversibly anonymised”, or “de-identified” to indicate that direct identifiers such as name, birth date and address are reversibly detached and can be reattached through a code or pseudonym. Other guidelines use the same terms in a stricter way [17]: coded or de-identified in these guidelines mean also that demographic data have been detached to the point that it is not possible to identify a person “easily” without knowing the code. The definition of proportional or reasonable anonymity allows that the use of linked or linkable coded information could be seen as a means to achieve anonymity, when access to the link is restricted appropriately.

Following the considerations above, the paper uses the terms personal data, de-identification, pseudonymisation, and proportional or reasonable anonymity. Personal data refers to data that is about an individual who is identified or can (reasonably) be identified. De-identification is the process of removing (or modifying) identifiers from the personal data so that identification is not reasonably possible. Pseudonymisation is the step where a pseudonym or code is added to this de-identified data (methods are described in Appendix A). Proportional or reasonable anonymity applies to de-identified/pseudonymised data which cannot reasonably be used to identify specific individuals.
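
As an illustration of how the two steps defined above fit together, the following minimal sketch de-identifies a record and then attaches a pseudonym. It does not reproduce the methods of Appendix A. The field names, the generalisation of birth date to birth year, and the use of a keyed HMAC-SHA256 digest as the pseudonym are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical field names; the paper does not specify a record schema.
DIRECT_IDENTIFIERS = {"name", "address", "hospital_number"}


def de_identify(record):
    """Return a copy of the record with direct identifiers removed.

    The birth date is reduced to a birth year as an example of modifying,
    rather than removing, an indirect identifier.
    """
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "birth_date" in cleaned:
        cleaned["birth_year"] = int(cleaned.pop("birth_date")[:4])
    return cleaned


def pseudonymise(record, secret_key):
    """De-identify the record, then attach a keyed pseudonym.

    The pseudonym is derived from the hospital number with a secret key
    (an assumption of this sketch), so the same patient always receives
    the same code while the code alone does not reveal the identity.
    """
    digest = hmac.new(
        secret_key, record["hospital_number"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    coded = de_identify(record)
    coded["pseudonym"] = digest[:16]
    return coded


if __name__ == "__main__":
    patient = {
        "name": "Jane Doe",
        "address": "1 Example Street",
        "hospital_number": "H-0001",
        "birth_date": "1950-03-14",
        "aneurysm_location": "MCA",
    }
    print(pseudonymise(patient, b"key held by the originating centre"))
```

Under these assumptions, only a party holding the key and the list of hospital numbers can recompute the link between pseudonym and patient. A lookup table kept by the originating centre would be an alternative way to realise the same restricted link, and restricting access to that link is what allows coded data to be treated as proportionally or reasonably anonymous.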

Informed consent

One of the main safeguards to protect individuals while undergoing treatment or research in the medical sphere is informed consent. The principle is to provide individuals with information in a format they can understand, explaining the treatment or research and enabling them to digest it and ask questions, in order to obtain their agreement to any procedures. In practice, there are many complications including: providing adequate time for consideration in cases of individuals who arrive

Legal and ethical requirements

The majority of laws or standards relating to data (or tissue) protection in clinical research take the position that a reduction in the potential to identify an individual correlates with an increasing protection to the privacy of those individuals. Thus, any discussion of privacy protection in research involving patient data, the ethical and legal consequences associated with the validity of informed consent due to the patient's ability to judge the risks, and the need for consent in the

Specific challenges encountered in @neurIST

This section details particular challenges that came up during the development of the @neurIST project, as well as the preparation and early data collection for the @neurIST study. Not all of these are universal in nature as the @neurIST study, in addition to being a research study in its own right, is a test-bed for the systems and tools developed contemporaneously in the @neurIST project. As such, some issues encountered in the course of the study may reflect the constraints of decisions made

Conclusion

This article presents the complexity of following legal regulations in multi-centre medical research projects involving secondary use of medical data with, as a concrete example, the strategy taken in the @neurIST research project that includes seven clinical centres in five European countries, as well as a further 25 non-clinical partners. Secondary use of medical data is currently a topic of attention as the data could yield considerable information that can be used to improve effectiveness

Conflicts of interest

There are no conflicts of interest.

Acknowledgements

This work was generated in the framework of the @neurIST Integrated Project, which is co-financed by the European Commission through the contract no. IST-027703. The authors would like to thank Ivan Periz and Professor Deryck Beyleveld for many fruitful discussions.

References (73)

  • R. Dunlop et al., @neurIST—chronic disease management through integration of heterogeneous data and computer-interpretable guideline services
  • The Association of Electrical and Medical Imaging Equipment Manufacturers (NEMA), DICOM Part 3: Information Object...
  • European Parliament and Council, Directive 95/46/EC of the European Parliament and of the Council of October 24, 1995...
  • Article 29 Working Party, Opinion No. 4/2007 on the concept of personal data, WP 136...
  • B. Knoppers et al., Correspondence: the Babel of genetic data terminology, Nat. Biotechnol. (2005)
  • International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC), Technical...
  • A. Pfitzmann, M. Hansen, Anonymity, Unlinkability, Undetectability, Unobservability, Pseudonymity, and Identity...
  • World Health Organization (European Partnership on Patients’ Rights and Citizens’ Empowerment), Genetic Databases,...
  • Swiss Academy of Medical Sciences, Biobanks: obtainment, preservation and utilisation of human biological material—medical-ethical guidelines and recommendations, Journal of International Biotechnology Law (2007)
  • Human Genetics Commission, Whose hands on your genes? A discussion document on the storage protection and use of...
  • European Parliament and the Council, Directive 2001/20/EC relating to the implementation of good clinical practice in...
  • World Medical Association (WMA), Declaration of Helsinki—Ethical Principles for Medical Research Involving Human...
  • Council of Europe, Additional Protocol to the Convention on Human Rights and Biomedicine, Concerning Biomedical...
  • Council of Europe Committee of Ministers, Recommendation No. R (97) 5 of the Committee of Ministers to Member States on...
  • Council of Europe Committee of Ministers, Recommendation Rec(2006)4 of the Committee of Ministers to member states on...
  • Council for International Organizations of Medical Sciences (CIOMS), International Ethical Guidelines for Biomedical...
  • OECD, Guidelines on the Protection of Privacy and Transborder Flows of Personal Data (1980)
  • Medical Research Council (MRC) (UK), Human Tissue and Biological Samples for use in Research—Operational and Ethical Guidelines (2001)
  • E. Zika et al., Sample, data use and protection in biobanking in Europe: legal issues, Pharmacogenomics (2008)
  • A.C. Da Rocha et al., Alternative consent models for biobanks: the new Spanish law on biomedical research, Bioethics (2008)
  • H. Nys et al., The regulation of biobanks in Spain, Law Hum. Genome Rev. (2008)
  • J.A. Seoane et al., Consentimiento, biobancos y ley de investigacion biomedica, Law Hum. Genome Rev. (2008)
  • B.J. Clark, Tissue banking in a regulated environment—does this help the patient? Part 1. Legislation, regulation and ethics in the UK, Pathobiology (2007)
  • Swiss Federal Law on Genetic Analysis of Human Beings (Loi fédérale sur l’analyse génétique humaine) (8.10.2004)....
  • Council of Europe, Convention for the Protection of Human Rights and Dignity of the Human Being with regard to the...

Bernice Elger (MD, PhD, MA), after having studied at the universities of Essen, Bochum, Bethel, Heidelberg (Germany) and Houston (Texas), has been teaching health law and bioethics at the University of Geneva for the past 14 years. In 2004–2005 she was visiting scholar in the University of Pennsylvania School of Medicine Center for Bioethics, Kennedy Institute of Ethics (Georgetown University) and the Department of Clinical Bioethics (NIH, Bethesda). Her research areas include genetic testing and tissue banks and she is a member of the Swiss federal commission of experts concerning the Swiss law on genetic testing as well as a member of the subcommission of the commission responsible for the Swiss Academy of Medical Sciences’ guidelines on the ethical use of human material. She has published widely on issues related to health law and ethics and has received several awards for her academic work, among which was the Bizot Award of the University of Geneva for her PhD thesis (habilitation) on ethical and legal issues related to medical research involving biobanks.

Jimison Iavindrasana studied mathematics and computer science and received his master's degree in “network, multimedia and Internet” from the University of Reunion Island, France in 2000. After 4 years in industry, he decided to work in the medical field and obtained a master's degree in medical informatics from the University Paris VI, France. He is currently a PhD student at the University of Geneva, Switzerland.

Luigi Lo Iacono studied computer sciences with a major in systems and security engineering and received his PhD from the University of Siegen (Germany) in 2005. He has previously worked in academic and industry research laboratories and currently holds the position of senior researcher in the Distributed IT Services and Security group with NEC Laboratories Europe.

Henning Müller received his master's degree in medical informatics from the University of Heidelberg, Germany, in 1997, specializing in medical signal and image processing. After a scholarship for a stay at Daimler-Benz Research and Technology North America in Portland, OR, USA, he started his PhD in content-based image retrieval at the University of Geneva, Switzerland in 1998. After a research stay at Monash University, Melbourne, Australia in 2001 he finished his PhD in 2002. In the same year he started as a postdoctoral research fellow at the medical informatics service of the University and Hospitals of Geneva, Switzerland, where he started the MedGIFT project on medical image retrieval. Since 2007 he has been professor of business information systems at the University of Applied Sciences Western Switzerland, Sierre, Switzerland.

Nicolas Roduit studied geology and computer sciences at the University of Geneva, Switzerland. In 2007, he obtained an interdisciplinary PhD after specializing in image processing and analysis. In particular, he developed a software application for the specific needs of a major oil company. He then joined the digital imaging team at the Geneva University Hospitals.

Paul Summers studied physics and mathematics prior to obtaining a PhD in medical physics from King's College London. Having supported clinical research in university and hospital departments, he is now focussed on imaging investigations of the spinal cord and CSF.

Jessica Wright is a specialist in research ethics, data protection, genetic databases and biobanks after working for over 6 years on EC-funded projects PRIVIREAL, PRIVILEGED and @neurIST. She obtained an MA in biotechnological law and ethics from the University of Sheffield School of Law, where she currently holds a position as Research Coordinator.
