
Background: Using data science in digital health raises significant challenges regarding data privacy, transparency, and trustworthiness. Recent regulations enforce the need for a clear legal basis for collecting, processing, and sharing data, for example, the European Union’s General Data Protection Regulation (2016) and the United Kingdom’s Data Protection Act (2018). For health care providers, legal use of the electronic health record (EHR) is permitted only in clinical care cases. Any other use of the data requires thoughtful considerations of the legal context and direct patient consent. Identifiable personal and sensitive information must be sufficiently anonymized. Raw data are commonly anonymized to be used for research purposes, with risk assessment for reidentification and utility. Although health care organizations have internal policies defined for information governance, there is a significant lack of practical tools and intuitive guidance about the use of data for research and modeling. Off-the-shelf data anonymization tools are developed frequently, but privacy-related functionalities are often incomparable with regard to use in different problem domains. In addition, tools to support measuring the risk of the anonymized data with regard to reidentification against the usefulness of the data exist, but there are question marks over their efficacy.

Objective: In this systematic literature mapping study, we aim to alleviate the aforementioned issues by reviewing the landscape of data anonymization for digital health care.

Methods: We used Google Scholar, Web of Science, Elsevier Scopus, and PubMed to retrieve academic studies published in English up to June 2020. Noteworthy gray literature was also used to initialize the search.


Journal of Medical Internet Research
