Page 216 - Big Data Analytics for Intelligent Healthcare Management
8.3 PRIVACY OF HEALTHCARE BIG DATA
[Figure: two overlapping regions, "Clinical information" (only clinical data) and "Personal identifiers" (only personal data); the overlap is labeled "Demographic information and clinical personal data".]
FIG. 8.11
Relationship between clinical data and personal data.
Aggarwal [52] used clustering techniques to show the consequences of applying k-anonymity,
l-diversity, and other anonymization techniques to high-dimensional data. Ali [53] proposed a
k-anonymity method for securing user passwords: the study generalized password information up to a
certain limit and also hashed the password with an anonymous value. Gal et al. [54] reported that
applying all three de-identification techniques (k-anonymity, l-diversity, and t-closeness) together
may cause information loss due to overgeneralization and suppression. That study proposed a
micro-aggregation method over several quasi-identifiers to create k-anonymous information by masking
healthcare-related information. Long patient profiles with multidimensional data create a problem
during anonymization. Ghinita et al. [55] recommended a method for correlation-aware anonymization
of high-dimensional data, which uses correlations among big data attributes to reduce the
quasi-identifiers among sensitive personal data. Several noteworthy strategies for making data
anonymous are listed below [56]:
(a) Aggregation: When this method is applied to data, users are unable to detect the sources of the
information. In this way, data mining will be difficult.
(b) Elimination: Through this process, some fields of the data are removed from the actual data.
(c) Temporize: This method adds noise or deliberately incorrect values to the data.
(d) Top to bottom coding: This method removes the tag information. The most important information
is removed by this technique.
(e) Group: This process puts different information together to protect an individual’s privacy.
(f) Directory replacement: Modifying the name related to the data is another way of anonymizing
information.
(g) Scrambling: Adding irrelevant matter to the actual data.
(h) Masking: Hiding information by replacing it with placeholder or random characters.
(i) Personalized anonymization: This technique depends on the relevant user. Customized
anonymization techniques can be used by the owner of the data.
(j) Blurring: An approximate value is used and this makes prediction difficult.
(k) Hash digest: Cryptographic hashing is another solution. This chapter focuses mainly on this
technique, as it is applied in blockchain.
(l) Pseudonymization: This method replaces one or more fields of the record with artificial identities.
One or many pseudonyms can be used per field. After this method is applied, the data will no longer
belong to a particular entity or ID.
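
Several of the strategies above can be illustrated together. The following Python code is a minimal sketch, not a method taken from the cited studies. It applies elimination, masking, blurring, and hash-based pseudonymization to a toy table, and then measures k as the size of the smallest group of records that share the same quasi-identifier values. The records, the field names, the key, and the helper names (mask, blur_age, pseudonym, anonymize, k_of) are all hypothetical. A keyed hash (HMAC-SHA-256) stands in for the hash digest item, on the assumption that the data owner holds a secret key; an unkeyed hash of a short identifier can be reversed by guessing. The digest is truncated only for readability.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical toy records; every field name and value is invented for illustration.
RECORDS = [
    {"patient_id": "P001", "name": "Alice Smith", "age": 34, "zip": "90210", "diagnosis": "asthma"},
    {"patient_id": "P002", "name": "Bob Jones", "age": 37, "zip": "90213", "diagnosis": "diabetes"},
    {"patient_id": "P003", "name": "Carol White", "age": 52, "zip": "10027", "diagnosis": "asthma"},
    {"patient_id": "P004", "name": "Dan Brown", "age": 58, "zip": "10025", "diagnosis": "flu"},
]

SECRET_KEY = b"demo-key-not-for-production"  # assumption: a key held by the data owner


def mask(value, visible=1, fill="*"):
    """Masking: keep the first `visible` characters and replace the rest."""
    return value[:visible] + fill * (len(value) - visible)


def blur_age(age, width=10):
    """Blurring: replace an exact age with an approximate range."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def pseudonym(identifier, key=SECRET_KEY):
    """Hash digest / pseudonymization: keyed SHA-256 digest, truncated to 12 hex characters."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def anonymize(record):
    """Apply elimination, masking, blurring, and pseudonymization to one record."""
    return {
        "pid": pseudonym(record["patient_id"]),  # pseudonymization
        "zip": mask(record["zip"], visible=3),   # masking
        "age": blur_age(record["age"]),          # blurring
        "diagnosis": record["diagnosis"],        # sensitive attribute kept for analysis
        # "name" is dropped entirely (elimination)
    }


def k_of(records, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


anonymized = [anonymize(r) for r in RECORDS]
print(k_of(RECORDS, ["age", "zip"]))     # raw data: prints 1, every record is unique
print(k_of(anonymized, ["age", "zip"]))  # after blurring and masking: prints 2
```

In this toy table the raw records are all distinguishable by age and ZIP code (k = 1), while the blurred and masked version places each record in a group of two (k = 2). Wider age ranges or shorter visible ZIP prefixes raise k at the cost of more information loss, which is the overgeneralization trade-off noted above.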