Page 216 - Big Data Analytics for Intelligent Healthcare Management

8.3 PRIVACY OF HEALTHCARE BIG DATA          209





[Figure: diagram with the labels "Clinical information", "Only clinical data", "Only personal data", "Personal identifiers", and "Demographic information and clinical personal data".]
FIG. 8.11
Relationship between clinical data and personal data.

Aggarwal [52] used clustering techniques to show the effect of high dimensionality on k-anonymity, l-diversity,
and other anonymization techniques. Ali [53] proposed a k-anonymity-based method for securing user passwords:
the study generalized password information up to a certain limit and also hashed the password with an anonymous
value. Gal et al. [54] reported that applying all three de-identification techniques (k-anonymity, l-diversity,
and t-closeness) together may cause information loss through overgeneralization and suppression. That study
proposed micro-aggregation over several quasi-identifiers to produce k-anonymous data by masking
healthcare-related information. Long patient profiles with multidimensional data make anonymization harder.
Ghinita et al. [55] proposed a method for correlation-aware anonymization of high-dimensional data, which
analyzes the correlations among data attributes to reduce the quasi-identifiers among sensitive personal data.
Several noteworthy strategies for making data anonymous are listed below [56]:
(a) Aggregation: When data are aggregated, users cannot identify the individual sources of the
   information, which makes mining individual-level details difficult.
(b) Elimination: Certain fields are removed from the original data.
(c) Temporize: This method adds noise or deliberately incorrect information to the data.
(d) Top and bottom coding: Values above an upper threshold or below a lower threshold are replaced by
   that threshold, so the most identifying extreme values are removed.
(e) Group: Different pieces of information are combined so that an individual's identity is hidden.
(f) Directory replacement: Replacing the names associated with the data is another way of anonymizing
   information.
(g) Scrambling: Irrelevant material is mixed into the actual data.
(h) Masking: Information is hidden by replacing characters with placeholder or random characters.
(i) Personalized anonymization: The anonymization is tailored to the relevant user; the owner of the data
   can apply customized anonymization techniques.
(j) Blurring: An approximate value replaces the exact one, which makes the original value hard to infer.
(k) Hash digest: Cryptographic hashing is another solution. This chapter focuses mainly on this
   technique in the context of blockchain.
(l) Pseudonymization: This method replaces one or more fields of a record with artificial identifiers.
   One or more pseudonyms can be used per field. As a result, the data no longer point directly to a
   particular entity or ID.
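Several of the strategies listed above can be illustrated on a single patient record. The following is a minimal Python sketch, not a production de-identification pipeline: the record, its field names, the thresholds (ages coded into the range 18 to 90, weights blurred to multiples of 5 kg), the fixed salt, and helper names such as `anonymize` and `Pseudonymizer` are hypothetical choices made for illustration. It applies elimination (b), top and bottom coding interpreted as clamping values into a range (d), masking (h), blurring (j), a salted SHA-256 hash digest (k), and pseudonymization (l), while leaving the clinical field untouched.

```python
import hashlib
from itertools import count

# Hypothetical toy record; field names and values are illustrative only.
RECORD = {
    "name": "Jane Doe",
    "patient_id": "MRN-004211",
    "phone": "555-867-5309",
    "age": 93,
    "weight_kg": 71.4,
    "diagnosis": "type 2 diabetes",
    "notes": "called about refill",
}


def eliminate(rec, fields):
    """(b) Elimination: return a copy of rec without the listed fields."""
    return {k: v for k, v in rec.items() if k not in fields}


def top_bottom_code(value, low, high):
    """(d) Top and bottom coding: clamp a value into [low, high]."""
    return min(max(value, low), high)


def mask(value, keep_last=4, char="*"):
    """(h) Masking: replace all but the last keep_last characters."""
    hidden = max(len(value) - keep_last, 0)
    return char * hidden + value[hidden:]


def blur(value, step):
    """(j) Blurring: round to the nearest multiple of step."""
    return step * round(value / step)


def hash_digest(value, salt):
    """(k) Hash digest: hex SHA-256 of salt + UTF-8 encoded value."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


class Pseudonymizer:
    """(l) Pseudonymization: map each real ID to a stable artificial ID.

    The lookup table stays with the data owner and must be protected.
    """

    def __init__(self, prefix="P"):
        self._table = {}
        self._counter = count(1)
        self._prefix = prefix

    def __call__(self, real_id):
        if real_id not in self._table:
            self._table[real_id] = f"{self._prefix}{next(self._counter):05d}"
        return self._table[real_id]


def anonymize(rec, pseudonymize, salt):
    """Apply one strategy per field; the input record is not modified."""
    out = eliminate(rec, {"notes"})
    out["name"] = hash_digest(out["name"], salt)
    out["patient_id"] = pseudonymize(out["patient_id"])
    out["phone"] = mask(out["phone"])
    out["age"] = top_bottom_code(out["age"], 18, 90)
    out["weight_kg"] = blur(out["weight_kg"], 5)
    return out


if __name__ == "__main__":
    pseudonymize = Pseudonymizer()
    print(anonymize(RECORD, pseudonymize, salt=b"demo-salt"))
```

Hashing a low-entropy field such as a name does not by itself make it anonymous: anyone who knows the salt can hash candidate names and compare digests, so in this sketch both the salt and the pseudonym table would have to be kept secret by the data owner.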
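The k-anonymity property and the micro-aggregation approach attributed above to Gal et al. [54] can be made concrete with a small sketch. This is a simplified, hypothetical illustration rather than the method of [54]: it aggregates a single numeric quasi-identifier (age) with fixed-size groups, whereas [54] works on several quasi-identifiers; the toy ages, the already generalized ZIP value "130**", and the function names are assumptions. A table is k-anonymous when every combination of quasi-identifier values is shared by at least k records; micro-aggregation reaches this by sorting the records, partitioning them into groups of at least k, and replacing each value with its group mean.

```python
from collections import Counter
from statistics import mean


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs >= k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(size >= k for size in groups.values())


def microaggregate(values, k):
    """Fixed-size univariate micro-aggregation.

    Sort the values, split them into consecutive groups of k (the last
    group absorbs any remainder, so it holds between k and 2k - 1 values),
    and replace each value by its group mean. Output keeps the input order.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    result = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        end = len(values) if g == n_groups - 1 else start + k
        idx = order[start:end]
        centroid = mean(values[i] for i in idx)
        for i in idx:
            result[i] = centroid
    return result


if __name__ == "__main__":
    ages = [34, 72, 36, 68, 35, 70]  # hypothetical toy data
    rows = [{"age": a, "zip": "130**"} for a in ages]
    print(is_k_anonymous(rows, ["age", "zip"], k=3))  # False
    masked = microaggregate(ages, k=3)
    print(masked)  # [35, 70, 35, 70, 35, 70]
    rows = [{"age": a, "zip": "130**"} for a in masked]
    print(is_k_anonymous(rows, ["age", "zip"], k=3))  # True
```

Replacing values with group means leaves the column mean unchanged but removes the variation inside each group, which is one form of the information loss discussed above; multivariate heuristics such as MDAV extend the same idea to several quasi-identifiers at once.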