Page 239 - Big Data Analytics for Intelligent Healthcare Management

232     CHAPTER 9 INTELLIGENCE-BASED HEALTH RECOMMENDATION SYSTEM




In the above equation, $s(a, u)$ represents the similarity between two users $a$ and $u$. $r_{a,i}$ and $r_{u,i}$ denote the ratings given to item $i$ by users $a$ and $u$, respectively; $\bar{r}_a$ and $\bar{r}_u$ are the mean ratings given by users $a$ and $u$, respectively; and $n$ denotes the total number of items in the user-item matrix.
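The definitions above are those of the Pearson correlation coefficient between users $a$ and $u$. The following is a minimal Python sketch of that measure, assuming both users have rated all $n$ items of the user-item matrix (real systems usually restrict the sums to co-rated items). The function name `pearson_similarity` is illustrative and not taken from the chapter.

```python
import math


def pearson_similarity(ratings_a, ratings_u):
    """Pearson correlation s(a, u) between two users' rating vectors.

    Assumption: ratings_a and ratings_u list the two users' ratings on the
    same n items, in the same order, with every item rated by both users.
    """
    n = len(ratings_a)
    if n == 0 or n != len(ratings_u):
        raise ValueError("rating vectors must be non-empty and of equal length")
    mean_a = sum(ratings_a) / n  # mean rating of user a
    mean_u = sum(ratings_u) / n  # mean rating of user u
    # Numerator: sum of products of the mean-centred ratings
    num = sum((ra - mean_a) * (ru - mean_u) for ra, ru in zip(ratings_a, ratings_u))
    # Denominator: product of the norms of the two mean-centred vectors
    den = (math.sqrt(sum((ra - mean_a) ** 2 for ra in ratings_a))
           * math.sqrt(sum((ru - mean_u) ** 2 for ru in ratings_u)))
    # A user who gives every item the same rating has zero variance;
    # this sketch returns 0.0 (no correlation) in that case.
    return num / den if den else 0.0
```

For example, `pearson_similarity([5, 3, 4, 4], [3, 1, 2, 3])` gives about 0.85: the two users rate on different scales but order the items similarly, which is exactly what subtracting each user's mean rating captures.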
Cosine similarity measures the similarity between two nonzero vectors of an inner product space. The degree of similarity depends on the cosine of the angle between the vectors. It is widely applied in information retrieval and text mining. The similarity between two users $u$ and $v$, each represented by the vector of their ratings, is given in Eq. (9.2).
$$
s(u, v) = \frac{\vec{u} \cdot \vec{v}}{\lVert \vec{u} \rVert \, \lVert \vec{v} \rVert}
= \frac{\sum_{i} r_{u,i} \, r_{v,i}}{\sqrt{\sum_{i} r_{u,i}^{2}} \; \sqrt{\sum_{i} r_{v,i}^{2}}} \qquad (9.2)
$$
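Eq. (9.2) translates directly into code. This sketch assumes each user is given as a list of ratings over the same items, with unrated items stored as 0 (an assumption; the chapter does not say how missing ratings are handled). `cosine_similarity` is a hypothetical helper name.

```python
import math


def cosine_similarity(ratings_u, ratings_v):
    """Cosine similarity s(u, v) per Eq. (9.2): dot product over product of norms."""
    if len(ratings_u) != len(ratings_v):
        raise ValueError("rating vectors must have equal length")
    dot = sum(ru * rv for ru, rv in zip(ratings_u, ratings_v))  # u . v
    norm_u = math.sqrt(sum(ru * ru for ru in ratings_u))        # ||u||
    norm_v = math.sqrt(sum(rv * rv for rv in ratings_v))        # ||v||
    if norm_u == 0 or norm_v == 0:
        # Eq. (9.2) is defined only for nonzero vectors; return 0.0 by convention.
        return 0.0
    return dot / (norm_u * norm_v)
```

Unlike the Pearson measure, cosine similarity does not subtract each user's mean rating, so `cosine_similarity([1, 2], [2, 4])` is 1 (up to floating-point rounding) even though the second user rates everything twice as high.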
The Jaccard similarity of sets $A$ and $B$ is $|A \cap B| / |A \cup B|$, that is, the ratio of the size of the intersection of $A$ and $B$ to the size of their union. It is given in Eq. (9.3).

$$
JS(A, B) = \frac{|A \cap B|}{|A \cup B|} \qquad (9.3)
$$
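For sets such as the items two users have rated or interacted with, Eq. (9.3) needs only set operations. A minimal sketch; the function name and the item identifiers in the example are hypothetical.

```python
def jaccard_similarity(a, b):
    """JS(A, B) = |A ∩ B| / |A ∪ B| per Eq. (9.3)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Eq. (9.3) is undefined for two empty sets; this sketch returns 0.0.
        return 0.0
    return len(a & b) / len(union)
```

For example, `jaccard_similarity({"i1", "i2", "i3"}, {"i2", "i3", "i4", "i5"})` returns 0.4: two shared items out of five distinct items.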
In user-based collaborative filtering, user-user similarity is measured by comparing the users' ratings on the same items. The rating that the active user would give to an item is then predicted as a weighted average of the ratings that other users gave to that item, with the user-user similarities as weights. Any of the three measures above can be used to compute the similarity between two users (or, in item-based filtering, between two items).
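The prediction step can be sketched as follows, assuming the similarities between the active user and the other users who rated the item have already been computed with one of the measures above. Dividing by the sum of absolute similarities is a common normalization choice and an assumption here; the chapter specifies only a weighted average. `predict_rating` is a hypothetical name.

```python
def predict_rating(neighbours):
    """Similarity-weighted average of other users' ratings on one item.

    neighbours: list of (similarity, rating) pairs, one per other user who
    rated the item (similarities from, e.g., Eq. (9.2)).
    Returns None when no neighbour carries any weight.
    """
    num = sum(sim * rating for sim, rating in neighbours)
    # Normalize by absolute similarities so negative weights do not shrink the sum.
    den = sum(abs(sim) for sim, _ in neighbours)
    if den == 0:
        return None
    return num / den
```

With neighbours `[(0.9, 4), (0.5, 2), (0.1, 5)]`, the prediction is (3.6 + 1.0 + 0.5) / 1.5 = 3.4.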
             B. Model-based collaborative filtering
This approach uses users' previous ratings to build a model, relying on data mining techniques and, in some cases, machine learning. Approaches such as association rules, clustering, decision trees, artificial neural networks, regression, and Bayesian classifiers are used to build the model and to classify users and items.
   Association rule mining: These algorithms generate association rules that describe the relationships among items in transactions. An association rule $A \rightarrow B$ means that item set $A$ predicts item set $B$. Such rules can be incorporated into a recommendation model to predict information about users and items.
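Support and confidence are the standard measures for judging a rule A → B; the chapter does not name them here, so treat this as an illustrative sketch. Support is the fraction of transactions containing an item set, and confidence is support(A ∪ B) / support(A). The function names and the transactions in the example are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule A -> B: support(A ∪ B) / support(A)."""
    sup_a = support(antecedent, transactions)
    if sup_a == 0:
        return 0.0  # A never occurs, so the rule carries no evidence
    return support(set(antecedent) | set(consequent), transactions) / sup_a
```

For the hypothetical transactions `[{"bp_monitor", "low_salt_diet"}, {"bp_monitor", "low_salt_diet", "walking_plan"}, {"bp_monitor"}, {"walking_plan"}]`, the rule {bp_monitor} → {low_salt_diet} has support 0.5 and confidence 0.5 / 0.75 ≈ 0.67.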
   Clustering: Clustering divides the dataset into a number of clusters. A good clustering method produces high intra-cluster similarity and low inter-cluster similarity. In a recommendation system, users can belong partially to several clusters, and a prediction can be computed as an average across the clusters, weighted by the user's degree of participation in each [18].
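Assuming each user's degree of participation in every cluster is already known (for example, from a soft clustering method such as fuzzy c-means, which the chapter does not name), the participation-weighted averaging step can be sketched as below. The function name and dictionary layout are hypothetical.

```python
def cluster_weighted_prediction(memberships, cluster_ratings):
    """Average of per-cluster ratings for one item, weighted by participation.

    memberships:     {cluster_id: user's degree of participation in that cluster}
    cluster_ratings: {cluster_id: average rating of the item within that cluster}
    Returns None when the user participates in no cluster that rated the item.
    """
    shared = [c for c in memberships if c in cluster_ratings]
    total = sum(memberships[c] for c in shared)
    if total == 0:
        return None
    return sum(memberships[c] * cluster_ratings[c] for c in shared) / total
```

A user who belongs 70% to a cluster whose average rating for the item is 4.0 and 30% to one averaging 2.0 gets a prediction of 0.7 × 4.0 + 0.3 × 2.0 = 3.4.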
   Decision tree: A decision tree is a graph-like tree structure constructed from a training dataset in which the class labels are known. The tree can then be used to classify test data. The decision tree is thus a type of classifier that handles and classifies previously unseen examples [19].
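One concrete way to build such a tree is the ID3 algorithm, which repeatedly splits on the categorical feature with the highest information gain. The chapter does not say which tree-induction algorithm it has in mind, so this is a minimal ID3 sketch; the function names, feature names, and class labels used with it are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def build_tree(rows, labels, features):
    """ID3-style tree over categorical features.

    rows: list of dicts (feature -> value); labels: known class labels.
    Returns a class label (leaf) or a nested dict {feature: {value: subtree}}.
    """
    if len(set(labels)) == 1:
        return labels[0]  # pure node -> leaf
    if not features:
        return Counter(labels).most_common(1)[0][0]  # majority-class leaf

    def info_gain(f):
        # Expected entropy after splitting on feature f
        remainder = 0.0
        for v in set(r[f] for r in rows):
            subset = [lab for r, lab in zip(rows, labels) if r[f] == v]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(features, key=info_gain)  # most informative feature
    rest = [f for f in features if f != best]
    tree = {best: {}}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        tree[best][v] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return tree


def classify(tree, row, default=None):
    """Walk the tree with a new, previously unseen example."""
    while isinstance(tree, dict):
        feature = next(iter(tree))
        branches = tree[feature]
        if row.get(feature) not in branches:
            return default  # value never seen during training
        tree = branches[row[feature]]
    return tree
```

Trained on four hypothetical examples with features `activity` (low/high) and `diet` (poor/good), the sketch splits first on `activity`, because it separates the classes best, and then on `diet` only within the `high` branch.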
   Artificial neural network (ANN): An ANN is a network of interlinked neurons organized in layers. Every neuron in each layer has associated weights and a bias. Each neuron receives its inputs, processes them through its transfer function, and produces an output. The ANN is a classification technique that can be used to classify test data.
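The forward pass through such a network can be sketched in plain Python. The sigmoid transfer function and the fully connected layer layout are assumptions made for illustration; training the weights (for example, by backpropagation) is omitted, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic transfer (activation) function."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """One fully connected layer: neuron j outputs sigmoid(w_j . x + b_j).

    weights[j] holds neuron j's weight for every input; biases[j] is its bias.
    """
    return [sigmoid(sum(w * x for w, x in zip(w_row, inputs)) + b)
            for w_row, b in zip(weights, biases)]


def network_forward(inputs, layers):
    """Feed the inputs through a list of (weights, biases) layers in order."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations
```

For classification, the final layer's outputs are read as class scores, and the class with the highest score is the prediction.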
   Regression: Regression is an analysis method that models the relationship between two or more variables: one dependent variable and one or more independent variables. Regression techniques are used for prediction, curve fitting, and hypothesis testing about the relationships among variables.
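For the simplest case of one independent variable, ordinary least squares gives closed-form estimates of the intercept and slope. A minimal sketch; the function name and the example data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x (one independent variable)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # spread of the independent variable
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # co-variation
    b1 = sxy / sxx             # slope
    b0 = mean_y - b1 * mean_x  # intercept
    return b0, b1
```

For example, `fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])` returns `(1.0, 2.0)`, that is, y = 1 + 2x, which can then be used to predict the dependent variable for new values of x.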
   Bayesian classifier: The Bayesian classifier solves classification problems using conditional probability and Bayes' theorem. The Bayesian classifier is used to predict the class by