Page 239 - Big Data Analytics for Intelligent Healthcare Management


232 CHAPTER 9 INTELLIGENCE-BASED HEALTH RECOMMENDATION SYSTEM

In the above equation, s(a, u) represents the similarity between two users a and u. $r_{a,i}$ and $r_{u,i}$ denote the
ratings of item i by users a and u, respectively, whereas $\bar{r}_a$ and $\bar{r}_u$ are the mean ratings given by users a and u,
respectively, and n denotes the total number of items in the user-item matrix.
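The symbols defined above (per-item ratings, per-user mean ratings, a sum over n items) match those of the Pearson correlation coefficient. Assuming that is the measure the preceding equation describes, a minimal sketch might look as follows. The function name and the list-based data layout are illustrative assumptions, not taken from the text.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_u):
    """Pearson correlation between two users' ratings.

    Assumption: both lists hold ratings for the same n items, in the
    same order, with no missing values.
    """
    if len(ratings_a) != len(ratings_u) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")

    n = len(ratings_a)
    mean_a = sum(ratings_a) / n
    mean_u = sum(ratings_u) / n

    # Numerator: sum of products of mean-centered ratings.
    num = sum((ra - mean_a) * (ru - mean_u)
              for ra, ru in zip(ratings_a, ratings_u))
    # Denominator: product of the root sums of squared deviations.
    den_a = sqrt(sum((ra - mean_a) ** 2 for ra in ratings_a))
    den_u = sqrt(sum((ru - mean_u) ** 2 for ru in ratings_u))

    if den_a == 0 or den_u == 0:
        # A user who rated every item identically has no variance;
        # returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return num / (den_a * den_u)
```

Two users whose ratings rise and fall together get a value near 1, and users with opposite preferences get a value near -1.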
Cosine similarity is one approach to calculating the similarity between two
nonzero vectors of an inner product space. The degree of similarity depends on the cosine of the angle
between the vectors. It is mainly applied in information retrieval and text mining. The similarity between
two items u and v is described in Eq. (9.2).
$$
s(\vec{u}, \vec{v}) = \frac{\vec{u} \cdot \vec{v}}{\lVert \vec{u} \rVert \, \lVert \vec{v} \rVert} = \frac{\sum_{i} r_{u,i} \, r_{v,i}}{\sqrt{\sum_{i} r_{u,i}^{2}} \, \sqrt{\sum_{i} r_{v,i}^{2}}} \tag{9.2}
$$
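As an illustration of Eq. (9.2), the following sketch computes the cosine similarity of two rating vectors in plain Python. The function name and the choice to raise an error for a zero vector are assumptions made for this example.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")

    dot = sum(x * y for x, y in zip(u, v))   # numerator: u . v
    norm_u = sqrt(sum(x * x for x in u))     # ||u||
    norm_v = sqrt(sum(y * y for y in v))     # ||v||

    if norm_u == 0 or norm_v == 0:
        # Eq. (9.2) is defined only for nonzero vectors.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Vectors pointing in the same direction score close to 1 regardless of their magnitudes, and orthogonal vectors score 0.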
The Jaccard similarity of sets A and B is |A ∩ B| / |A ∪ B|, that is, the quotient of the size of the
intersection and the size of the union of A and B. It is described in Eq. (9.3).
$$
JS(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{9.3}
$$
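Eq. (9.3) maps directly onto set operations. The sketch below is one way to write it, with the value for two empty sets chosen by convention, since the equation itself leaves that case undefined.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both sets empty: 0/0 is undefined, so this sketch returns 0.0
        # by convention (an assumption, not stated in the text).
        return 0.0
    return len(a & b) / len(union)


# Example: items rated by two hypothetical users.
items_user_1 = {"item1", "item2", "item3"}
items_user_2 = {"item2", "item3", "item4"}
similarity = jaccard_similarity(items_user_1, items_user_2)  # 2 shared / 4 total
```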
In user-based collaborative filtering, user-user similarity is measured by comparing the ratings that two
users gave to the same product. The rating that the active user would give to an item is then predicted as a
weighted average of the ratings given to that item by other users. The three measures described above can be
used to compute the similarity.
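The weighted-average prediction can be sketched as follows. The text does not specify the weights, so this example assumes each user's rating is weighted by that user's similarity to the active user and normalized by the sum of absolute similarities, which is a common choice. The function name and dictionary layout are hypothetical.

```python
def predict_rating(similarities, ratings):
    """Predict the active user's rating of one item.

    similarities: dict mapping user id -> similarity to the active user
    ratings:      dict mapping user id -> that user's rating of the item
    Returns None when no rater has a nonzero similarity.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for user, rating in ratings.items():
        weight = similarities.get(user, 0.0)
        weighted_sum += weight * rating
        weight_total += abs(weight)   # assumption: normalize by |weight|

    if weight_total == 0:
        return None
    return weighted_sum / weight_total


# Two neighbours rated the item; the more similar one counts for more.
prediction = predict_rating({"u1": 1.0, "u2": 0.5}, {"u1": 4, "u2": 2})
```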
B. Model-based collaborative filtering
This approach uses previous users’ ratings to construct a model, applying data mining techniques
and sometimes machine learning. Approaches such as association
rules, clustering, decision trees, artificial neural networks, regression, and Bayesian classifiers are
used to classify users and items based on the model.
Association rule mining: These algorithms generate association rules that determine the relationships
among items in a transaction. An association rule A → B means that item set A predicts item set B. Such a rule
can be fitted to a recommendation model to predict information about the user and item.
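How strongly item set A predicts item set B is commonly quantified by the support and confidence of the rule. These two measures are standard in association rule mining but are not named in the text, so the sketch below is an illustrative addition, with hypothetical function names and toy transactions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    if not transactions:
        return 0.0
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent:
    support(A and B together) / support(A)."""
    support_a = support(transactions, antecedent)
    if support_a == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support_a


# Toy data: each transaction is a set of item labels.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
rule_confidence = confidence(transactions, {"a"}, {"b"})
```

Here item a appears in three of four transactions and a with b in two, so the rule a → b holds in two of the three transactions that contain a.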
Clustering: A method used to partition the dataset into a number of clusters. Low inter-cluster
similarity and high intra-cluster similarity are characteristics of a good clustering method.
In a recommendation system, users can participate partially in different clusters, and the degree of
participation can be calculated by taking an average across the clusters of participation [18].
Decision tree: A decision tree is a tree-like graph structure constructed from
a training dataset in which the class labels are known. The tree can then be used to classify test data. The
decision tree is a type of classifier that handles and classifies previously unseen examples [19].
Artificial neural network (ANN): A network of interlinked neurons organized in different
layers. Weight and bias factors are associated with every neuron in each layer. Each neuron
has a transfer function through which it receives input, processes it, and provides an output.
The ANN is a classification technique used to classify test data.
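The role of the weights, bias, and transfer function can be illustrated with a single neuron and a single layer. The sigmoid transfer function used here is an assumption made for the example, because the text does not name a specific function. All names are hypothetical.

```python
from math import exp


def sigmoid(z):
    """A common transfer (activation) function, assumed for this sketch."""
    return 1.0 / (1.0 + exp(-z))


def neuron_output(inputs, weights, bias, transfer=sigmoid):
    """Weighted sum of the inputs plus bias, passed through the transfer function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return transfer(z)


def layer_output(inputs, weight_rows, biases, transfer=sigmoid):
    """Outputs of one layer: each neuron has its own weight row and bias."""
    return [neuron_output(inputs, w, b, transfer)
            for w, b in zip(weight_rows, biases)]


# A layer of two neurons fed by the same two inputs.
outputs = layer_output([1.0, 2.0], [[0.5, -0.25], [1.0, 1.0]], [0.0, -3.0])
```

Stacking several such layers, with the outputs of one layer used as the inputs of the next, gives the layered network described above.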
Regression: Regression is an analysis method that models how two or more variables are related to each other.
One variable is dependent, whereas one or more are independent variables. The regression technique
comprises prediction, curve fitting, and hypothesis testing, which establish relationships among variables.
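As a small example of curve fitting with one dependent and one independent variable, the sketch below fits a straight line by ordinary least squares. The choice of simple linear regression and the function name are assumptions made for illustration. The text does not single out a particular regression model.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept.

    xs: values of the independent variable
    ys: values of the dependent variable
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Fit on points that lie on y = 2x + 1, then predict for a new x.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = slope * 5 + intercept
```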
Bayesian classifier: The Bayesian classifier is used to solve classification problems based on
conditional probability and Bayes’ theorem. The Bayesian classifier is used to predict the class by
