Page 127 - Building Big Data Applications


data by using the product and geography information. This dataset can be queried
interactively and can be used for what-if type of causal analysis by different users across
the organization.
Another form of big data visualization is delivered through statistical software such
as R, SAS, and KXEN, where predefined models for different statistical functions use the
data extracted from the discovery environment and integrate it with corporate and other
datasets to drive the statistical visualizations. A very popular tool that uses R to
accomplish this type of functionality is RStudio.
All the visualization goals that we are discussing can be accomplished in the enterprise
today through the effective implementation of several algorithms. These algorithms are
implemented as part of the formulation and transformation of data across artificial
intelligence, machine learning, and neural networks. The implementations are deployed
for both unsupervised learning and supervised learning, and visualization benefits from
both techniques. The algorithms include the following, along with several proprietary
implementations of similar algorithms within the enterprise:
Recommender
Collocations
Dimensionality reduction
Expectation maximization
Bayesian
Locally weighted linear regression
Logistic regression
K-means clustering
Fuzzy K-means
Canopy clustering
Mean shift clustering
Hierarchical clustering
Dirichlet process clustering
Random forests
Support vector machines
Pattern mining
Collaborative filtering
Spectral clustering
Stochastic singular value decomposition
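
To make one of the unsupervised algorithms listed above concrete, the following is a minimal sketch of K-means clustering in plain Python. It is an illustration only, not the implementation used by any particular product: the function name `kmeans`, the `stores` data, and the evenly spaced starting centroids are all assumptions made for this example (production implementations typically use randomized or k-means++ initialization).

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster points into k groups with the basic K-means (Lloyd) loop."""
    # Assumption for this sketch: start from k evenly spaced data points
    # so that the result is deterministic.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two measurements per store, forming two obvious groups.
stores = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(stores, k=2)
print(labels)
```

The resulting labels and centroids are the kind of output a visualization layer can consume directly, for example by coloring each point on a scatter plot according to its cluster.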

Within the enterprise, the data science teams are responsible for this visualization
and the associated algorithms.
