Page 128 - Building Big Data Applications
Chapter 6 Visualization, storyboarding and applications 125
The evolving role of the data scientist
There is a new role in the world of data management that has evolved with big data,
called the “data scientist”. Several definitions of this role are still evolving,
including data analysts with advanced mathematical degrees, statisticians with multiple
specialist degrees, and much more. In simple terms, the “data scientist” is a
role in which the person has intimate knowledge of the data being discovered and can
effectively explore the data and infer relationships that create the foundation for
analytics and visualization.
The data scientist is the key role that makes the difference between the success and
failure of a big data program. The term was originally coined by two of the original data
scientists, DJ Patil and Jeff Hammerbacher, when they were working at LinkedIn and
Facebook, respectively.
What defines a data scientist? Is this a special skill or education? How different is
this role from that of an analyst or engineer?
There is no standard definition for the role of a data scientist, but here is a close
description: A data scientist is an expert business analyst or an engineer who uses data
discovery tools to find new insights in data by using techniques that are statistical or
scientific in nature. They work on a variety of hypotheses and design multiple models
that they experiment with to arrive at new insights. To accomplish this, they use a large
volume of data, which is collectively called big data.
Data scientists work very closely with data and often question everything that is input
to or output from it. In fact, in every enterprise there are a handful of senior business
analysts or data analysts who play the role of the data scientist without being
formally called one.
Data scientists use the data discovery tools discussed in this chapter to create the
visualization and analytics associated with big data. This role is still evolving,
and in the future we will see many teams of data scientists in enterprises, as opposed to
the handful that we see today. If data is the new oil, then the data scientist is the new
explorer.
In summary, we can use big data to enhance analytics and deliver data for visualization
and deeper analysis as needed by the enterprise. There are evolving techniques,
methods, and technologies to accomplish this on a regular basis within enterprises.
The underlying goal of big data analytics is the capability to drive
innovation and transformation across the enterprise in a transparent and informed
manner, where you can tie the outcomes to the predictions and vice versa. The
possibilities are endless and can be delivered from the same dataset as it is transformed
in discovery processes. This kind of flexible approach is what you should adopt when
designing solutions for big data analytics and visualization.