Page 366 - Data Architecture
Chapter 9.1: Repetitive Analytics: Some Basics
Outliers
On occasion, a point of reference does not seem to fit the pattern formed by all the other
points. Such a point of reference is referred to as an “outlier,” and in that case it can be
discarded.
In the case of an outlier, the theory is that some other factor, not captured by the
analysis, influenced the value of that point of reference. Removing the outlier therefore does
not undermine the conclusions drawn from the least squares regression analysis. Of course, if
there are too many outliers, then the analyst must undertake a deeper analysis of why the
outliers occurred. But as long as there are only a few outliers and there are reasons why they
should be removed, removing them is a perfectly legitimate thing to do.
Fig. 9.1.22 depicts two scatter diagrams: one fitted with a least squares regression line and
one containing outliers.
Fig. 9.1.22 Least squares regression analysis.
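To make the outlier discussion concrete, here is a minimal Python sketch of fitting a least squares line, flagging candidate outliers, and refitting without them. The function names, the residual cutoff `k` (two standard deviations of the residuals) and the `max_fraction` limit on how many points may be dropped are illustrative assumptions, not rules given in the text. The code only flags candidates. Whether there is an actual reason to remove a point remains the analyst's judgment.

```python
from statistics import mean, pstdev


def least_squares(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


def flag_outliers(xs, ys, k=2.0):
    """Return indices whose residual lies more than k population standard
    deviations from the fitted line. k is an assumed, illustrative cutoff."""
    a, b = least_squares(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = pstdev(residuals)
    if sd == 0:              # every point lies exactly on the line
        return []
    return [i for i, r in enumerate(residuals) if abs(r) > k * sd]


def refit_without_outliers(xs, ys, k=2.0, max_fraction=0.2):
    """Drop flagged points and refit the line. If more than max_fraction of
    the points are flagged (an assumed limit), refuse: too many outliers
    call for a deeper analysis of why they occurred, not silent removal."""
    flagged = flag_outliers(xs, ys, k)
    if len(flagged) > max_fraction * len(xs):
        raise ValueError(
            f"{len(flagged)} of {len(xs)} points flagged; investigate why"
        )
    dropped = set(flagged)
    kept_x = [x for i, x in enumerate(xs) if i not in dropped]
    kept_y = [y for i, y in enumerate(ys) if i not in dropped]
    a, b = least_squares(kept_x, kept_y)
    return a, b, flagged


if __name__ == "__main__":
    xs = list(range(1, 11))
    ys = [2 * x + 1 for x in xs]
    ys[4] = 40                              # x = 5 should give 11; 40 is the outlier
    print(least_squares(xs, ys))            # slope pulled below 2 by the outlier
    a, b, dropped = refit_without_outliers(xs, ys)
    print(dropped, round(a, 6), round(b, 6))  # → [4] 1.0 2.0
```

In this made-up data set a single wild point drags the slope from 2 down to about 1.82, which illustrates why a least squares fit is sensitive to outliers. A residual cutoff is only one possible screening rule; it is used here for simplicity.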
Data Over Time
It is normal to look at data over time. Looking at data over time is a good way to get