Page 366 - Data Architecture

Chapter 9.1: Repetitive Analytics: Some Basics

           Outliers


           On occasion, there is a point of reference that does not seem to fit with all the other
           points. Such a point of reference is referred to as an “outlier,” and it can be discarded.


           The theory behind an outlier is that some other factor, one not shared by the rest of the
           data, influenced that point of reference. Removing the outlier therefore does not undermine
           the conclusions drawn from the least squares regression analysis. Of course, if there are
           too many outliers, the analyst must look more deeply into why they occurred. But as long
           as there are only a few outliers and there are reasons why they should be removed, removing
           them is a perfectly legitimate thing to do.
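
           The idea can be sketched in a few lines of Python. This is only an illustration, not a
           method prescribed by the text: the function names, the sample data, and the rule used to
           flag an outlier (a residual more than three median absolute deviations from the median
           residual) are all assumptions made for the example. The sketch fits a least squares line,
           sets aside the points that fall far from it, and fits the line again on what remains.

```python
from statistics import mean, median


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def split_outliers(points, threshold=3.0):
    """Separate points whose residual is far from the median residual.

    The threshold (in median absolute deviations) is an assumed,
    illustrative rule; an analyst would choose one suited to the data.
    """
    slope, intercept = fit_line(points)
    residuals = [y - (slope * x + intercept) for x, y in points]
    med = median(residuals)
    mad = median([abs(r - med) for r in residuals])
    if mad == 0:
        # No spread to measure against, so flag nothing.
        return list(points), []
    kept, outliers = [], []
    for point, r in zip(points, residuals):
        (outliers if abs(r - med) > threshold * mad else kept).append(point)
    return kept, outliers


# Hypothetical data lying near y = 2x + 1, with one point that does not fit.
points = [(1, 3.1), (2, 4.9), (3, 7.1), (4, 8.9),
          (5, 30.0), (6, 13.1), (7, 14.9), (8, 17.1)]

kept, outliers = split_outliers(points)
slope, intercept = fit_line(kept)
print("outliers:", outliers)
print("refit slope and intercept:", round(slope, 2), round(intercept, 2))
```

           With a single stray point, the refitted line stays close to the trend of the remaining
           data. If the rule flagged many points, that would be the signal to investigate why the
           outliers occurred instead of simply discarding them.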


           Fig. 9.1.22 depicts a scatter diagram with linear regression analysis and a scatter diagram
           with outliers.


               Fig. 9.1.22 Least squares regression analysis.



           Data Over Time



           It is normal to look at data over time. Looking at data over time is a good way to get