Page 56 - Applied Statistics Using SPSS, STATISTICA, MATLAB and R
are interested in defining a new variable, PClass, that categorises the maximum
rain precipitation (variable PMax) into three categories:
1. PMax ≤ 20 (low);
2. 20 < PMax ≤ 80 (moderate);
3. PMax > 80 (high).
Variable PClass can be expressed as
PClass = 1 + (PMax > 20) + (PMax > 80),
whenever logical values associated with relational expressions such as “PMax > 20”
are represented by the arithmetical values 0 and 1, coding False and True,
respectively. That is precisely how SPSS, STATISTICA, MATLAB and R handle
such expressions. The reader can easily check that PClass values are 1, 2 and 3 in
correspondence with the low, moderate and high categories.
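As a quick check of the formula above, the following minimal Python sketch applies it to a few values. Python is not one of the packages discussed in this book; it is used here only because it also treats True and False as 1 and 0 in arithmetic. The function name pclass and the sample values are illustrative assumptions, not taken from the dataset.

```python
def pclass(pmax):
    """Return 1 (low), 2 (moderate) or 3 (high) for a maximum precipitation value."""
    # Each comparison yields True or False, which count as 1 or 0 when added.
    return 1 + (pmax > 20) + (pmax > 80)


# Hypothetical PMax values chosen to probe both category boundaries.
pmax_values = [12.5, 20.0, 20.1, 80.0, 95.3]
classes = [pclass(p) for p in pmax_values]
print(classes)  # [1, 1, 2, 2, 3]
```

Note that the boundary values 20 and 80 fall in the lower of the two adjacent categories, in agreement with the definitions PMax ≤ 20 and 20 < PMax ≤ 80.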
In the following subsections we will learn the essentials of data operation with
SPSS, STATISTICA, MATLAB and R.
2.1.2.1 SPSS
The addition of a new variable is made in SPSS by using the Insert
Variable option of the Data menu. In the case of the previous categorisation
variable, one would then proceed to compute its values by using the Compute
option of the Transform menu. The Compute Variable window shown in
Figure 2.6 will then be displayed, where one would fill in the above formula using
the respective variable identifiers; in this case: 1+(pmax>20)+(pmax>80).
Looking at Figure 2.6, one may rightly suspect that a large number of functions
are available in SPSS for building arbitrarily complex formulas.
Other data management operations such as sorting and transposing can be
performed using specific options of the SPSS Data menu.
2.1.2.2 STATISTICA
The addition of a new variable in STATISTICA is made with the Add
Variable option of the Insert menu. The variable specification window
shown in Figure 2.7 will then be displayed, where one would fill in, among other
things, the number of variables to be added, their names and the formulas used
to compute them. In this case, the formula is:
1+(v1>20)+(v1>80).
In STATISTICA, variables are symbolically denoted by v followed by a number
representing the position of the variable column in the spreadsheet. Since PMax
happens to be the first column, it is denoted v1. The cases column is v0. It is
also possible to use variable identifiers in formulas instead of v-notations.
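The correspondence between v-notations and variable identifiers can be illustrated with a short Python sketch that rewrites a formula from one form into the other. This is only an illustration of the positional convention described above, not of how STATISTICA works internally. The function resolve_v_notation, the label CASES and the second column name are hypothetical; the text only states that PMax is the first column.

```python
import re


def resolve_v_notation(formula, column_names):
    """Replace each v<k> token by the name of the k-th column (1-based)."""
    def substitute(match):
        position = int(match.group(1))
        if position == 0:
            # v0 denotes the cases column; CASES is a placeholder label.
            return "CASES"
        return column_names[position - 1]

    # \b keeps the pattern from matching a v inside a longer identifier.
    return re.sub(r"\bv(\d+)\b", substitute, formula)


# Hypothetical column order; only PMax being first comes from the text.
columns = ["PMax", "RainDays"]
named = resolve_v_notation("1+(v1>20)+(v1>80)", columns)
print(named)  # 1+(PMax>20)+(PMax>80)
```

One consequence of the positional convention is that a formula written with v-notations refers to whichever variable currently occupies that column, whereas a formula written with identifiers names the variable itself.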